[ http://issues.apache.org/jira/browse/HADOOP-224?page=comments#action_12412079 ]
Doug Cutting commented on HADOOP-224:
-------------------------------------

What if we were to add, e.g., an accessTime: how could we have it default to lastModified? Or if we removed a field, merging its value into another field, we'd need fixup code specific to that change. I'm not sure how you'd handle that. But then, I'm not sure I completely understand yet what you're proposing. Maybe some pseudo code would help?

> Allow simplified versioning for namenode and datanode metadata.
> ---------------------------------------------------------------
>
>          Key: HADOOP-224
>          URL: http://issues.apache.org/jira/browse/HADOOP-224
>      Project: Hadoop
>         Type: Improvement
>   Components: dfs
>  Environment: All
>     Reporter: Milind Bhandarkar
>
> Currently namenode has two types of metadata: the FSImage and FSEdits.
> FSImage contains information about Inodes, and FSEdits contains a list of
> operations that were not saved to FSImage. Datanode currently does not have
> any metadata, but will have some eventually.
> The file formats used for storing these metadata will evolve over time. It is
> important for the file-system to be backward compatible. That is, the
> metadata readers need to be able to identify which version of the file-format
> we are using, and need to be able to read information therein. As we add
> information to these metadata, the complexity of the reader increases
> dramatically.
> I propose a versioning scheme with a major and minor version number, where a
> different reader class is associated with a major number, and that class
> interprets the minor number internally. The readers essentially form a chain
> starting with the latest version. Each version-reader looks at the file and
> if it does not recognize the version number, passes it to the version reader
> next to it by calling the parse method, returning the results of the parse
> method up the chain. (In case of the namenode, the parse result is an array of
> Inodes.)
> This scheme has the advantage that every time a new major version is added,
> the new reader only needs to know about the reader for its immediately
> previous version, and every reader needs to know only which major
> version numbers it can read.
> The writer is not so versioned, because metadata is always written in the
> most current version format.
> One more change that is needed for simplified versioning is that the
> "struct-surping" of dfs.Block needs to be removed. Block's contents will
> change in later versions, and older versions should still be able to
> readFields properly. This is more general than Block of course, and in
> general only basic datatypes should be used as Writables in DFS metadata.
> For edits, the reader should return an array of <opcode, ArrayWritable> pairs.
> This will also remove the limitation of two operands for every opcode, and
> will be more extensible.
> Even with this new versioning scheme, the last Reader in the reader-chain
> would recognize the current format, thus maintaining full backward
> compatibility.
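
Since the comment asks for pseudo code, here is one way the reader chain from the description might look in Java, including a place for the accessTime-defaults-to-lastModified fixup. This is only a sketch under assumptions, not the proposed implementation: the class names (Inode, VersionReader, V1Reader, V2Reader, VersionedReadDemo), the record layouts, the two-int (major, minor) header, and the upgrade() hook are all hypothetical and do not come from the issue or from Hadoop's code. It also returns a List rather than an array for brevity.

```java
import java.io.*;
import java.util.*;

/** Hypothetical in-memory inode; the fields are illustrative only. */
class Inode {
    final String path;
    final long lastModified;
    final long accessTime; // -1 means "not recorded in the file"
    Inode(String path, long lastModified, long accessTime) {
        this.path = path;
        this.lastModified = lastModified;
        this.accessTime = accessTime;
    }
}

/** One reader per major version; unrecognized versions go to the previous reader. */
abstract class VersionReader {
    private final int major;
    private final VersionReader previous; // null for the oldest format
    VersionReader(int major, VersionReader previous) {
        this.major = major;
        this.previous = previous;
    }

    final List<Inode> parse(int fileMajor, int fileMinor, DataInput in) throws IOException {
        if (fileMajor == major) {
            return readOwnFormat(fileMinor, in);
        }
        if (previous == null) {
            throw new IOException("Unsupported version " + fileMajor + "." + fileMinor);
        }
        // Older file: delegate down the chain, then fix up the result on the way back.
        return upgrade(previous.parse(fileMajor, fileMinor, in));
    }

    /** Reads this major version's layout; the minor number is unused in this sketch. */
    abstract List<Inode> readOwnFormat(int minor, DataInput in) throws IOException;

    /** Hypothetical hook: fixup for inodes produced by an older reader. */
    List<Inode> upgrade(List<Inode> older) { return older; }
}

/** Major version 1: record count, then (path, lastModified) per record. */
class V1Reader extends VersionReader {
    V1Reader() { super(1, null); }

    @Override
    List<Inode> readOwnFormat(int minor, DataInput in) throws IOException {
        List<Inode> inodes = new ArrayList<>();
        for (int n = in.readInt(); n > 0; n--) {
            inodes.add(new Inode(in.readUTF(), in.readLong(), -1L));
        }
        return inodes;
    }
}

/** Major version 2: record count, then (path, lastModified, accessTime) per record. */
class V2Reader extends VersionReader {
    V2Reader() { super(2, new V1Reader()); }

    @Override
    List<Inode> readOwnFormat(int minor, DataInput in) throws IOException {
        List<Inode> inodes = new ArrayList<>();
        for (int n = in.readInt(); n > 0; n--) {
            inodes.add(new Inode(in.readUTF(), in.readLong(), in.readLong()));
        }
        return inodes;
    }

    @Override
    List<Inode> upgrade(List<Inode> older) {
        List<Inode> fixed = new ArrayList<>();
        for (Inode i : older) {
            fixed.add(new Inode(i.path, i.lastModified, i.lastModified)); // default accessTime
        }
        return fixed;
    }
}

class VersionedReadDemo {
    /** Reads the (major, minor) header, then hands the stream to the newest reader. */
    static List<Inode> read(byte[] image) throws IOException {
        DataInput in = new DataInputStream(new ByteArrayInputStream(image));
        return new V2Reader().parse(in.readInt(), in.readInt(), in);
    }

    /** Builds a one-record image in the old major-version-1 layout. */
    static byte[] sampleV1Image() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeInt(1);                 // major
        out.writeInt(0);                 // minor
        out.writeInt(1);                 // record count
        out.writeUTF("/user/doug/file"); // hypothetical path
        out.writeLong(1000L);            // lastModified
        return bytes.toByteArray();
    }
}
```

Calling VersionedReadDemo.read(VersionedReadDemo.sampleV1Image()) yields a single inode whose accessTime equals its lastModified (1000), because V2Reader hands the version-1 file to V1Reader and then applies its own upgrade() to the result. In this sketch a removed or merged field would presumably be handled the same way, in the upgrade() of the version that made the change; whether that matches what the proposal intends is an open question.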
