[ http://issues.apache.org/jira/browse/HADOOP-224?page=comments#action_12413158 ]
Konstantin Shvachko commented on HADOOP-224:
--------------------------------------------

> I care much more about keeping the mainline simple than about infinite
> backwards compatibility.
> What if we just have a completely separate utility that is invoked if we hit
> an old file that handles the whole thing?

Yes, this is what my approach does.
Keeping it simple, it works in 2 steps:
- Modify a serializable class as you wish (some restrictions apply, such as using only simple field types)
- Invoke VersionFactory to handle the whole versioning/serialization thing if an old file is hit.
All the complexity is in how that factory works.
Milind's patch, although it simplifies the existing code a lot, has specific serialization methods or classes (generally) for each version. Mine does not: the runtime class has no knowledge of previous versions; it's the factory that does.

> Allow simplified versioning for namenode and datanode metadata.
> ---------------------------------------------------------------
>
>          Key: HADOOP-224
>          URL: http://issues.apache.org/jira/browse/HADOOP-224
>      Project: Hadoop
>         Type: Improvement
>   Components: dfs
>  Environment: All
>     Reporter: Milind Bhandarkar
>  Attachments: hadoop-224.patch
>
> Currently namenode has two types of metadata: The FSImage, and FSEdits.
> FSImage contains information about Inodes, and FSEdits contains a list of
> operations that were not saved to FSImage. Datanode currently does not have
> any metadata, but would have it some day.
> The file formats used for storing these metadata will evolve over time. It is
> important for the file-system to be backward compatible. That is, the
> metadata readers need to be able to identify which version of the file-format
> we are using, and need to be able to read information therein. As we add
> information to these metadata, the complexity of the reader increases
> dramatically.
> I propose a versioning scheme with a major and minor version number, where a
> different reader class is associated with a major number, and that class
> interprets the minor number internally. The readers essentially form a chain
> starting with the latest version. Each version-reader looks at the file and
> if it does not recognize the version number, passes it to the version reader
> next to it by calling the parse method, returning the results of the parse
> method up the chain (in case of the namenode, the parse result is an array of
> Inodes).
> This scheme has an advantage that every time a new major version is added,
> the new reader only needs to know about the reader for its immediately
> previous version, and every reader needs to know only about which major
> version numbers it can read.
> The writer is not so versioned, because metadata is always written in the
> most current version format.
> One more change that is needed for simplified versioning is that the
> "struct-surping" of dfs.Block needs to be removed. Block's contents will
> change in later versions, and older versions should still be able to
> readFields properly. This is more general than Block of course, and in
> general only basic datatypes should be used as Writables in DFS metadata.
> For edits, the reader should return an array of <opcode, ArrayWritable> pairs.
> This will also remove the limitation of two operands for every opcode, and
> will be more extensible.
> Even with this new versioning scheme, the last Reader in the reader-chain
> would recognize the current format, thus maintaining full backward compatibility.
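
For illustration only, the reader chain described in the quoted issue could be sketched roughly as below. This is not code from hadoop-224.patch. Every name here (ReaderChainSketch, VersionedReader, ReaderV1, ReaderV2) is hypothetical, and so are the two toy formats. To keep the sketch self-contained, the version numbers and the payload are passed in as arguments rather than read from a file, and the parse result is a list of strings standing in for the array of Inodes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Sketch of a major/minor reader chain; all names are hypothetical, not Hadoop APIs. */
public class ReaderChainSketch {

    /** One link in the chain: handles exactly one major version. */
    static abstract class VersionedReader {
        private final int major;
        // Reader for the immediately previous major version, or null for the oldest.
        private final VersionedReader previous;

        VersionedReader(int major, VersionedReader previous) {
            this.major = major;
            this.previous = previous;
        }

        /** Parse here if the major version matches; otherwise pass it down the chain. */
        final List<String> parse(int fileMajor, int fileMinor, String payload) {
            if (fileMajor == major) {
                return read(fileMinor, payload);
            }
            if (previous == null) {
                throw new IllegalArgumentException(
                        "Unsupported metadata version " + fileMajor + "." + fileMinor);
            }
            // The result of the older reader is returned up the chain unchanged.
            return previous.parse(fileMajor, fileMinor, payload);
        }

        /** Each reader interprets the minor version internally. */
        abstract List<String> read(int minor, String payload);
    }

    /** Assumed toy format for major version 1: entries separated by ';'. */
    static final class ReaderV1 extends VersionedReader {
        ReaderV1() { super(1, null); }

        @Override
        List<String> read(int minor, String payload) {
            return Arrays.asList(payload.split(";"));
        }
    }

    /** Assumed toy format for major version 2: entries separated by ','. */
    static final class ReaderV2 extends VersionedReader {
        // The newest reader only knows about the reader just before it.
        ReaderV2() { super(2, new ReaderV1()); }

        @Override
        List<String> read(int minor, String payload) {
            List<String> fields = new ArrayList<>(Arrays.asList(payload.split(",")));
            if (minor >= 1) {
                fields.remove(0); // assumption: minor version 1 added a leading header field
            }
            return fields;
        }
    }

    /** The chain always starts with the latest version. */
    static VersionedReader latest() {
        return new ReaderV2();
    }

    public static void main(String[] args) {
        VersionedReader reader = latest();
        System.out.println(reader.parse(2, 1, "hdr,/a,/b")); // handled by ReaderV2
        System.out.println(reader.parse(1, 0, "/a;/b"));     // delegated to ReaderV1
    }
}
```

Adding a hypothetical major version 3 in this sketch would mean writing one new class whose constructor passes new ReaderV2() as its predecessor and pointing latest() at it. The older readers stay untouched, which is the advantage the issue description claims for the scheme.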
