[ http://issues.apache.org/jira/browse/HADOOP-224?page=all ]
Milind Bhandarkar updated HADOOP-224:
-------------------------------------
Attachment: hadoop-224.patch
Eric,
I agree. I have attached a patch that simplifies the mainline code, while
keeping all the logic for backward compatibility in a different class. The
advantage with this is that when you change a format significantly, you just
have to write a reader for that, without modifying any earlier versions.
> Allow simplified versioning for namenode and datanode metadata.
> ---------------------------------------------------------------
>
> Key: HADOOP-224
> URL: http://issues.apache.org/jira/browse/HADOOP-224
> Project: Hadoop
> Type: Improvement
> Components: dfs
> Environment: All
> Reporter: Milind Bhandarkar
> Attachments: hadoop-224.patch
>
> Currently the namenode has two types of metadata: the FSImage and FSEdits.
> FSImage contains information about Inodes, and FSEdits contains a list of
> operations that were not saved to FSImage. The datanode currently does not
> have any metadata, but will some day.
> The file formats used for storing these metadata will evolve over time. It is
> important for the file-system to be backward compatible. That is, the
> metadata readers need to be able to identify which version of the file-format
> we are using, and need to be able to read information therein. As we add
> information to these metadata, the complexity of the reader increases
> dramatically.
> I propose a versioning scheme with a major and minor version number, where a
> different reader class is associated with a major number, and that class
> interprets the minor number internally. The readers essentially form a chain
> starting with the latest version. Each version-reader looks at the file and,
> if it does not recognize the version number, passes it to the next version
> reader in the chain by calling its parse method, returning the results of
> that parse method up the chain. (In the case of the namenode, the parse
> result is an array of Inodes.)
> This scheme has the advantage that every time a new major version is added,
> the new reader only needs to know about the reader for its immediately
> previous version, and every reader needs to know only about which major
> version numbers it can read.
> The writer is not so versioned, because metadata is always written in the
> most current version format.
> One more change that is needed for simplified versioning is that the
> "struct-slurping" of dfs.Block needs to be removed. Block's contents will
> change in later versions, and older versions should still be able to
> readFields properly. This is more general than Block of course, and in
> general only basic datatypes should be used as Writables in DFS metadata.
> For edits, the reader should return an array of <opcode, ArrayWritable>
> pairs. This will also remove the limitation of two operands for every
> opcode, and will be more extensible.
> Even with this new versioning scheme, the last Reader in the reader-chain
> would recognize the current format, thus maintaining full backward
> compatibility.
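
The reader chain proposed in the description above could look roughly like
the following. This is only a minimal sketch and is not taken from
hadoop-224.patch: the class names (VersionReader, V1Reader, V2Reader), the
on-disk layouts, and the use of plain strings in place of Inodes are all
assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical sketch; none of these names or layouts come from the patch.
final class ReaderChainSketch {

    /** One reader per major version; each knows only its immediate predecessor. */
    abstract static class VersionReader {
        private final int major;
        private final VersionReader previous; // null for the oldest format

        VersionReader(int major, VersionReader previous) {
            this.major = major;
            this.previous = previous;
        }

        /** Handle the file if the major version matches, else pass it down the chain. */
        final String[] parse(int fileMajor, int fileMinor, DataInputStream in) throws IOException {
            if (fileMajor == major) return read(fileMinor, in);
            if (previous == null) {
                throw new IOException("unsupported version " + fileMajor + "." + fileMinor);
            }
            return previous.parse(fileMajor, fileMinor, in);
        }

        /** Decode the body; the minor version is interpreted inside the class. */
        abstract String[] read(int minor, DataInputStream in) throws IOException;
    }

    /** Assumed layout for major version 1: a count, then that many names. */
    static final class V1Reader extends VersionReader {
        V1Reader() { super(1, null); }

        @Override
        String[] read(int minor, DataInputStream in) throws IOException {
            String[] names = new String[in.readInt()];
            for (int i = 0; i < names.length; i++) names[i] = in.readUTF();
            return names;
        }
    }

    /** Assumed layout for major version 2: each name is followed by a
     *  replication count and, from minor version 1 on, a block size. */
    static final class V2Reader extends VersionReader {
        V2Reader() { super(2, new V1Reader()); }

        @Override
        String[] read(int minor, DataInputStream in) throws IOException {
            String[] names = new String[in.readInt()];
            for (int i = 0; i < names.length; i++) {
                names[i] = in.readUTF();
                in.readShort();                // replication, ignored in this sketch
                if (minor >= 1) in.readLong(); // block size, ignored in this sketch
            }
            return names;
        }
    }

    /** Reading always starts at the newest reader in the chain. */
    static String[] readImage(byte[] file) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(file))) {
            int major = in.readInt();
            int minor = in.readInt();
            return new V2Reader().parse(major, minor, in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** The writer is not versioned: it always emits the newest format (2.1 here). */
    static byte[] writeImage(String... names) { return encode(2, 1, names); }

    /** Builds an old 1.0 file, only to exercise the fallback path. */
    static byte[] legacyV1Image(String... names) { return encode(1, 0, names); }

    private static byte[] encode(int major, int minor, String... names) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(major);
            out.writeInt(minor);
            out.writeInt(names.length);
            for (String name : names) {
                out.writeUTF(name);
                if (major >= 2) {
                    out.writeShort(3);
                    out.writeLong(64L * 1024 * 1024);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(String.join(",", readImage(writeImage("/", "/tmp"))));
        System.out.println(String.join(",", readImage(legacyV1Image("/", "/tmp"))));
    }
}
```

In this sketch, adding a major version 3 would mean writing a V3Reader whose
predecessor is V2Reader and pointing readImage at it; V1Reader and V2Reader
would stay untouched, which is the property the proposal is after.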