[ http://issues.apache.org/jira/browse/HADOOP-224?page=comments#action_12412094 ]

Runping Qi commented on HADOOP-224:
-----------------------------------


Is this issue the same as the general versioning problem of object 
deserialization? Or did I miss something?

In my own programs, when I need to write a serializable class, I've been using  
a convention like the following:

        public void write(DataOutput out) throws IOException {
                out.writeInt(Link.version);
                out.writeUTF(this.url);
        }

        private void readFields_1(DataInput in) throws IOException {
                this.url = in.readUTF();
                ...
        }

        public void readFields(DataInput in) throws IOException {
                int version = in.readInt();
                switch (version) {
                case 1:
                        this.readFields_1(in);
                        break;
                default:
                        throw new IOException("Serialization version number "
                                + version + " of class Link is not recognized\n");
                }
        }

When I make changes to the class representation that affect how the class is 
serialized, I implement a new read method:

        public void write(DataOutput out) throws IOException {
                out.writeInt(Link.version);
                out.writeUTF(this.url);
                out.writeUTF(this.anchor);
        }

        private void readFields_2(DataInput in) throws IOException {
                this.url = in.readUTF();
                this.anchor = in.readUTF();
                ...
        }

        public void readFields(DataInput in) throws IOException {
                int version = in.readInt();
                switch (version) {
                case 1:
                        this.readFields_1(in);
                        break;
                case 2:
                        this.readFields_2(in);
                        break;
                default:
                        throw new IOException("Serialization version number "
                                + version + " of class Link is not recognized\n");
                }
        }


I've found that this approach gives me great flexibility in versioning while 
maintaining backward compatibility, and the code is not hard to maintain.
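
Put together, a minimal sketch of the whole class might look like the following. 
This is only an illustration of the convention above: the assumption that Link 
implements Hadoop's Writable, the static version constant, and the url/anchor 
field declarations are filled in here and are not part of the snippets above.

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Writable;

public class Link implements Writable {

        // bump this constant whenever the serialized layout changes,
        // and add a matching readFields_N method for the new layout
        private static final int version = 2;

        private String url;
        private String anchor;

        public void write(DataOutput out) throws IOException {
                // always write in the most current format
                out.writeInt(Link.version);
                out.writeUTF(this.url);
                out.writeUTF(this.anchor);
        }

        public void readFields(DataInput in) throws IOException {
                // dispatch on the version number written by write()
                int version = in.readInt();
                switch (version) {
                case 1:
                        this.readFields_1(in);
                        break;
                case 2:
                        this.readFields_2(in);
                        break;
                default:
                        throw new IOException("Serialization version number "
                                + version + " of class Link is not recognized\n");
                }
        }

        private void readFields_1(DataInput in) throws IOException {
                this.url = in.readUTF();
        }

        private void readFields_2(DataInput in) throws IOException {
                this.url = in.readUTF();
                this.anchor = in.readUTF();
        }
}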




> Allow simplified versioning for namenode and datanode metadata.
> ---------------------------------------------------------------
>
>          Key: HADOOP-224
>          URL: http://issues.apache.org/jira/browse/HADOOP-224
>      Project: Hadoop
>         Type: Improvement
>   Components: dfs
>  Environment: All
>     Reporter: Milind Bhandarkar
>
> Currently namenode has two types of metadata: The FSImage, and FSEdits. 
> FSImage contains information about Inodes, and FSEdits contains a list of 
> operations that were not saved to FSImage. Datanode currently does not have 
> any metadata, but would have it some day. 
> The file formats used for storing these metadata will evolve over time. It is 
> important for the file-system to be backward compatible. That is, the 
> metadata readers need to be able to identify which version of the file-format 
> we are using, and need to be able to read information therein. As we add 
> information to these metadata, the complexity of the reader increases 
> dramatically.
> I propose a versioning scheme with a major and minor version number, where a 
> different reader class is associated with a major number, and that class 
> interprets the minor number internally. The readers essentially form a chain 
> starting with the latest version. Each version-reader looks at the file and 
> if it does not recognize the version number, passes it to the version reader 
> next to it by calling the parse method, returning the results of the parse 
> method up the chain (in the case of the namenode, the parse result is an array 
> of Inodes).
> This scheme has an advantage that every time a new major version is added, 
> the new reader only needs to know about the reader for its immediately 
> previous version, and every reader needs to know only about which major 
> version numbers it can read.
> The writer is not so versioned, because metadata is always written in the 
> most current version format.
> One more change that is needed for simplified versioning is that the 
> "struct-surping" of dfs.Block needs to be removed. Block's contents will 
> change in later versions, and older versions should still be able to 
> readFields properly. This is more general than Block of course, and in 
> general only basic datatypes should be used as Writables in DFS metadata.
> For edits, the reader should return <opcode, ArrayWritable> pairs' array. 
> This will also remove the limitation of two operands for every opcode, and 
> will be more extensible.
> Even with this new versioning scheme, the last Reader in the reader-chain 
> would recognize current format, thus maintaining full backward compatibility.
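
For illustration, the reader chain described in the quoted proposal might be 
sketched roughly as below. The class and method names (FSImageVersionReader, 
canRead, readImage) are hypothetical; only the parse method and the Inode-array 
result are named in the proposal itself.

import java.io.DataInput;
import java.io.IOException;

abstract class FSImageVersionReader {

        // reader for the immediately previous major version, or null for the oldest
        private final FSImageVersionReader previous;

        protected FSImageVersionReader(FSImageVersionReader previous) {
                this.previous = previous;
        }

        // which major version numbers this reader recognizes
        protected abstract boolean canRead(int majorVersion);

        // reads the image body, interpreting the minor version internally;
        // for the namenode the result would be the array of Inodes
        protected abstract Object[] readImage(int majorVersion, int minorVersion,
                        DataInput in) throws IOException;

        // each reader either handles the version itself or passes it to the
        // next reader in the chain, returning the parse result back up
        public Object[] parse(int majorVersion, int minorVersion, DataInput in)
                        throws IOException {
                if (canRead(majorVersion)) {
                        return readImage(majorVersion, minorVersion, in);
                }
                if (previous != null) {
                        return previous.parse(majorVersion, minorVersion, in);
                }
                throw new IOException("Unrecognized major version " + majorVersion);
        }
}

Under this sketch, each new major version adds one concrete reader that knows 
only its own format and points at the reader for the previous version, which 
matches the property described above.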

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira
