[ http://issues.apache.org/jira/browse/HADOOP-603?page=comments#action_12442490 ]

Jim Kellerman commented on HADOOP-603:
--------------------------------------

I believe that Google still supports append and rewrites the index at the end of the file. The advantage of this approach is that accessing such a file takes half as many opens as a MapFile does, because the data and the index live in one file instead of two. This would significantly reduce the load on the name node.

> Extend SequenceFile to provide MapFile function by storing index at the end of the file
> ---------------------------------------------------------------------------------------
>
>                 Key: HADOOP-603
>                 URL: http://issues.apache.org/jira/browse/HADOOP-603
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: Jim Kellerman
>
> MapFile increases the load on the name node as two files are created to
> provide an index file format. If SequenceFile were extended by storing the
> index at the end of the file, only 1/2 of the files currently created for a
> map/reduce operation would be needed, reducing the load on the name node.
> Perhaps this is why Google implemented SSTable files in this manner. (SSTable
> files are functionally identical to Hadoop MapFiles.) (See the paper on
> BigTable, section 4 "Building Blocks":
> http://labs.google.com/papers/bigtable.html)
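
To illustrate the single-file layout proposed here, the following is a minimal Python sketch. It is not Hadoop's actual SequenceFile or MapFile format, nor the SSTable format. It assumes a hypothetical layout in which the records are followed by the index and then by a fixed-size footer recording where the index starts. The names `write_indexed`, `read_index`, and `lookup` are invented for this illustration, and for simplicity every key is indexed, where a real index could be sparse.

```python
import io
import struct

# Hypothetical footer: offset of the index (8 bytes) and number of entries (4 bytes).
FOOTER = struct.Struct(">QI")


def write_indexed(stream, records):
    """Write (key, value) byte pairs, then the index, then the footer."""
    index = []
    for key, value in records:
        index.append((key, stream.tell()))  # remember where this record starts
        stream.write(struct.pack(">II", len(key), len(value)))
        stream.write(key)
        stream.write(value)
    index_offset = stream.tell()
    for key, offset in index:
        stream.write(struct.pack(">IQ", len(key), offset))
        stream.write(key)
    stream.write(FOOTER.pack(index_offset, len(index)))


def read_index(stream):
    """Locate the index via the footer at the end of the same stream."""
    stream.seek(-FOOTER.size, io.SEEK_END)
    index_offset, count = FOOTER.unpack(stream.read(FOOTER.size))
    stream.seek(index_offset)
    index = {}
    for _ in range(count):
        key_len, offset = struct.unpack(">IQ", stream.read(12))
        index[stream.read(key_len)] = offset
    return index


def lookup(stream, index, key):
    """Return the value stored for key, or None if the key is absent."""
    offset = index.get(key)
    if offset is None:
        return None
    stream.seek(offset)
    key_len, value_len = struct.unpack(">II", stream.read(8))
    stream.seek(key_len, io.SEEK_CUR)  # skip over the stored key
    return stream.read(value_len)
```

In this sketch a reader needs a single open: it seeks to the footer, loads the index, and then seeks to individual records in the same stream. With a separate index file, the same lookup would need one open for the index and one for the data.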
