Data blocks should be record-oriented.
--------------------------------------

         Key: HADOOP-106
         URL: http://issues.apache.org/jira/browse/HADOOP-106
     Project: Hadoop
        Type: Wish
  Components: dfs  
    Versions: 0.2    
    Reporter: Andrzej Bialecki 


If data blocks started and ended on record boundaries, rather than at 
arbitrary offsets within a file, there would be some important advantages:

* it would be possible to avoid "fishing" for the beginning of the first 
record in a split (see SequenceFile.Reader.sync()).

* it would make recovering from DFS errors easier and more likely to succeed: 
in most cases, missing blocks could simply be skipped and the remaining parts 
combined.
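
To illustrate both points, here is a minimal sketch in Python, not Hadoop code. 
It assumes a hypothetical framing in which every record carries a 4-byte length 
prefix, and the names pack_records, read_block and recover are made up for this 
example. Records are packed so that none straddles a block boundary, so a 
reader can start parsing any block at offset 0, and a lost block costs only the 
records stored in it.

```python
import struct

# Assumed framing for this sketch: 4-byte big-endian length, then the payload.
HEADER = struct.Struct(">I")


def pack_records(records, block_size):
    """Group length-prefixed records into blocks without splitting any record."""
    blocks, current = [], bytearray()
    for rec in records:
        framed = HEADER.pack(len(rec)) + rec
        if len(framed) > block_size:
            raise ValueError("record larger than block size")
        # Close the current block early rather than split the record across two.
        if len(current) + len(framed) > block_size:
            blocks.append(bytes(current))
            current = bytearray()
        current += framed
    if current:
        blocks.append(bytes(current))
    return blocks


def read_block(block):
    """Parse every record in a block, starting at offset 0 with no sync search."""
    records, pos = [], 0
    while pos < len(block):
        (length,) = HEADER.unpack_from(block, pos)
        pos += HEADER.size
        records.append(block[pos:pos + length])
        pos += length
    return records


def recover(blocks):
    """Read all surviving blocks; None marks a missing block, which is skipped."""
    out = []
    for block in blocks:
        if block is not None:
            out.extend(read_block(block))
    return out


blocks = pack_records([b"alpha", b"beta", b"gamma", b"delta"], block_size=20)
blocks[0] = None          # simulate a lost block
survivors = recover(blocks)
```

One consequence of this choice is that blocks end up with variable lengths (at 
most block_size each), since a block is closed early whenever the next record 
would not fit.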

