Data blocks should be record-oriented.
--------------------------------------
Key: HADOOP-106
URL: http://issues.apache.org/jira/browse/HADOOP-106
Project: Hadoop
Type: Wish
Components: dfs
Versions: 0.2
Reporter: Andrzej Bialecki
If data blocks started and ended on record boundaries, rather than at
arbitrary byte offsets within a file, there would be two important advantages:
* A reader would no longer need to "fish" for the beginning of the first
record in a split (see SequenceFile.Reader.sync()).
* Recovering from DFS errors would be easier and would succeed more often: in
most cases missing blocks could simply be skipped and the remaining parts
combined.
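
To make the idea concrete, here is a minimal sketch of how a writer might
choose block boundaries so that no record straddles two blocks. It is not
existing DFS code: the class RecordAlignedBlocks, the method blockStarts() and
the notion of a variable-length block capped at maxBlockSize are all
assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; none of these names exist in Hadoop.
class RecordAlignedBlocks {

    /**
     * Given the byte length of each record in file order, returns the file
     * offset at which each block begins. A block is closed as soon as the
     * next record would push it past maxBlockSize, so every block starts
     * and ends on a record boundary.
     */
    static List<Long> blockStarts(int[] recordLengths, int maxBlockSize) {
        List<Long> starts = new ArrayList<>();
        long offset = 0;      // current position in the file
        long usedInBlock = 0; // bytes already placed in the open block
        for (int len : recordLengths) {
            if (len > maxBlockSize) {
                // Assumption: oversized records are rejected, not split.
                throw new IllegalArgumentException(
                    "record of " + len + " bytes exceeds block size");
            }
            if (starts.isEmpty() || usedInBlock + len > maxBlockSize) {
                starts.add(offset); // open a new block at a record boundary
                usedInBlock = 0;
            }
            usedInBlock += len;
            offset += len;
        }
        return starts;
    }

    public static void main(String[] args) {
        int[] records = {40, 50, 30, 60, 20};
        System.out.println(blockStarts(records, 100)); // prints [0, 90, 180]
    }
}
```

With boundaries chosen this way, a split that begins at a block start always
begins at a record start, and a lost block removes only whole records. The
cost in this sketch is that blocks are no longer of uniform size, so a real
implementation would have to track per-block lengths.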