[ http://issues.apache.org/jira/browse/HADOOP-106?page=comments#action_12371878 ]

eric baldeschwieler commented on HADOOP-106:
--------------------------------------------

My intuition is that it makes more sense to do this the other way around and have
records aligned to blocks. This keeps the FS implementation trivial: just pad
near the end of a block. This way you also keep a good separation of APIs.
It is fairly straightforward to change the record model to do that. The only
issues are with huge records (records that do not fit in a single block). You
have a couple of options there. The simplest is to disallow them...
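A minimal sketch of the padding approach, assuming a simple framing that is not Hadoop's actual on-disk format: each record is a 4-byte big-endian length followed by its payload, and zero bytes fill the unused tail of a block. The class name `BlockAlignedWriter` is hypothetical. When the next record would straddle a block boundary, the writer zero-fills the rest of the current block. Records that cannot fit in one block are rejected, which is the simplest of the options for huge records.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;

/**
 * Hypothetical writer that keeps each record inside a single block.
 * Assumed framing: 4-byte big-endian length, then the payload; zero
 * bytes fill the tail of a block (a zero length marks padding).
 */
class BlockAlignedWriter {
    static final int HEADER = 4; // bytes taken by the length prefix

    private final int blockSize;
    private final ByteArrayOutputStream out = new ByteArrayOutputStream();

    BlockAlignedWriter(int blockSize) {
        this.blockSize = blockSize;
    }

    void append(byte[] record) {
        int needed = HEADER + record.length;
        // Huge records: the simplest option is to disallow them.
        // Empty records are also rejected, since length 0 means padding.
        if (record.length == 0 || needed > blockSize) {
            throw new IllegalArgumentException(
                "record must be non-empty and fit in one block, got " + record.length + " bytes");
        }
        // Space left in the current block (a full block if we are on a boundary).
        int remaining = blockSize - (out.size() % blockSize);
        if (needed > remaining) {
            // The record would straddle the boundary: pad to the next block.
            byte[] pad = new byte[remaining];
            out.write(pad, 0, pad.length);
        }
        byte[] header = ByteBuffer.allocate(HEADER).putInt(record.length).array();
        out.write(header, 0, header.length);
        out.write(record, 0, record.length);
    }

    byte[] toByteArray() {
        return out.toByteArray();
    }
}
```

For example, with a 16-byte block, a 6-byte record (10 bytes framed) followed by a 4-byte record (8 bytes framed) leaves 6 bytes of padding, so the second record starts exactly at offset 16. The cost is some wasted space at the end of each block, bounded by the size of the largest record that did not fit.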

> Data blocks should be record-oriented.
> --------------------------------------
>
>          Key: HADOOP-106
>          URL: http://issues.apache.org/jira/browse/HADOOP-106
>      Project: Hadoop
>         Type: Wish
>   Components: dfs
>     Versions: 0.2
>     Reporter: Andrzej Bialecki
>
> If data blocks started and ended on data record boundaries, rather than at
> arbitrary places within a file, it would give some important advantages:
> * it would be possible to avoid "fishing" for the beginning of the first
> record in a split (see SequenceFile.Reader.sync()).
> * it would make recovering from DFS errors much easier and more likely to
> succeed: in most cases missing blocks could simply be skipped and the
> remaining parts combined.
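A matching reader sketch, under the same assumed framing (4-byte big-endian length, then payload; a zero length marks padding to the end of the block). The class name `BlockAlignedReader` is hypothetical, and a lost block is modeled as `null`. Because every block begins with a record, each block can be parsed from its first byte with no sync-marker search, and recovery amounts to skipping the missing blocks and concatenating the records from the rest.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical reader for block-aligned data. Assumed framing: each record
 * is a 4-byte big-endian length plus payload; a zero length (or fewer than
 * 4 bytes left) means the rest of the block is padding.
 */
class BlockAlignedReader {
    /** Parses one block from its first byte; no sync-marker search is needed. */
    static List<byte[]> readBlock(byte[] block) {
        ByteBuffer buf = ByteBuffer.wrap(block);
        List<byte[]> records = new ArrayList<>();
        while (buf.remaining() >= 4) {
            int len = buf.getInt();
            if (len == 0) {
                break; // padding: nothing else in this block
            }
            byte[] rec = new byte[len];
            buf.get(rec);
            records.add(rec);
        }
        return records;
    }

    /** Recovers what it can: missing blocks (null) are skipped, the rest combined. */
    static List<byte[]> readAll(List<byte[]> blocks) {
        List<byte[]> records = new ArrayList<>();
        for (byte[] block : blocks) {
            if (block != null) {
                records.addAll(readBlock(block));
            }
        }
        return records;
    }

    /** Helper for examples: builds one zero-padded block from string records. */
    static byte[] block(int size, String... recs) {
        ByteBuffer buf = ByteBuffer.allocate(size); // zero-filled, so unused bytes are padding
        for (String r : recs) {
            byte[] b = r.getBytes(StandardCharsets.UTF_8);
            buf.putInt(b.length).put(b);
        }
        return buf.array();
    }
}
```

With three 16-byte blocks where the middle one is lost, `readAll` still returns every record from the first and third blocks, in order. With unaligned blocks, by contrast, a record cut by the missing block would also corrupt the start of the following block.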

-- 
This message is automatically generated by JIRA.