[ https://issues.apache.org/jira/browse/HADOOP-3315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12596142#action_12596142 ]

Doug Cutting commented on HADOOP-3315:
--------------------------------------

Latest draft looks much better!
 - Does RO stand for something, or is it short for "row"?
 - The RO entry values can be more compactly represented as differences from 
the prior entry.  Is this intended?  If so, we should state this.
 - In data blocks, we might use something like 
<entryLength><keyLength><key><value>.  This would permit one to skip entire 
entries more quickly.  The valueLength can be computed as 
entryLength-keyLength.  Do folks think this is worthwhile?
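The proposed <entryLength><keyLength><key><value> layout can be sketched in Java. This is a minimal illustration that assumes both length fields are fixed 4-byte ints written with `DataOutputStream.writeInt` and that entryLength counts only the key and value bytes. The comment does not specify an encoding, and the names `EntryLayoutSketch`, `writeEntry`, `skipEntry` and `readValue` are hypothetical. The point it shows: a reader can jump over a whole entry after reading a single int, and valueLength never needs to be stored.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class EntryLayoutSketch {

    // Writes one entry as <entryLength><keyLength><key><value>.
    // Assumption: lengths are fixed 4-byte ints; entryLength = key bytes + value bytes.
    static void writeEntry(DataOutputStream out, byte[] key, byte[] value) throws IOException {
        out.writeInt(key.length + value.length); // entryLength
        out.writeInt(key.length);                // keyLength
        out.write(key);
        out.write(value);
    }

    // Skips a whole entry after reading only its entryLength field.
    static void skipEntry(DataInputStream in) throws IOException {
        int entryLength = in.readInt();
        int remaining = 4 + entryLength; // keyLength field + key + value
        while (remaining > 0) {
            int skipped = in.skipBytes(remaining);
            if (skipped <= 0) {
                throw new EOFException("truncated entry");
            }
            remaining -= skipped;
        }
    }

    // Reads the next entry's value; valueLength is derived as entryLength - keyLength.
    static byte[] readValue(DataInputStream in) throws IOException {
        int entryLength = in.readInt();
        int keyLength = in.readInt();
        byte[] key = new byte[keyLength];
        in.readFully(key);
        byte[] value = new byte[entryLength - keyLength];
        in.readFully(value);
        return value;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        writeEntry(out, "k1".getBytes(StandardCharsets.UTF_8), "first".getBytes(StandardCharsets.UTF_8));
        writeEntry(out, "k2".getBytes(StandardCharsets.UTF_8), "second".getBytes(StandardCharsets.UTF_8));
        out.flush();

        DataInputStream in = new DataInputStream(new ByteArrayInputStream(buf.toByteArray()));
        skipEntry(in);                 // jump over entry 1 without parsing its key
        byte[] v = readValue(in);      // entry 2
        System.out.println(new String(v, StandardCharsets.UTF_8)); // prints "second"
    }
}
```

With separate key and value lengths instead, skipping an entry would require reading two length fields and skipping twice. Putting entryLength first turns it into one read and one skip.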
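Storing RO entry values as differences from the prior entry only saves space when the differences are small and are written with a variable-length encoding. The sketch below assumes, hypothetically, that RO entries are a non-decreasing sequence of non-negative longs such as offsets. It uses a plain LEB128-style varint as a stand-in; this is not Hadoop's `WritableUtils` VLong format. `DeltaEncodingSketch`, `encode` and `decode` are illustrative names.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

public class DeltaEncodingSketch {

    // Appends v as an unsigned LEB128 varint: 7 bits per byte, high bit set means "more bytes follow".
    static void writeVarLong(ByteArrayOutputStream out, long v) {
        while ((v & ~0x7FL) != 0) {
            out.write((int) ((v & 0x7F) | 0x80));
            v >>>= 7;
        }
        out.write((int) v);
    }

    // Encodes a non-decreasing sequence as differences from the prior entry (the first is relative to 0).
    static byte[] encode(long[] values) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long prev = 0;
        for (long v : values) {
            if (v < prev) {
                throw new IllegalArgumentException("values must be non-negative and non-decreasing");
            }
            writeVarLong(out, v - prev);
            prev = v;
        }
        return out.toByteArray();
    }

    // Decodes count values by reading each varint delta and adding it to a running total.
    static long[] decode(byte[] data, int count) {
        long[] result = new long[count];
        int pos = 0;
        long prev = 0;
        for (int i = 0; i < count; i++) {
            long delta = 0;
            int shift = 0;
            int b;
            do {
                b = data[pos++] & 0xFF;
                delta |= (long) (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            prev += delta;
            result[i] = prev;
        }
        return result;
    }

    public static void main(String[] args) {
        long[] offsets = {1_000_000L, 1_000_070L, 1_000_135L, 1_000_201L};
        byte[] encoded = encode(offsets);
        System.out.println(encoded.length + " bytes vs " + offsets.length * 8 + " as fixed longs"); // prints "6 bytes vs 32 as fixed longs"
        System.out.println(Arrays.equals(offsets, decode(encoded, offsets.length)));              // prints "true"
    }
}
```

For the sample values the deltas are 1000000, 70, 65 and 66. They take 3 + 1 + 1 + 1 = 6 bytes, versus 32 bytes as fixed 8-byte longs. The tradeoff is that decoding any one entry requires summing all the deltas before it, so random access within an index block gets slower.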

> We should not depend on keys/values being Writables in TFile.

Good point.  So the writer's constructor should have Serializer<K> and 
Serializer<V> parameters, and the reader's Deserializer<K> and Deserializer<V> 
parameters.  This would permit us to, e.g., store Thrift or other objects in a 
TFile.
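The following is a rough sketch of a writer and reader that take serializers as constructor parameters instead of requiring Writables. The `Serializer`/`Deserializer` interfaces are local stand-ins with the same shape as Hadoop's `org.apache.hadoop.io.serializer` interfaces (`open`, `serialize`/`deserialize`, `close`). They are redeclared here so the example compiles on its own. `Writer`, `Reader`, and the `writeUTF`-based string serialization are hypothetical. The sketch also assumes each serialization is self-delimiting, so keys and values are written back to back with no length framing. A real TFile would still need lengths for its block layout.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class TFileSerializerSketch {

    // Local stand-ins shaped like Hadoop's Serializer / Deserializer interfaces.
    interface Serializer<T> {
        void open(OutputStream out) throws IOException;
        void serialize(T t) throws IOException;
        void close() throws IOException;
    }

    interface Deserializer<T> {
        void open(InputStream in) throws IOException;
        T deserialize(T reuse) throws IOException;
        void close() throws IOException;
    }

    // Hypothetical writer: knows nothing about Writable, only the serializers it is handed.
    static class Writer<K, V> {
        private final Serializer<K> keySerializer;
        private final Serializer<V> valueSerializer;

        Writer(OutputStream out, Serializer<K> keySerializer, Serializer<V> valueSerializer) throws IOException {
            this.keySerializer = keySerializer;
            this.valueSerializer = valueSerializer;
            keySerializer.open(out);
            valueSerializer.open(out);
        }

        void append(K key, V value) throws IOException {
            keySerializer.serialize(key);
            valueSerializer.serialize(value);
        }
    }

    // Hypothetical reader: mirrors the writer using deserializers.
    static class Reader<K, V> {
        private final Deserializer<K> keyDeserializer;
        private final Deserializer<V> valueDeserializer;

        Reader(InputStream in, Deserializer<K> keyDeserializer, Deserializer<V> valueDeserializer) throws IOException {
            this.keyDeserializer = keyDeserializer;
            this.valueDeserializer = valueDeserializer;
            keyDeserializer.open(in);
            valueDeserializer.open(in);
        }

        K nextKey() throws IOException { return keyDeserializer.deserialize(null); }
        V nextValue() throws IOException { return valueDeserializer.deserialize(null); }
    }

    // Example non-Writable serialization: strings via writeUTF/readUTF, which are self-delimiting.
    static class StringSerializer implements Serializer<String> {
        private DataOutputStream out;
        public void open(OutputStream os) { out = new DataOutputStream(os); }
        public void serialize(String s) throws IOException { out.writeUTF(s); }
        public void close() throws IOException { out.close(); }
    }

    static class StringDeserializer implements Deserializer<String> {
        private DataInputStream in;
        public void open(InputStream is) { in = new DataInputStream(is); }
        public String deserialize(String reuse) throws IOException { return in.readUTF(); }
        public void close() throws IOException { in.close(); }
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream file = new ByteArrayOutputStream();
        Writer<String, String> writer = new Writer<>(file, new StringSerializer(), new StringSerializer());
        writer.append("apple", "red");
        writer.append("banana", "yellow");

        Reader<String, String> reader = new Reader<>(
                new ByteArrayInputStream(file.toByteArray()), new StringDeserializer(), new StringDeserializer());
        System.out.println(reader.nextKey() + "=" + reader.nextValue()); // prints "apple=red"
        System.out.println(reader.nextKey() + "=" + reader.nextValue()); // prints "banana=yellow"
    }
}
```

Because the file code only ever calls `serialize`/`deserialize`, swapping in a Thrift-backed pair of implementations would not require touching `Writer` or `Reader`.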

> New binary file format
> ----------------------
>
>                 Key: HADOOP-3315
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3315
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: io
>            Reporter: Owen O'Malley
>            Assignee: Srikanth Kakani
>         Attachments: Tfile-1.pdf
>
>
> SequenceFile's block compression format is too complex and requires 4 codecs 
> to compress or decompress. It would be good to have a file format that only 
> needs 

-- 
This message is automatically generated by JIRA.