[ https://issues.apache.org/jira/browse/HADOOP-3315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12602114#action_12602114 ]

Srikanth Kakani commented on HADOOP-3315:
-----------------------------------------

Owen,

Exposing append(long keyLength, long valueLength) would introduce one 
complication that we did not discuss earlier, although it can be handled.

If the key/value pair is at the beginning of a block, we need to copy the key 
into a byte array during key.serialize(outputstream). We can do this with a 
keyValueOutputStream(keybytes, valuebytes, outputstream) that captures the 
first keybytes bytes of data written into a buffer. This is needed to generate 
the index, but it starts getting ugly.
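
A rough sketch of the capturing stream described above, to make the idea 
concrete. The class name KeyCapturingOutputStream and its methods are 
hypothetical and are not part of the TFile proposal or of any Hadoop API. It 
also assumes the key length fits in an int, and it ignores the value length.

```java
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.Arrays;

// Hypothetical helper: forwards every byte to the underlying stream and
// keeps a copy of the first keyLength bytes so they can go into an index.
class KeyCapturingOutputStream extends FilterOutputStream {
    private final byte[] keyPrefix; // copy of the first keyLength bytes
    private int captured = 0;       // number of key bytes copied so far

    KeyCapturingOutputStream(OutputStream out, int keyLength) {
        super(out);
        this.keyPrefix = new byte[keyLength];
    }

    @Override
    public void write(int b) throws IOException {
        if (captured < keyPrefix.length) {
            keyPrefix[captured++] = (byte) b;
        }
        out.write(b);
    }

    @Override
    public void write(byte[] buf, int off, int len) throws IOException {
        // Copy only the part of this write that still belongs to the key.
        int n = Math.min(len, keyPrefix.length - captured);
        if (n > 0) {
            System.arraycopy(buf, off, keyPrefix, captured, n);
            captured += n;
        }
        out.write(buf, off, len);
    }

    // True once all keyLength bytes have been seen.
    boolean keyComplete() {
        return captured == keyPrefix.length;
    }

    // The key bytes captured so far.
    byte[] capturedKey() {
        return Arrays.copyOf(keyPrefix, captured);
    }
}
```

The writer would wrap the block's output stream in this class only for the 
first pair of a block, then read capturedKey() once keyComplete() is true.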

I would also suggest that ObjectFile extend TFile, so that it can do all this 
more neatly without exposing append(keyLength, valueLength).

Additionally, to make any of this feasible (you mentioned this earlier; I just 
want to record it), serializers should also have getSerializedLength().
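
One possible shape for such a serializer, as a sketch only. The interface 
LengthAwareSerializer and the example Utf8StringSerializer are hypothetical 
names used for illustration; they are not the existing Hadoop serializer 
interface, and the signature of getSerializedLength() is an assumption.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical interface: a serializer that can report the serialized size
// of a value before writing it, so the lengths are known up front.
interface LengthAwareSerializer<T> {
    long getSerializedLength(T value);

    void serialize(T value, OutputStream out) throws IOException;
}

// Example implementation for strings encoded as UTF-8.
class Utf8StringSerializer implements LengthAwareSerializer<String> {
    @Override
    public long getSerializedLength(String value) {
        return value.getBytes(StandardCharsets.UTF_8).length;
    }

    @Override
    public void serialize(String value, OutputStream out) throws IOException {
        out.write(value.getBytes(StandardCharsets.UTF_8));
    }
}
```

With something like this, a caller could obtain keyLength and valueLength from 
the serializers before calling append(keyLength, valueLength).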

> New binary file format
> ----------------------
>
>                 Key: HADOOP-3315
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3315
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: io
>            Reporter: Owen O'Malley
>            Assignee: Srikanth Kakani
>         Attachments: Tfile-1.pdf, TFile-2.pdf
>
>
> SequenceFile's block compression format is too complex and requires 4 codecs 
> to compress or decompress. It would be good to have a file format that only 
> needs 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.