[ https://issues.apache.org/jira/browse/HADOOP-3315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12668271#action_12668271 ]
stack commented on HADOOP-3315:
-------------------------------

I've been playing around w/ the last patch trying to actually use it. Here are a couple of (minor) comments:

Should probably be explicit about encoding in the below (a sketch of what I mean is appended at the end of this message):

{code}
public ByteArray(String str) {
  this(str.getBytes());
{code}

Would be nice if we could easily pass an alternate implementation of BCFile, say one that cached blocks.

Do you want to fix the below:

{code}
// TODO: remember the longest key in a TFile, and use it to replace
// MAX_KEY_SIZE.
keyBuffer = new byte[MAX_KEY_SIZE];
{code}

A default buffer of 64k for keys is a bit on the extravagant side.

The below should be public so users don't have to define their own:

{code}
protected final static String JCLASS = "jclass:";
{code}

The API seems to have changed since the last patch; there is no longer a #find method. What's the suggested way of accessing a random single key/value? (Open a scanner? Using what would you suggest for start and end? Then seekTo? But doing that I find I'm making two ByteArray instances of the same byte array. Should there be a public seekTo that takes a RawComparable?) I've appended a sketch of the lookup pattern I'm after.

Thanks Hong.

> New binary file format
> ----------------------
>
>                 Key: HADOOP-3315
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3315
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: io
>            Reporter: Owen O'Malley
>            Assignee: Amir Youssefi
>             Fix For: 0.21.0
>
>         Attachments: HADOOP-3315_20080908_TFILE_PREVIEW_WITH_LZO_TESTS.patch, HADOOP-3315_20080915_TFILE.patch, hadoop-trunk-tfile.patch, hadoop-trunk-tfile.patch, TFile Specification 20081217.pdf
>
>
> SequenceFile's block compression format is too complex and requires 4 codecs to compress or decompress. It would be good to have a file format that only needs
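On the encoding comment: a minimal sketch of an explicit-charset constructor. The ByteArray(byte[]) delegate is the one from the patch; the choice of UTF-8 and the private helper are my assumptions, not something the patch prescribes.

{code}
import java.io.UnsupportedEncodingException;

// Illustrative stand-in for the patch's ByteArray; only the constructors matter here.
public class ByteArray {
  private final byte[] buffer;

  public ByteArray(byte[] buffer) {
    this.buffer = buffer;
  }

  // Explicit about the encoding instead of relying on the platform default.
  public ByteArray(String str) {
    this(toUtf8(str));
  }

  private static byte[] toUtf8(String str) {
    try {
      return str.getBytes("UTF-8");
    } catch (UnsupportedEncodingException e) {
      // UTF-8 is always available, so this cannot happen in practice.
      throw new IllegalStateException(e);
    }
  }
}
{code}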
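To make the random-access question concrete, here is the kind of point lookup I am after, written against my reading of the current patch. The Reader/Scanner/Entry names, the seekTo(byte[]) semantics (position at the first entry >= key, return true on an exact match), and the package layout are assumptions and may not match the final API.

{code}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.file.tfile.TFile;

// Sketch of a single-key lookup against a TFile; names follow my reading of the
// patch and may not match what finally ships.
public class TFilePointGet {
  public static byte[] get(FileSystem fs, Path path, byte[] key, Configuration conf)
      throws IOException {
    FSDataInputStream in = fs.open(path);
    TFile.Reader reader = new TFile.Reader(in, fs.getFileStatus(path).getLen(), conf);
    TFile.Reader.Scanner scanner = reader.createScanner();
    try {
      // Assumed: seekTo moves the cursor to the first entry whose key is >= the
      // input key and reports whether it is an exact match.
      if (!scanner.seekTo(key)) {
        return null; // key not present
      }
      TFile.Reader.Scanner.Entry entry = scanner.entry();
      byte[] value = new byte[entry.getValueLength()];
      entry.getValue(value);
      return value;
    } finally {
      scanner.close();
      reader.close();
      in.close();
    }
  }
}
{code}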