[ 
https://issues.apache.org/jira/browse/HADOOP-3315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12633906#action_12633906
 ] 

stack commented on HADOOP-3315:
-------------------------------

Other comments on TFile.java:

+ How about defines for the common compression types and comparator name(s), 
at least for the common case where TFile is used from Java?
+ If no compression, does that mean there is only one block in a file or do we 
still make blocks of size minBlockSize (raw size == compressed size)?
+ If I wanted to ornament the index -- say, I wanted to add a metadata block 
per BCFile block that had in it the offset of every added key (or the offset of 
every 'row' in hbase) in the name of improving random access speeds -- it looks 
like I would override prepareAppendKey and then do my own KeyRegister class 
that keeps up the per-block index?  KeyRegister is currently private.  Can it 
be made subclassable?  advanceCursorInBlock is also private which doesn't help 
if I want to exploit my ancillary-index info.  Or what would you suggest if I 
want to make a more-involved index (I can't use the BCFile block index since 
key/values might be of variable size -- or, maybe I can set the blocksize to 
zero and index every element?).
+ The code looks really good.
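
To make the per-block index idea above concrete, here is a rough sketch of the 
kind of ancillary index I have in mind. It is plain Java and independent of the 
patch: BlockKeyIndex, register and seekOffset are made-up names, not anything in 
TFile, and the unsigned byte-wise key ordering is an assumption on my part. A 
KeyRegister subclass could call something like register() on each append, and a 
reader could use seekOffset() in place of a linear scan within the block.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical ancillary index for a single block: remembers the offset of
 * every key appended to the block so a reader can binary-search it.
 * Not part of TFile; all names here are illustrative only.
 */
public class BlockKeyIndex {
    private final List<byte[]> keys = new ArrayList<>();
    private final List<Long> offsets = new ArrayList<>();

    /** Record a key as it is appended. Keys must arrive in sorted order. */
    public void register(byte[] key, long offsetInBlock) {
        if (!keys.isEmpty() && compare(keys.get(keys.size() - 1), key) > 0) {
            throw new IllegalArgumentException("keys must be appended in sorted order");
        }
        keys.add(key.clone());
        offsets.add(offsetInBlock);
    }

    /** Offset of the first record whose key is >= target, or -1 if there is none. */
    public long seekOffset(byte[] target) {
        int lo = 0;
        int hi = keys.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (compare(keys.get(mid), target) < 0) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo == keys.size() ? -1L : offsets.get(lo);
    }

    /** Unsigned lexicographic byte comparison (assumed key ordering). */
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    }
}
```

The cost is one in-memory entry per key, which is why I would want it stored 
as an optional metadata block per BCFile block rather than in the main index.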

To add support for alternate comparators and for exposing the index at least to 
subclasses, should we add a patch atop your patch or just wait till what's here 
gets committed?

It looks like I could do an in-memory TFile if I wanted since I provide the 
stream?  Is that so?  If so, that's sweet!

> New binary file format
> ----------------------
>
>                 Key: HADOOP-3315
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3315
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: io
>            Reporter: Owen O'Malley
>            Assignee: Amir Youssefi
>         Attachments: HADOOP-3315_20080908_TFILE_PREVIEW_WITH_LZO_TESTS.patch, 
> HADOOP-3315_20080915_TFILE.patch, TFile Specification Final.pdf
>
>
> SequenceFile's block compression format is too complex and requires 4 codecs 
> to compress or decompress. It would be good to have a file format that only 
> needs 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
