[ https://issues.apache.org/jira/browse/HADOOP-3315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12633906#action_12633906 ]
stack commented on HADOOP-3315:
-------------------------------
Other comments on TFile.java:
+ How about defining constants for the common compression types and comparator
name(s), at least for the common case where TFile is used from Java?
+ If no compression, does that mean there is only one block in a file or do we
still make blocks of size minBlockSize (raw size == compressed size)?
+ If I wanted to ornament the index -- say, add a metadata block per BCFile
block holding the offset of every added key (or of every 'row' in hbase) to
improve random access speeds -- it looks like I would override prepareAppendKey
and then write my own KeyRegister class that maintains the per-block index?
KeyRegister is currently private. Can it be made subclassable?
advanceCursorInBlock is also private, which doesn't help if I want to exploit
my ancillary-index info. Or what would you suggest if I want to make a
more-involved index? (I can't use the BCFile block index since key/values might
be of variable size -- or maybe I can set the blocksize to zero and index every
element?)
+ The code looks really good.
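To make the per-block index idea above a bit more concrete, here is a rough
sketch in plain Java with no TFile dependencies. PerBlockKeyIndex, register and
seekOffset are names made up for illustration; none of them are in the patch,
and the sketch does not hook into prepareAppendKey or KeyRegister. It only
assumes keys arrive in sorted order and compare as unsigned bytes.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical ancillary index: remembers the offset of every key in one block. */
final class PerBlockKeyIndex {
    private final List<byte[]> keys = new ArrayList<>();
    private final List<Long> offsets = new ArrayList<>();

    /** Record one appended key; assumes keys are registered in sorted order. */
    void register(byte[] key, long offsetInBlock) {
        keys.add(key.clone());
        offsets.add(offsetInBlock);
    }

    /** Binary search: offset of the first key >= target, or -1 if there is none. */
    long seekOffset(byte[] target) {
        int lo = 0;
        int hi = keys.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (compare(keys.get(mid), target) < 0) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo == keys.size() ? -1L : offsets.get(lo);
    }

    /** Unsigned lexicographic byte comparison (an assumed comparator). */
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }
}
```

A reader holding such an index could jump straight to the returned offset
instead of scanning the block from its start, which is the random-access win I
am after.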
To add support for alternate comparators and for exposing the index at least to
subclasses, should we add a patch atop your patch or just wait till what's here
gets committed?
It looks like I could do an in-memory TFile if I wanted, since I provide the
stream? Is that so? If so, that's sweet!
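For illustration only, this is the general shape I have in mind, using nothing
but java.io. It is not the TFile API or the TFile on-disk layout:
InMemoryRecords, append, readField and roundTrip are hypothetical names, and
the length-prefixed record encoding is an assumption made so the example is
self-contained. The point is just that a writer handed a stream does not care
whether bytes land on disk or in a byte array.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-in for a writer/reader pair that works on caller-supplied streams. */
final class InMemoryRecords {
    /** Write one key/value pair as two length-prefixed byte arrays. */
    static void append(DataOutputStream out, byte[] key, byte[] value) throws IOException {
        out.writeInt(key.length);
        out.write(key);
        out.writeInt(value.length);
        out.write(value);
    }

    /** Read one length-prefixed byte array. */
    static byte[] readField(DataInputStream in) throws IOException {
        byte[] buf = new byte[in.readInt()];
        in.readFully(buf);
        return buf;
    }

    /** Write a record to a byte array, read it back, and return "key=value". */
    static String roundTrip(String key, String value) {
        try {
            ByteArrayOutputStream sink = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(sink)) {
                append(out,
                       key.getBytes(StandardCharsets.UTF_8),
                       value.getBytes(StandardCharsets.UTF_8));
            }
            DataInputStream in =
                new DataInputStream(new ByteArrayInputStream(sink.toByteArray()));
            String k = new String(readField(in), StandardCharsets.UTF_8);
            String v = new String(readField(in), StandardCharsets.UTF_8);
            return k + "=" + v;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Whether the real reader can consume a plain in-memory stream depends on what it
requires of its input (seeking, for instance), which is why I am asking.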
> New binary file format
> ----------------------
>
> Key: HADOOP-3315
> URL: https://issues.apache.org/jira/browse/HADOOP-3315
> Project: Hadoop Core
> Issue Type: New Feature
> Components: io
> Reporter: Owen O'Malley
> Assignee: Amir Youssefi
> Attachments: HADOOP-3315_20080908_TFILE_PREVIEW_WITH_LZO_TESTS.patch,
> HADOOP-3315_20080915_TFILE.patch, TFile Specification Final.pdf
>
>
> SequenceFile's block compression format is too complex and requires 4 codecs
> to compress or decompress. It would be good to have a file format that only
> needs