[ https://issues.apache.org/jira/browse/HADOOP-3315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12634285#action_12634285 ]

Hong Tang commented on HADOOP-3315:
-----------------------------------

bq. Sorry if this need seems exotic, but I think we can get away with casting 
it under the 'Extensibility' TFile design principle. In our application, keys 
are row/column/timestamp. If a row has millions of columns and we want to skip 
to the next row, we can't next-next-next through the keys; it would be too 
slow. We need to skip ahead to the new row. The block index won't help in this 
regard.
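A minimal sketch of the skip-to-next-row idea, under stated assumptions: keys are plain strings of the form row/column/timestamp, '/' never appears inside a row name, and a java.util.TreeSet stands in for the sorted keys of a TFile (real TFile keys are raw bytes). Every key of row r starts with r + "/", so the string r + "0" ('0' is the character right after '/') sorts after all of them, and one ceiling lookup lands on the first key past the row instead of one next() call per column. The class RowSkip and its methods are hypothetical, for illustration only.

```java
import java.util.TreeSet;

public class RowSkip {
    // Assumed separator between key components; it must never occur inside a row name.
    public static final char SEP = '/';

    /** Builds a composite row/column/timestamp key (layout assumed for this sketch). */
    public static String key(String row, String column, long timestamp) {
        return row + SEP + column + SEP + timestamp;
    }

    /**
     * Returns the first key that sorts after every key of the given row, or null if none.
     * All keys of the row start with row + "/", so row + '0' (the character after '/')
     * sorts after all of them and before any key that is not in the row but follows it.
     */
    public static String nextRowKey(TreeSet<String> keys, String row) {
        String bound = row + (char) (SEP + 1);
        return keys.ceiling(bound);
    }

    public static void main(String[] args) {
        TreeSet<String> keys = new TreeSet<>();
        for (int c = 0; c < 1000; c++) {
            keys.add(key("row1", String.format("col%04d", c), 1L));
        }
        keys.add(key("row2", "col0000", 1L));
        // One lookup instead of 1000 next() calls:
        System.out.println(nextRowKey(keys, "row1")); // row2/col0000/1
    }
}
```

With a sorted file, the same bound key could be handed to a seek-to-key operation, assuming the reader exposes one.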

Yes, it sounds reasonable to make the various indices in TFile protected 
instead of private.

Just out of curiosity: would your auxiliary index remember how many records 
start with the same row key, so that you could use that count to advance 
quickly? If so, a better way than opening up advanceCursorInBlock() would be to 
provide an advanceCursor(n) API on the Scanner. 
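To make the suggestion concrete, here is a hedged sketch of a hypothetical advanceCursor(n) paired with an auxiliary index that records, for each row key, the ordinal of its first record and how many records share that row. CountingScanner, skipRow() and the in-memory key list are illustrative assumptions, not TFile's actual Scanner; on a real file, advancing n records would also have to cross block boundaries.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CountingScanner {
    private final List<String> keys;                               // sorted keys; stands in for a TFile
    private final Map<String, Integer> rowStart = new HashMap<>(); // row -> ordinal of its first record
    private final Map<String, Integer> rowCount = new HashMap<>(); // row -> number of records in the row
    private int cursor = 0;                                        // ordinal of the current record

    public CountingScanner(List<String> sortedKeys) {
        this.keys = new ArrayList<>(sortedKeys);
        // Build the hypothetical auxiliary index; assumes a row's records are contiguous (keys sorted).
        for (int i = 0; i < keys.size(); i++) {
            String row = rowOf(keys.get(i));
            rowStart.putIfAbsent(row, i);
            rowCount.merge(row, 1, Integer::sum);
        }
    }

    /** Row component of a row/column/timestamp key. */
    public static String rowOf(String key) {
        int i = key.indexOf('/');
        return i < 0 ? key : key.substring(0, i);
    }

    public boolean atEnd() {
        return cursor >= keys.size();
    }

    public String currentKey() {
        return keys.get(cursor);
    }

    /** Hypothetical advanceCursor(n): moves forward n records, stopping at the end; true if a record remains. */
    public boolean advanceCursor(long n) {
        if (n < 0) {
            throw new IllegalArgumentException("n must be non-negative");
        }
        cursor = (int) Math.min(keys.size(), cursor + n);
        return !atEnd();
    }

    /** Skips the remaining records of the current row using the per-row count; true if a record remains. */
    public boolean skipRow() {
        if (atEnd()) {
            return false;
        }
        String row = rowOf(currentKey());
        int remaining = rowStart.get(row) + rowCount.get(row) - cursor;
        return advanceCursor(remaining);
    }

    public static void main(String[] args) {
        CountingScanner s = new CountingScanner(Arrays.asList(
                "r1/a/1", "r1/b/1", "r1/c/1", "r2/a/1", "r3/a/1"));
        s.advanceCursor(1);                 // now at r1/b/1
        s.skipRow();                        // jumps past r1/c/1 in one call
        System.out.println(s.currentKey()); // r2/a/1
    }
}
```

Skipping the rest of a row then costs one count lookup plus one advanceCursor(n) call, however many columns the row has.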

> New binary file format
> ----------------------
>
>                 Key: HADOOP-3315
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3315
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: io
>            Reporter: Owen O'Malley
>            Assignee: Amir Youssefi
>         Attachments: HADOOP-3315_20080908_TFILE_PREVIEW_WITH_LZO_TESTS.patch, 
> HADOOP-3315_20080915_TFILE.patch, TFile Specification Final.pdf
>
>
> SequenceFile's block compression format is too complex and requires 4 codecs 
> to compress or decompress. It would be good to have a file format that only 
> needs 

-- 
This message is automatically generated by JIRA.