[ 
https://issues.apache.org/jira/browse/HADOOP-3315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12633924#action_12633924
 ] 

Hong Tang commented on HADOOP-3315:
-----------------------------------

bq. hbase depends on the MapFile#getClosest (and MapFile#getClosestBefore): 
i.e. "Finds the record that is the closest match to the specified key." 
returning either an exact match as TFile#locate does or the next key (before or 
after dependent on provided parameters). TFile does not expose its index so 
this facility would be hard to build on TFile even in a subclass.

The reason we did not support this is that TFile.Reader.Scanner is 
intrinsically a forward iterator (borrowing the C++ STL concept, meaning it 
supports only ++, not --). So to find the largest key that is smaller than or 
equal to the search key, you may need to decompress a block twice. We can 
discuss your use case in detail and see whether there is a workaround.
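
As a rough illustration of the forward-iterator constraint (not TFile code), 
the sketch below finds the largest key smaller than or equal to a search key 
using only forward steps, by remembering the last key seen before passing the 
target. The class ForwardFloorLookup and its floor() method are hypothetical 
names, and plain sorted strings stand in for the entries of one decompressed 
block. Whether the same single-pass approach fits TFile's block layout is an 
assumption; in particular, it does not cover the case where the answer lies in 
the preceding block.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical illustration only; not part of TFile or MapFile.
class ForwardFloorLookup {

    /**
     * Returns the largest key that is <= target, reading the sorted keys
     * strictly forward, or null if every key is greater than target.
     */
    static String floor(Iterator<String> sortedKeys, String target) {
        String best = null; // last key seen that is <= target
        while (sortedKeys.hasNext()) {
            String key = sortedKeys.next();
            if (key.compareTo(target) > 0) {
                break; // passed the target; the remembered key is the answer
            }
            best = key;
        }
        return best;
    }

    public static void main(String[] args) {
        // Stand-in for the sorted keys of a single block.
        List<String> keys = Arrays.asList("apple", "cherry", "mango");
        System.out.println(floor(keys.iterator(), "banana"));
        System.out.println(floor(keys.iterator(), "cherry"));
        System.out.println(floor(keys.iterator(), "aardvark"));
    }
}
```

The trade-off in this sketch is that the caller must hold on to the previous 
entry while scanning, since a forward iterator cannot step back once it has 
moved past the target.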

Note that this is also an issue with the current implementation of the scanner, 
which requires you to seek twice in the block (first in TFile.Reader.locate(), 
and then in the constructor of Scanner). This is not an issue when using TFile 
with MapReduce, because locate() and the construction of Scanner happen in 
different processes (one in JobClient, one in Mapper). But for direct random 
access to TFile, we need to avoid this problem (by providing a separate 
lookup call that returns the key and value stream instead of a Location object).

> New binary file format
> ----------------------
>
>                 Key: HADOOP-3315
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3315
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: io
>            Reporter: Owen O'Malley
>            Assignee: Amir Youssefi
>         Attachments: HADOOP-3315_20080908_TFILE_PREVIEW_WITH_LZO_TESTS.patch, 
> HADOOP-3315_20080915_TFILE.patch, TFile Specification Final.pdf
>
>
> SequenceFile's block compression format is too complex and requires 4 codecs 
> to compress or decompress. It would be good to have a file format that only 
> needs 
