[ 
https://issues.apache.org/jira/browse/HADOOP-3315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12633980#action_12633980
 ] 

Hong Tang commented on HADOOP-3315:
-----------------------------------

bq. On "indexing the region by the endKey", pardon me, I'm not sure I follow. 
Currently index is block-based, not key-based IIUC so can I even make an index 
that has all keys? Or, can you make an index that is key-based? (Even if I 
could index all keys, if key/values are small, might make for a big index so 
might need something like the MapFile interval).

Sorry for the confusion. My comment was in response to your question about 
whether TFile can support something similar to MapFile's getClosest() call. The 
answer is that we cannot implement such semantics efficiently, because the API 
would require a bidirectional iterator and the underlying decompression stream 
is not bidirectional.

My understanding of your use case is that you currently have a MapFile whose 
key is <region startKey> and whose value may contain <region endKey, ...>. Given 
a client key, you perform getClosest(before==true) to get to the right region 
entry in the MapFile. To support this use case in TFile, you can use <region 
endKey> as the TFile key and <region startKey, ...> as the TFile value. Then 
TFile.Reader.ceiling(clientKey) will get you to the right entry.
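
To make the idea concrete, here is a rough sketch that uses java.util.TreeMap 
as a stand-in for the sorted file; it is not TFile code. The Region class, the 
sample keys, and the findRegion() helper are hypothetical names made up for 
illustration. It assumes the region endKey is inclusive; if your end keys are 
exclusive, a strictly-greater lookup (TreeMap.higherEntry) would be the analogue 
instead of ceilingEntry.

```java
import java.util.Map;
import java.util.TreeMap;

public class RegionLookupSketch {
    // Hypothetical value type: the region's start key plus a label.
    static final class Region {
        final String startKey;
        final String name;

        Region(String startKey, String name) {
            this.startKey = startKey;
            this.name = name;
        }
    }

    // Index keyed by region END key (assumed inclusive), value carries the start key.
    static TreeMap<String, Region> buildIndex() {
        TreeMap<String, Region> byEndKey = new TreeMap<>();
        byEndKey.put("g", new Region("a", "region-1")); // covers a..g
        byEndKey.put("p", new Region("h", "region-2")); // covers h..p
        byEndKey.put("z", new Region("q", "region-3")); // covers q..z
        return byEndKey;
    }

    // Forward-only lookup: the smallest end key >= clientKey identifies the region.
    static String findRegion(TreeMap<String, Region> byEndKey, String clientKey) {
        Map.Entry<String, Region> e = byEndKey.ceilingEntry(clientKey);
        if (e == null) {
            return null; // clientKey sorts after the last region's end key
        }
        if (e.getValue().startKey.compareTo(clientKey) > 0) {
            return null; // clientKey sorts before this region's start key
        }
        return e.getValue().name;
    }

    public static void main(String[] args) {
        TreeMap<String, Region> index = buildIndex();
        System.out.println(findRegion(index, "k"));
    }
}
```

The point of keying by endKey is that the lookup only ever needs to move 
forward to the first key at or after the client key, so no "step back one 
entry" operation is required.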

> New binary file format
> ----------------------
>
>                 Key: HADOOP-3315
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3315
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: io
>            Reporter: Owen O'Malley
>            Assignee: Amir Youssefi
>         Attachments: HADOOP-3315_20080908_TFILE_PREVIEW_WITH_LZO_TESTS.patch, 
> HADOOP-3315_20080915_TFILE.patch, TFile Specification Final.pdf
>
>
> SequenceFile's block compression format is too complex and requires 4 codecs 
> to compress or decompress. It would be good to have a file format that only 
> needs 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
