[
https://issues.apache.org/jira/browse/HADOOP-3315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12633920#action_12633920
]
Hong Tang commented on HADOOP-3315:
-----------------------------------
bq. We would like to leverage TFile in hbase. memcmp-sort of keys won't work
for us. Will it be hard to add support for another comparator?
Yes, we plan to add this feature later - at least for Java. But we may restrict
TFiles that use such comparators to being accessed from Java only.
bq. On page 8., the awkward-looking for-loop at the head of the page with its
isCursorAtEnd and advanceCursor has some justification in practicality, I
presume. otherwise why not use the hasNext/next Iterator common (java) idiom?
Yes, the original consideration behind it is that the Java Iterator
interface always parks the cursor on the entry that has already been read, and
next() both moves the cursor and fetches the result atomically. The TFile
scanner, on the other hand, separates cursor movement from data access
(because we have two ways of moving the cursor: advanceCursor() and seek()),
so next() does not make sense here. [ Note that the idiom is close to the
iterator concept in the C++ STL design: you first get begin and end iterators
from a container, then you can do for_each(begin, end, Op). Here
advanceCursor() corresponds to ++iter, and isCursorAtEnd corresponds to
(iter == end). ]
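The loop shape described above can be sketched with a toy scanner; the ToyScanner class below is a hypothetical stand-in for illustration, not the real TFile Scanner API:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the iteration idiom: cursor movement
// (advanceCursor) and end-test (isCursorAtEnd) are decoupled from
// data access (getEntry), unlike Java's hasNext/next Iterator.
class ToyScanner {
    private final List<String> entries;
    private int pos = 0;

    ToyScanner(List<String> entries) { this.entries = entries; }

    // Corresponds to (iter == end) in the C++ STL analogy.
    boolean isCursorAtEnd() { return pos >= entries.size(); }

    // Corresponds to ++iter: moves the cursor without reading data.
    void advanceCursor() { pos++; }

    // Data access is a separate call from cursor movement.
    String getEntry() { return entries.get(pos); }

    public static void main(String[] args) {
        ToyScanner scanner = new ToyScanner(Arrays.asList("a", "b", "c"));
        StringBuilder sb = new StringBuilder();
        // The for-loop shape from the spec, page 8.
        for (; !scanner.isCursorAtEnd(); scanner.advanceCursor()) {
            sb.append(scanner.getEntry());
        }
        System.out.println(sb);  // prints "abc"
    }
}
```

Because a second movement method (seek) can also reposition the cursor, there is no single "next entry" for next() to atomically fetch, which is why the idiom reads like an STL iterator loop rather than a Java Iterator loop.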
As for why we do not get the key and value in one call: this is because we
want to allow people to first read the key, then decide whether to read the
value or not (consider the application of doing an inner join). But
conceivably, we can provide various convenience utility methods to get both
key and value in one shot (just as we did for append).
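The key-first access pattern can be illustrated with another toy sketch; KeyValueScanner and joinValues below are hypothetical names chosen for illustration, not part of the TFile API:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: keys and values are exposed by separate calls,
// so a reader can inspect the key and skip the (possibly expensive)
// value read when the key does not match - e.g. during an inner join.
class KeyValueScanner {
    private final List<String[]> kvs;  // each element: {key, value}
    private int pos = 0;

    KeyValueScanner(List<String[]> kvs) { this.kvs = kvs; }

    boolean isCursorAtEnd() { return pos >= kvs.size(); }
    void advanceCursor() { pos++; }

    String getKey() { return kvs.get(pos)[0]; }    // cheap: key only
    String getValue() { return kvs.get(pos)[1]; }  // called only when needed

    // Inner-join sketch: values are read only for keys in the probe set.
    static List<String> joinValues(KeyValueScanner s, Set<String> probe) {
        List<String> out = new ArrayList<>();
        for (; !s.isCursorAtEnd(); s.advanceCursor()) {
            if (probe.contains(s.getKey())) {
                out.add(s.getValue());  // value decoded only on a key match
            }
        }
        return out;
    }
}
```

A convenience method that returns both key and value at once would simply combine the two calls, at the cost of always paying for the value read.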
bq. On the performance numbers above, how about adding in test of random
accesses into TFiles/SequenceFiles?
Yes, we will follow up on that.
> New binary file format
> ----------------------
>
> Key: HADOOP-3315
> URL: https://issues.apache.org/jira/browse/HADOOP-3315
> Project: Hadoop Core
> Issue Type: New Feature
> Components: io
> Reporter: Owen O'Malley
> Assignee: Amir Youssefi
> Attachments: HADOOP-3315_20080908_TFILE_PREVIEW_WITH_LZO_TESTS.patch,
> HADOOP-3315_20080915_TFILE.patch, TFile Specification Final.pdf
>
>
> SequenceFile's block compression format is too complex and requires 4 codecs
> to compress or decompress. It would be good to have a file format that only
> needs
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.