[
https://issues.apache.org/jira/browse/HADOOP-3315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12633920#action_12633920
]
Hong Tang commented on HADOOP-3315:
-----------------------------------
bq. We would like to leverage TFile in hbase. memcmp-sort of keys won't work
for us. Will it be hard to add support for another comparator?
Yes, we plan to add this feature later - at least for Java. But we may restrict
TFiles that use such comparators to being accessed from Java only.
bq. On page 8., the awkward-looking for-loop at the head of the page with its
isCursorAtEnd and advanceCursor has some justification in practicality, I
presume. otherwise why not use the hasNext/next Iterator common (java) idiom?
Yes, the original consideration behind it is that the Java Iterator
interface always parks the cursor on the entry that has already been read, and
next() both moves the cursor and fetches the result atomically. The TFile
scanner, on the other hand, separates cursor movement from data access
(because we have two ways of moving the cursor: advanceCursor() and seek()),
so next() does not make sense here. [ Note that the idiom is close to the
iterator concept in the C++ STL design: you first get begin and end iterators
from a container, then you can do for_each(begin, end, Op). Here
advanceCursor() corresponds to ++iter, and isCursorAtEnd corresponds to
(iter == end). ]
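The loop shape described above can be sketched with a toy scanner; the ToyScanner class below is a hypothetical stand-in for illustration, not the real TFile Scanner API:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the iteration idiom: cursor movement
// (advanceCursor) and end-test (isCursorAtEnd) are decoupled from
// data access (getEntry), unlike Java's hasNext/next Iterator.
class ToyScanner {
    private final List<String> entries;
    private int pos = 0;

    ToyScanner(List<String> entries) { this.entries = entries; }

    // Corresponds to (iter == end) in the C++ STL analogy.
    boolean isCursorAtEnd() { return pos >= entries.size(); }

    // Corresponds to ++iter: moves the cursor without reading data.
    void advanceCursor() { pos++; }

    // Data access is a separate call from cursor movement.
    String getEntry() { return entries.get(pos); }

    public static void main(String[] args) {
        ToyScanner scanner = new ToyScanner(Arrays.asList("a", "b", "c"));
        StringBuilder sb = new StringBuilder();
        // The for-loop shape from the spec, page 8.
        for (; !scanner.isCursorAtEnd(); scanner.advanceCursor()) {
            sb.append(scanner.getEntry());
        }
        System.out.println(sb);  // prints "abc"
    }
}
```

Because a second movement method (seek) can also reposition the cursor, there is no single "next entry" for next() to atomically fetch, which is why the idiom reads like an STL iterator loop rather than a Java Iterator loop.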
As for why we do not get the key and value in one call: this is because we
want to allow people to first read the key, then decide whether to read the
value or not (consider the application of doing an inner join). But
conceivably, we can provide various convenience utility methods to get both
key and value in one shot (just as we did for append).
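The key-first access pattern can be illustrated with another toy sketch; KeyValueScanner and joinValues below are hypothetical names chosen for illustration, not part of the TFile API:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: keys and values are exposed by separate calls,
// so a reader can inspect the key and skip the (possibly expensive)
// value read when the key does not match - e.g. during an inner join.
class KeyValueScanner {
    private final List<String[]> kvs;  // each element: {key, value}
    private int pos = 0;

    KeyValueScanner(List<String[]> kvs) { this.kvs = kvs; }

    boolean isCursorAtEnd() { return pos >= kvs.size(); }
    void advanceCursor() { pos++; }

    String getKey() { return kvs.get(pos)[0]; }    // cheap: key only
    String getValue() { return kvs.get(pos)[1]; }  // called only when needed

    // Inner-join sketch: values are read only for keys in the probe set.
    static List<String> joinValues(KeyValueScanner s, Set<String> probe) {
        List<String> out = new ArrayList<>();
        for (; !s.isCursorAtEnd(); s.advanceCursor()) {
            if (probe.contains(s.getKey())) {
                out.add(s.getValue());  // value decoded only on a key match
            }
        }
        return out;
    }
}
```

A convenience method that returns both key and value at once would simply combine the two calls, at the cost of always paying for the value read.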
bq. On the performance numbers above, how about adding in test of random
accesses into TFiles/SequenceFiles?
Yes, we will follow up on that.
> New binary file format
> ----------------------
>
> Key: HADOOP-3315
> URL: https://issues.apache.org/jira/browse/HADOOP-3315
> Project: Hadoop Core
> Issue Type: New Feature
> Components: io
> Reporter: Owen O'Malley
> Assignee: Amir Youssefi
> Attachments: HADOOP-3315_20080908_TFILE_PREVIEW_WITH_LZO_TESTS.patch,
> HADOOP-3315_20080915_TFILE.patch, TFile Specification Final.pdf
>
>
> SequenceFile's block compression format is too complex and requires 4 codecs
> to compress or decompress. It would be good to have a file format that only
> needs
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.