[ https://issues.apache.org/jira/browse/HADOOP-3315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12668538#action_12668538 ]

stack commented on HADOOP-3315:
-------------------------------

On an alternate BCFile implementation: since BCFile currently fetches a block 
as needed, an alternative might keep a cache of blocks.  Before going to the 
wire, the cache would be checked to save on hdfs'ing (and on repeated 
decompressions if blocks are compressed).  If many concurrent accessors are 
contending over the same part of a TFile, we'd save a bunch.  We've had some 
experience doing this in the past with generally good results.
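
To make that concrete, here is roughly the shape of cache we have in mind.  
This is just a sketch: the class, key scheme, and capacity below are made up 
for illustration and are not from the patch.

import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative block cache: decompressed blocks keyed by (file, block offset),
// consulted before going back to HDFS.  Names and sizes are hypothetical.
public class BlockCache {
  private static final int MAX_BLOCKS = 256; // illustrative capacity only

  // Access-ordered LinkedHashMap gives simple LRU eviction.
  private final Map<String, byte[]> cache =
      new LinkedHashMap<String, byte[]>(16, 0.75f, true) {
        @Override
        protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
          return size() > MAX_BLOCKS;
        }
      };

  public synchronized byte[] get(String file, long blockOffset) {
    return cache.get(file + "#" + blockOffset);
  }

  public synchronized void put(String file, long blockOffset, byte[] block) {
    cache.put(file + "#" + blockOffset, block);
  }
}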

But rather than trade hypotheticals, let's wait as you suggest and revisit when 
a patch with a viable alternative BCFile exists.  (On key/value caching, we're 
working on that too, only it can be a tad complex in the hbase context.)

OK on the too-big-key.

Is there any advantage to our making a scanner around a start and end key when 
random accessing?  Or, if I read things properly, is there none, since we only 
fetch actual blocks when seekTo is called?  (The sketch below shows the kind of 
usage I mean.)
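
Here is the kind of usage I'm asking about.  Note the method names 
(createScannerByKey, atEnd, advance, entry) are my guesses at the in-progress 
API, not confirmed signatures:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.io.file.tfile.TFile;

public class RangeScanSketch {
  // Open a scanner bounded by [startKey, endKey) and walk it; if I read the
  // patch right, block I/O only happens as the scanner is positioned/advanced.
  public static void scanRange(FSDataInputStream in, long fileLength,
      byte[] startKey, byte[] endKey, Configuration conf) throws Exception {
    TFile.Reader reader = new TFile.Reader(in, fileLength, conf);
    TFile.Reader.Scanner scanner = reader.createScannerByKey(startKey, endKey);
    try {
      while (!scanner.atEnd()) {
        TFile.Reader.Scanner.Entry entry = scanner.entry();
        // ... read the key/value out of entry here ...
        scanner.advance();
      }
    } finally {
      scanner.close();
      reader.close();
    }
  }
}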

And on concurrent access: if we have, say, random accesses concurrent with a 
couple of whole-file scans, my reading has it that each scanner fetches a block 
just as it needs it and then works against this fetched copy.  The fetch is 
'synchronized', which means lots of seeking around in the file, but otherwise it 
looks like there is no need for the application to synchronize access to TFile.
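
If I have the model right, it's roughly the pattern below; the names are 
illustrative only, not the patch's actual internals:

import java.io.IOException;
import org.apache.hadoop.fs.FSDataInputStream;

// Only the wire fetch against the single shared stream is synchronized;
// each scanner then decompresses/iterates its own private buffer.
public class SharedReaderSketch {
  private final FSDataInputStream in;

  public SharedReaderSketch(FSDataInputStream in) {
    this.in = in;
  }

  public synchronized byte[] fetchBlock(long offset, int length)
      throws IOException {
    byte[] buf = new byte[length];
    in.seek(offset);     // concurrent scanners mean lots of seeking
    in.readFully(buf);   // fill the private buffer for this caller
    return buf;          // caller works against its own copy, unsynchronized
  }
}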

Thanks for the sweet patch.  It's looking like we'll pull TFile into hbase and 
start using it tout de suite, since our intent is to replace our current 
storefile in the 0.20.0 timeframe.  Hopefully we'll have some patches for TFile 
to give back once this patch goes into hadoop.  Are you going to upload another 
patch?  If so, I'll keep my +1 for that.


> New binary file format
> ----------------------
>
>                 Key: HADOOP-3315
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3315
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: io
>            Reporter: Owen O'Malley
>            Assignee: Amir Youssefi
>             Fix For: 0.21.0
>
>         Attachments: HADOOP-3315_20080908_TFILE_PREVIEW_WITH_LZO_TESTS.patch, 
> HADOOP-3315_20080915_TFILE.patch, hadoop-trunk-tfile.patch, 
> hadoop-trunk-tfile.patch, TFile Specification 20081217.pdf
>
>
> SequenceFile's block compression format is too complex and requires 4 codecs 
> to compress or decompress. It would be good to have a file format that only 
> needs 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
