[ https://issues.apache.org/jira/browse/HADOOP-3315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12668538#action_12668538 ]
stack commented on HADOOP-3315:
-------------------------------

On an alternate BCFile implementation: since BCFile currently fetches a block as needed, an alternative might keep a cache of blocks. Before going to the wire, the cache would be checked, to save on hdfs'ing (and on repeated decompresses if blocks are compressed). If many concurrent accessors are contending over the same part of a TFile, we'd save a bunch. We've had some experience doing this in the past with generally good results. But rather than trade hypotheticals, let's wait as you suggest and revisit when a patch with a viable alternative BCFile exists. (On key/value caching, we're working on that too, only it can be a tad complex in the hbase context.)

OK on the too-big key. Is there any advantage to our making a scanner around a start and end key when random accessing, or, if I read things properly, is there none, since we only fetch actual blocks when seekTo is called?

And on concurrent access: if we have, say, random accesses concurrent with a couple of whole-file scans, my reading has it that each scanner fetches a block just as it needs it and then works against this fetched copy. The fetch is 'synchronized', which means lots of seeking around in the file, but otherwise it looks like there is no need for the application to synchronize access to TFile.

Thanks for the sweet patch. It's looking like we'll pull TFile into hbase and start using it tout de suite, since our intent is to replace our current storefile in the 0.20.0 timeframe. Hopefully we'll have some patches for TFile to give back once this patch goes into Hadoop. Are you going to upload another patch? If so, I'll keep my +1 for that.
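The block-cache idea above could be sketched roughly as below. This is a hypothetical illustration, not the actual BCFile API: `BlockCache` and `BlockLoader` are invented names, and a real implementation would key on more than the file offset and use finer-grained locking. It shows the basic shape — an LRU map of decompressed blocks consulted before going to HDFS, so concurrent scanners over the same region share one decompressed copy.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of an LRU cache of decompressed blocks, keyed by
// their offset in the file. A reader checks the cache before reading
// (and decompressing) a block from HDFS.
public class BlockCache {
    // Stand-in for the real read-and-decompress path.
    public interface BlockLoader {
        byte[] load(long offset);
    }

    private final Map<Long, byte[]> cache;

    public BlockCache(final int capacity) {
        // access-order LinkedHashMap: removeEldestEntry evicts the
        // least recently used block once capacity is exceeded
        this.cache = new LinkedHashMap<Long, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Long, byte[]> e) {
                return size() > capacity;
            }
        };
    }

    // Return the cached block, or load (and cache) it on a miss.
    // Coarse synchronization keeps the sketch simple; it also means
    // concurrent readers serialize on the cache, much as the current
    // 'synchronized' fetch serializes seeks on the file.
    public synchronized byte[] getBlock(long offset, BlockLoader loader) {
        byte[] block = cache.get(offset);
        if (block == null) {
            block = loader.load(offset); // go to HDFS only on a miss
            cache.put(offset, block);
        }
        return block;
    }
}
```

With a cache like this in front of BCFile, repeated random accesses to a hot part of a TFile would hit memory instead of re-reading and re-decompressing the block each time.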
> New binary file format
> ----------------------
>
>                 Key: HADOOP-3315
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3315
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: io
>            Reporter: Owen O'Malley
>            Assignee: Amir Youssefi
>             Fix For: 0.21.0
>
>         Attachments: HADOOP-3315_20080908_TFILE_PREVIEW_WITH_LZO_TESTS.patch, HADOOP-3315_20080915_TFILE.patch, hadoop-trunk-tfile.patch, hadoop-trunk-tfile.patch, TFile Specification 20081217.pdf
>
>
> SequenceFile's block compression format is too complex and requires 4 codecs to compress or decompress. It would be good to have a file format that only needs

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.