[ https://issues.apache.org/jira/browse/HADOOP-3315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12668588#action_12668588 ]
stack commented on HADOOP-3315:
-------------------------------

Hmm. Looking at random accesses, it seems that a lot of time is spent in inBlockAdvance, advancing sequentially through blocks rather than doing something like a binary search to find the desired block location. Also, as we advance, we create and destroy a bunch of objects, such as the stream that holds the value. Can you comment on why this is? (Compression should be on tfile block boundaries, right? So there should be nothing to stop hopping into the midst of a tfile.)

Thanks.

> New binary file format
> ----------------------
>
>                 Key: HADOOP-3315
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3315
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: io
>            Reporter: Owen O'Malley
>            Assignee: Amir Youssefi
>             Fix For: 0.21.0
>
>         Attachments: HADOOP-3315_20080908_TFILE_PREVIEW_WITH_LZO_TESTS.patch,
> HADOOP-3315_20080915_TFILE.patch, hadoop-trunk-tfile.patch,
> hadoop-trunk-tfile.patch, TFile Specification 20081217.pdf
>
>
> SequenceFile's block compression format is too complex and requires 4 codecs
> to compress or decompress. It would be good to have a file format that only
> needs

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.