[ https://issues.apache.org/jira/browse/HADOOP-3315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
stack updated HADOOP-3315:
--------------------------

    Attachment: hfile2.patch

More stripping. This patch has HFile sort of working again (it's a hack-up with ugly byte-array copies that we need to remove). I was able to do some basic performance comparisons. If the buffer size is 4k, I can random-access 10-byte cells as fast as MapFile. If cells are bigger, HFile outperforms MapFile; e.g., if the cell is 100 bytes, HFile is 2x MapFile (these are extremely coarse tests going against the local filesystem). Need to do more stripping. In particular, implement Ryan Rawson's idea of carrying the HFile block in an nio ByteBuffer and handing out ByteBuffer 'views' when a key or value is asked for, rather than copying byte arrays.

> New binary file format
> ----------------------
>
>                 Key: HADOOP-3315
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3315
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: io
>            Reporter: Owen O'Malley
>            Assignee: Hong Tang
>             Fix For: 0.21.0
>
>         Attachments: HADOOP-3315_20080908_TFILE_PREVIEW_WITH_LZO_TESTS.patch, HADOOP-3315_20080915_TFILE.patch, hadoop-trunk-tfile.patch, hadoop-trunk-tfile.patch, hfile2.patch, TFile Specification 20081217.pdf
>
>
> SequenceFile's block compression format is too complex and requires 4 codecs to compress or decompress. It would be good to have a file format that only needs

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
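The ByteBuffer "view" idea mentioned in the comment above can be sketched as below. This is an illustration of the general java.nio technique, not code from the hfile2.patch attachment; the names (blockBuffer, view) are hypothetical.

```java
import java.nio.ByteBuffer;

public class ByteBufferViews {

    // Return a zero-copy "view" of [offset, offset + len) within a block.
    // duplicate() shares the backing content but gets independent
    // position/limit marks, so slicing it never touches the original buffer.
    static ByteBuffer view(ByteBuffer blockBuffer, int offset, int len) {
        ByteBuffer dup = blockBuffer.duplicate();
        dup.position(offset);
        dup.limit(offset + len);
        return dup.slice(); // capacity == len, shares the block's bytes
    }

    public static void main(String[] args) {
        // Pretend this is one HFile block holding key/value pairs.
        ByteBuffer blockBuffer = ByteBuffer.wrap("key1val1key2val2".getBytes());

        // Hand the caller the second key without copying any byte arrays.
        ByteBuffer key2 = view(blockBuffer, 8, 4);

        byte[] out = new byte[key2.remaining()];
        key2.duplicate().get(out);
        System.out.println(new String(out)); // prints "key2"
    }
}
```

Since the view shares the block's backing storage, callers must treat it as read-only (or the block should be wrapped with asReadOnlyBuffer()) so a caller cannot corrupt the cached block.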