[ https://issues.apache.org/jira/browse/HADOOP-3315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12668271#action_12668271 ]
stack commented on HADOOP-3315:
-------------------------------

I've been playing around w/ the last patch trying to actually use it. Here are a couple of (minor) comments:

Should probably be explicit about encoding in the below (a sketch of what I mean is appended at the end of this message):

{code}
public ByteArray(String str) {
  this(str.getBytes());
{code}

Would be nice if we could easily pass an alternate implementation of BCFile, say one that cached blocks.

Do you want to fix the below:

{code}
// TODO: remember the longest key in a TFile, and use it to replace
// MAX_KEY_SIZE.
keyBuffer = new byte[MAX_KEY_SIZE];
{code}

A default buffer of 64k for keys is a bit on the extravagant side.

The below should be public so users don't have to define their own:

{code}
protected final static String JCLASS = "jclass:";
{code}

The API seems to have changed since the last patch; there is no longer a #find method. What's the suggested way of accessing a random single key/value? (Open a scanner? Using what would you suggest for start and end? Then seekTo? But doing that I find I'm making two ByteArray instances of the same byte array. Should there be a public seekTo that takes a RawComparable?) I've appended a sketch of the lookup pattern I'm after.

Thanks Hong.

> New binary file format
> ----------------------
>
>                 Key: HADOOP-3315
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3315
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: io
>            Reporter: Owen O'Malley
>            Assignee: Amir Youssefi
>             Fix For: 0.21.0
>
>         Attachments: HADOOP-3315_20080908_TFILE_PREVIEW_WITH_LZO_TESTS.patch, HADOOP-3315_20080915_TFILE.patch, hadoop-trunk-tfile.patch, hadoop-trunk-tfile.patch, TFile Specification 20081217.pdf
>
>
> SequenceFile's block compression format is too complex and requires 4 codecs to compress or decompress. It would be good to have a file format that only needs
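On the encoding comment: a minimal sketch of an explicit-charset constructor. The ByteArray(byte[]) delegate is the one from the patch; the choice of UTF-8 and the private helper are my assumptions, not something the patch prescribes.

{code}
import java.io.UnsupportedEncodingException;

// Illustrative stand-in for the patch's ByteArray; only the constructors matter here.
public class ByteArray {
  private final byte[] buffer;

  public ByteArray(byte[] buffer) {
    this.buffer = buffer;
  }

  // Explicit about the encoding instead of relying on the platform default.
  public ByteArray(String str) {
    this(toUtf8(str));
  }

  private static byte[] toUtf8(String str) {
    try {
      return str.getBytes("UTF-8");
    } catch (UnsupportedEncodingException e) {
      // UTF-8 is always available, so this cannot happen in practice.
      throw new IllegalStateException(e);
    }
  }
}
{code}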
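To make the random-access question concrete, here is the kind of point lookup I am after, written against my reading of the current patch. The Reader/Scanner/Entry names, the seekTo(byte[]) semantics (position at the first entry >= key, return true on an exact match), and the package layout are assumptions and may not match the final API.

{code}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.file.tfile.TFile;

// Sketch of a single-key lookup against a TFile; names follow my reading of the
// patch and may not match what finally ships.
public class TFilePointGet {
  public static byte[] get(FileSystem fs, Path path, byte[] key, Configuration conf)
      throws IOException {
    FSDataInputStream in = fs.open(path);
    TFile.Reader reader = new TFile.Reader(in, fs.getFileStatus(path).getLen(), conf);
    TFile.Reader.Scanner scanner = reader.createScanner();
    try {
      // Assumed: seekTo moves the cursor to the first entry whose key is >= the
      // input key and reports whether it is an exact match.
      if (!scanner.seekTo(key)) {
        return null; // key not present
      }
      TFile.Reader.Scanner.Entry entry = scanner.entry();
      byte[] value = new byte[entry.getValueLength()];
      entry.getValue(value);
      return value;
    } finally {
      scanner.close();
      reader.close();
      in.close();
    }
  }
}
{code}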