[ http://issues.apache.org/jira/browse/HADOOP-441?page=all ]
Work on HADOOP-441 started by Arun C Murthy.

> SequenceFile should support 'custom compressors'
> ------------------------------------------------
>
>                 Key: HADOOP-441
>                 URL: http://issues.apache.org/jira/browse/HADOOP-441
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: io
>            Reporter: Arun C Murthy
>         Assigned To: Arun C Murthy
>             Fix For: 0.6.0
>
>
> SequenceFiles should support 'custom compressors' which can be specified by
> the user on creation of the file.
> Readily available packages for gzip and zip (java.util.zip) are among the
> obvious choices to support. 'bmdiff' also seems a good candidate. Of course
> there will be hooks so that other compressors can be added in future, as
> long as there is a way to construct (input/output) streams on top of the
> compressor/decompressor.
> The 'classname' of the 'custom compressor/decompressor' could be stored in
> the header of the SequenceFile, which can then be used by SequenceFile.Reader
> to figure out the appropriate 'decompressor'. Thus I propose we add
> constructors to SequenceFile.Writer which take in the 'classname' of the
> compressor's input/output stream classes (e.g.
> DeflaterOutputStream/InflaterInputStream or
> GZIPOutputStream/GZIPInputStream), which acts as the hook for future
> compressors/decompressors.
> It looks like there isn't a Java library for bmdiff (I'd love to be corrected
> on this)... thoughts on how to go about this? A JNI wrapper on top of a C API?
> If so, how difficult does hadoop-dev think it is to implement an input/output
> stream on top of this? Alternatives?

--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
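
The proposal in the issue (store the stream class name in the file header so the reader can pick the matching decompressor) could be sketched roughly as follows. This is only an illustration using java.util.zip and reflection. It is not the SequenceFile format or API: the class CodecHeaderSketch, its methods, and the one-string header layout are all hypothetical, and it assumes each stream class has a public single-argument constructor taking the underlying stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not Hadoop code: a "file" is a header holding the
// decompressor stream class name, followed by the compressed payload.
final class CodecHeaderSketch {

    // Assumption: the class has a public constructor taking an OutputStream.
    static OutputStream wrapOut(String className, OutputStream raw) throws Exception {
        return Class.forName(className)
                .asSubclass(OutputStream.class)
                .getConstructor(OutputStream.class)
                .newInstance(raw);
    }

    // Assumption: the class has a public constructor taking an InputStream.
    static InputStream wrapIn(String className, InputStream raw) throws Exception {
        return Class.forName(className)
                .asSubclass(InputStream.class)
                .getConstructor(InputStream.class)
                .newInstance(raw);
    }

    // Writer side: record the decompressor class name, then compress the payload.
    static byte[] write(String outClass, String inClass, byte[] payload) throws Exception {
        ByteArrayOutputStream file = new ByteArrayOutputStream();
        DataOutputStream header = new DataOutputStream(file);
        header.writeUTF(inClass);
        header.flush();
        try (OutputStream body = wrapOut(outClass, file)) {
            body.write(payload);
        }
        return file.toByteArray();
    }

    // Reader side: read the class name from the header, then decompress with it.
    static byte[] read(byte[] fileBytes) throws Exception {
        ByteArrayInputStream file = new ByteArrayInputStream(fileBytes);
        String inClass = new DataInputStream(file).readUTF();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (InputStream body = wrapIn(inClass, file)) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = body.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        byte[] data = "hello hello hello".getBytes(StandardCharsets.UTF_8);
        byte[] file = write("java.util.zip.GZIPOutputStream",
                            "java.util.zip.GZIPInputStream", data);
        System.out.println(new String(read(file), StandardCharsets.UTF_8));
    }
}
```

Passing "java.util.zip.DeflaterOutputStream" and "java.util.zip.InflaterInputStream" instead should work the same way, since the reader never hard-codes a codec. A bmdiff stream pair, if one were written, would plug in through the same class-name hook.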
