SequenceFile should support 'custom compressors'
------------------------------------------------
Key: HADOOP-441
URL: http://issues.apache.org/jira/browse/HADOOP-441
Project: Hadoop
Issue Type: New Feature
Components: io
Reporter: Arun C Murthy
Assigned To: Arun C Murthy
Fix For: 0.6.0
SequenceFiles should support 'custom compressors' which can be specified by the
user on creation of the file.
Readily available packages for gzip and zip (java.util.zip) are among the obvious
choices to support. 'bmdiff' also seems a good candidate. Of course there will
be hooks so that other compressors can be added in future, as long as there is
a way to construct (input/output) streams on top of the
compressor/decompressor.
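To make the "hook" idea concrete, here is a minimal sketch of what such an extension point might look like. The names StreamCompressor, GzipStreamCompressor and CompressorHookDemo are hypothetical and exist only for illustration; nothing here is an existing Hadoop API. The only requirement it encodes is the one stated above: a compressor must be able to build an input and an output stream on top of raw streams.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical hook (an assumption, not an existing Hadoop interface):
// a compressor only has to wrap raw streams.
interface StreamCompressor {
    OutputStream compress(OutputStream raw) throws IOException;
    InputStream decompress(InputStream raw) throws IOException;
}

// One possible implementation backed by java.util.zip.
class GzipStreamCompressor implements StreamCompressor {
    public OutputStream compress(OutputStream raw) throws IOException {
        return new GZIPOutputStream(raw);
    }

    public InputStream decompress(InputStream raw) throws IOException {
        return new GZIPInputStream(raw);
    }
}

class CompressorHookDemo {
    // Compresses and then decompresses data through the given hook.
    static byte[] roundTrip(StreamCompressor compressor, byte[] data) {
        try {
            ByteArrayOutputStream compressed = new ByteArrayOutputStream();
            try (OutputStream out = compressor.compress(compressed)) {
                out.write(data);
            }
            ByteArrayOutputStream restored = new ByteArrayOutputStream();
            try (InputStream in = compressor.decompress(
                    new ByteArrayInputStream(compressed.toByteArray()))) {
                byte[] chunk = new byte[4096];
                int n;
                while ((n = in.read(chunk)) != -1) {
                    restored.write(chunk, 0, n);
                }
            }
            return restored.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A bmdiff implementation would, under this sketch, be one more class implementing the same two methods.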
The 'classname' of the 'custom compressor/decompressor' could be stored in the
header of the SequenceFile, where SequenceFile.Reader can then use it to
figure out the appropriate 'decompressor'. Thus I propose we add constructors
to SequenceFile.Writer which take the 'classname' of the compressor's
input/output stream classes (e.g. DeflaterOutputStream/InflaterInputStream or
GZIPOutputStream/GZIPInputStream); these class names act as the hook for
future compressors/decompressors.
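As a rough illustration of the classname approach, the sketch below builds compressor streams reflectively from class names, the way a reader might after pulling a name out of the file header. ReflectiveStreams and its methods are hypothetical helpers, not part of SequenceFile, and the sketch assumes each named class has a public constructor taking a single OutputStream or InputStream (true for the java.util.zip classes mentioned above).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper (an assumption, not part of SequenceFile).
class ReflectiveStreams {

    // Assumes className names an OutputStream subclass with a public
    // (OutputStream) constructor.
    static OutputStream wrapOutput(String className, OutputStream raw) throws Exception {
        Class<? extends OutputStream> cls =
                Class.forName(className).asSubclass(OutputStream.class);
        return cls.getConstructor(OutputStream.class).newInstance(raw);
    }

    // Assumes className names an InputStream subclass with a public
    // (InputStream) constructor.
    static InputStream wrapInput(String className, InputStream raw) throws Exception {
        Class<? extends InputStream> cls =
                Class.forName(className).asSubclass(InputStream.class);
        return cls.getConstructor(InputStream.class).newInstance(raw);
    }

    // Writes text through the named output class, then reads it back
    // through the named input class.
    static String roundTrip(String outClass, String inClass, String text) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (OutputStream out = wrapOutput(outClass, buffer)) {
                out.write(text.getBytes(StandardCharsets.UTF_8));
            }
            ByteArrayOutputStream result = new ByteArrayOutputStream();
            try (InputStream in = wrapInput(inClass,
                    new ByteArrayInputStream(buffer.toByteArray()))) {
                byte[] chunk = new byte[4096];
                int n;
                while ((n = in.read(chunk)) != -1) {
                    result.write(chunk, 0, n);
                }
            }
            return new String(result.toByteArray(), StandardCharsets.UTF_8);
        } catch (Exception e) {
            throw new IllegalStateException("could not build streams reflectively", e);
        }
    }
}
```

For example, roundTrip("java.util.zip.GZIPOutputStream", "java.util.zip.GZIPInputStream", text) should return the original text. A compressor that needs extra constructor arguments would not fit this single-argument convention without further design.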
Looks like there isn't a Java library for bmdiff (I'd love to be corrected on
this)... thoughts on how to go about this? A JNI wrapper on top of a C API? If
so, how difficult does hadoop-dev think it is to implement an input/output
stream on top of this? Alternatives?