[ http://issues.apache.org/jira/browse/HADOOP-441?page=comments#action_12428716 ]
Doug Cutting commented on HADOOP-441:
-------------------------------------
The constructors should probably take class instances rather than class names.
Codecs should be based on DeflaterOutputStream and InflaterInputStream, but it
would be best to write just one name to the file. So we might add a compressor
factory interface like:
  public interface CompressionCodec extends Configurable {
    DeflaterOutputStream createDeflaterOutputStream(OutputStream out);
    InflaterInputStream createInflaterInputStream(InputStream in);
  }
Then the constructors would take an instance of this interface and write the
name of that class into the file. Implementations would be required to provide
a public default constructor.
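As a rough illustration of that round trip, here is a minimal sketch of how a writer might record the codec's class name and how a reader might instantiate it again through the public default constructor. It is not the actual SequenceFile code: ZlibCodec, CodecHeaderSketch and the one-string "header" are hypothetical, and "extends Configurable" is dropped only so the sketch compiles without Hadoop on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

// Factory interface from the comment above, minus "extends Configurable"
// (assumption: omitted only to keep the sketch free of Hadoop classes).
interface CompressionCodec {
    DeflaterOutputStream createDeflaterOutputStream(OutputStream out);
    InflaterInputStream createInflaterInputStream(InputStream in);
}

// Hypothetical codec backed by the stock java.util.zip streams.
class ZlibCodec implements CompressionCodec {
    // Public default constructor, so a reader can create it by name.
    public ZlibCodec() {}

    public DeflaterOutputStream createDeflaterOutputStream(OutputStream out) {
        return new DeflaterOutputStream(out);
    }

    public InflaterInputStream createInflaterInputStream(InputStream in) {
        return new InflaterInputStream(in);
    }
}

// Hypothetical stand-in for the writer and reader; the real file header
// holds more than a single class name.
class CodecHeaderSketch {
    // Writer side: record one class name, then the compressed payload.
    static byte[] write(CompressionCodec codec, byte[] payload) throws IOException {
        ByteArrayOutputStream file = new ByteArrayOutputStream();
        DataOutputStream header = new DataOutputStream(file);
        header.writeUTF(codec.getClass().getName());
        header.flush();
        try (DeflaterOutputStream body = codec.createDeflaterOutputStream(file)) {
            body.write(payload);
        }
        return file.toByteArray();
    }

    // Reader side: read the name, instantiate the codec reflectively,
    // then decompress the rest of the stream (readAllBytes needs Java 9+).
    static byte[] read(byte[] fileBytes) throws Exception {
        ByteArrayInputStream file = new ByteArrayInputStream(fileBytes);
        DataInputStream header = new DataInputStream(file);
        String codecName = header.readUTF();
        CompressionCodec codec = (CompressionCodec)
            Class.forName(codecName).getDeclaredConstructor().newInstance();
        try (InflaterInputStream body = codec.createInflaterInputStream(file)) {
            return body.readAllBytes();
        }
    }
}
```

Because the reader only ever sees the stored name, a codec without an accessible no-argument constructor would fail at the newInstance() call, which is why the default constructor is a requirement rather than a convenience.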
We might also add methods like the following to this interface:
    void writeVersion(DataOutputStream out);
    void readVersion(DataInputStream in) throws VersionMismatchException;
That would permit folks to safely revise a codec without having to use a new
class name.
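One way such a version check could behave is sketched below. Everything here is an assumption made for illustration: the version is stored as a single int, VersionMismatchException is declared locally as an IOException subclass, and both methods declare IOException because DataOutputStream.writeInt and DataInputStream.readInt can throw it. VersionedCodec and VersionedCodecSketch are hypothetical names.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Local stand-in for the exception named above (assumption: it is an
// IOException subclass carrying the expected and found versions).
class VersionMismatchException extends IOException {
    VersionMismatchException(int expected, int found) {
        super("codec version mismatch: expected " + expected + ", found " + found);
    }
}

// Only the two version methods, so the sketch stays self-contained.
interface VersionedCodec {
    void writeVersion(DataOutputStream out) throws IOException;
    void readVersion(DataInputStream in) throws IOException;
}

// Hypothetical codec revision identified by an int version number.
class VersionedCodecSketch implements VersionedCodec {
    private final int version;

    VersionedCodecSketch(int version) {
        this.version = version;
    }

    // Writer side: stamp the file with this revision's version.
    public void writeVersion(DataOutputStream out) throws IOException {
        out.writeInt(version);
    }

    // Reader side: refuse data written by a different revision.
    public void readVersion(DataInputStream in) throws IOException {
        int found = in.readInt();
        if (found != version) {
            throw new VersionMismatchException(version, found);
        }
    }
}
```

A revised codec could also accept older versions in readVersion and switch its decoding accordingly; the strict equality check here is just the simplest policy.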
> SequenceFile should support 'custom compressors'
> ------------------------------------------------
>
> Key: HADOOP-441
> URL: http://issues.apache.org/jira/browse/HADOOP-441
> Project: Hadoop
> Issue Type: New Feature
> Components: io
> Reporter: Arun C Murthy
> Assigned To: Arun C Murthy
> Fix For: 0.6.0
>
>
> SequenceFiles should support 'custom compressors' which can be specified by
> the user on creation of the file.
> Readily available packages for gzip and zip (java.util.zip) are among obvious
> choices to support. Of course there will be hooks so that other compressors
> can be added in future as long as there is a way to construct (input/output)
> streams on top of the compressor/decompressor.
> The 'classname' of the 'custom compressor/decompressor' could be stored in
> the header of the SequenceFile which can then be used by SequenceFile.Reader
> to figure out the appropriate 'decompressor'. Thus I propose we add
> constructors to SequenceFile.Writer which take in the 'classname' of the
> compressor's input/output stream classes (e.g.
> DeflaterOutputStream/InflaterInputStream or
> GZIPOutputStream/GZIPInputStream), which acts as the hook for future
> compressors/decompressors.