[
http://issues.apache.org/jira/browse/HADOOP-441?page=comments#action_12430967 ]
Arun C Murthy commented on HADOOP-441:
--------------------------------------
Wrt the new Compression{Input|Output}Stream interfaces proposed by Owen,
here are some thoughts and alternatives...
(Since then, a new idea is to have the above 'compression streams' implement
the Data{Input|Output} interfaces, so that they can be passed along to the
write/readFields methods of Writable objects, i.e. bridge a 'stream' with
Data{Input|Output}.)
a)
public abstract class CompressionOutputStream extends DataOutputStream {
  public abstract void write(int b) throws IOException;
  public abstract void write(byte[] b, int off, int len) throws IOException;
  public abstract void resetCompressionState() throws IOException; // 'reset'
  // Finishes writing compressed data to the output stream without
  // closing the underlying stream.
  public abstract void finish() throws IOException;
}
Here we let DataOutputStream's other public methods (writeBoolean, writeInt,
etc.) remain as-is, on the assumption that they all internally call the two
abstract 'write' methods, which correctly 'compress'. (Is that a valid
assumption on all jvms and platforms across versions? Note that in the Sun
JDK, DataOutputStream's writeXxx methods call out.write() on the wrapped
stream directly, not this.write(), so the overrides would be bypassed; the
compression would have to live in the wrapped stream instead.)
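As a concrete illustration of (a), here is a minimal, hypothetical sketch where the compression lives in the stream that DataOutputStream wraps, so the inherited writeInt/writeUTF/etc. are compressed too. The class names (DeflateCompressionOutputStream, OptionADemo) are illustrative only, not the eventual Hadoop API:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.DataOutputStream;
import java.io.OutputStream;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

// Sketch of option (a): compression happens in the wrapped stream, so every
// inherited writeXxx method of DataOutputStream is compressed as well.
abstract class CompressionOutputStream extends DataOutputStream {
    protected CompressionOutputStream(OutputStream out) { super(out); }
    public abstract void resetCompressionState() throws IOException;
    // Finish writing compressed data without closing the underlying stream.
    public abstract void finish() throws IOException;
}

class DeflateCompressionOutputStream extends CompressionOutputStream {
    private final DeflaterOutputStream deflater;
    DeflateCompressionOutputStream(OutputStream raw) {
        super(new DeflaterOutputStream(raw));
        this.deflater = (DeflaterOutputStream) out; // 'out' is the wrapped stream
    }
    public void resetCompressionState() throws IOException {
        // A real implementation would reset the underlying Deflater here.
    }
    public void finish() throws IOException { deflater.finish(); }
}

public class OptionADemo {
    // Compress an int and a string, then decompress and read them back.
    static String roundTrip() throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DeflateCompressionOutputStream cout = new DeflateCompressionOutputStream(buf);
        cout.writeInt(42);        // inherited from DataOutputStream, compressed
        cout.writeUTF("hadoop");
        cout.finish();
        DataInputStream in = new DataInputStream(
            new InflaterInputStream(new ByteArrayInputStream(buf.toByteArray())));
        return in.readInt() + " " + in.readUTF();
    }
    public static void main(String[] args) throws IOException {
        System.out.println(roundTrip());  // 42 hadoop
    }
}
```

Note the design point: the compressor is the wrapped stream rather than an overridden write(), which sidesteps the question above about how the inherited writeXxx methods reach the codec.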
Since DataInputStream's 'read' methods are marked 'final':
public abstract class CompressionInputStream implements DataInput {
  public abstract int read() throws IOException;
  public abstract int read(byte[] b, int off, int len) throws IOException;
  public abstract void resetCompressionState() throws IOException; // 'reset'
  // The other methods of DataInput are given concrete implementations.
}
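To show the pattern of layering the concrete DataInput methods on the two abstract 'read' primitives, here is a hedged sketch. A full version would declare 'implements DataInput' and supply every DataInput method; that is elided here (two representative methods only) so the identity-codec demo subclass compiles. All class names are illustrative:

```java
import java.io.EOFException;
import java.io.IOException;

// Sketch of option (a)'s input side. A complete CompressionInputStream would
// declare 'implements DataInput'; that is elided so IdentityInput below
// compiles without supplying all of DataInput's methods.
abstract class CompressionInputStream /* implements DataInput */ {
    public abstract int read() throws IOException;  // one decompressed byte, or -1
    public abstract int read(byte[] b, int off, int len) throws IOException;
    public abstract void resetCompressionState() throws IOException;

    // Concrete DataInput-style methods built on the abstract reads:
    public byte readByte() throws IOException {
        int b = read();
        if (b < 0) throw new EOFException();
        return (byte) b;
    }
    public int readInt() throws IOException {
        return ((readByte() & 0xff) << 24) | ((readByte() & 0xff) << 16)
             | ((readByte() & 0xff) << 8)  |  (readByte() & 0xff);
    }
    // ...the remaining DataInput methods would follow the same pattern.
}

// Identity 'codec' over an in-memory buffer, purely to exercise the sketch.
class IdentityInput extends CompressionInputStream {
    private final byte[] data;
    private int pos;
    IdentityInput(byte[] data) { this.data = data; }
    public int read() { return pos < data.length ? (data[pos++] & 0xff) : -1; }
    public int read(byte[] b, int off, int len) {
        if (pos >= data.length) return -1;
        int n = Math.min(len, data.length - pos);
        System.arraycopy(data, pos, b, off, n);
        pos += n;
        return n;
    }
    public void resetCompressionState() { pos = 0; }
}

public class OptionAInputDemo {
    public static void main(String[] args) throws IOException {
        IdentityInput in = new IdentityInput(new byte[] { 0, 0, 0, 42 });
        System.out.println(in.readInt());  // 42
    }
}
```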
b)
Same CompressionInputStream, but have CompressionOutputStream implement
DataOutput instead of extending DataOutputStream, to maintain symmetry. The
drawback is that we would then need to provide concrete implementations of
DataOutput's other public methods, gaining only the symmetry.
c)
To provide a 'true bridge' between streams and Writables we can create other
classes:
public abstract class WritableOutputStream implements DataOutput {
  // (alternatively: extends DataOutputStream)
}
public abstract class WritableInputStream implements DataInput {
}
public class CompressionOutputStream extends WritableOutputStream {
}
public class CompressionInputStream extends WritableInputStream {
}
This would provide a more general bridge between streams and Writables and
enable other {In|Out}putStream implementations in the future. (This could also
be a separate issue...)
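To make the intent of (c) concrete, here is a hedged sketch of a Writable's write/readFields being handed a compressing stream through the Data{Input|Output} bridge. The Writable interface is reproduced minimally, and since the proposed Writable{In|Out}putStream classes do not exist yet, JDK streams stand in for them:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

// Minimal stand-in for Hadoop's Writable interface.
interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// A sample Writable with two int fields.
class IntPair implements Writable {
    int first, second;
    public void write(DataOutput out) throws IOException {
        out.writeInt(first);
        out.writeInt(second);
    }
    public void readFields(DataInput in) throws IOException {
        first = in.readInt();
        second = in.readInt();
    }
}

public class BridgeDemo {
    // Serialize through a compressor, deserialize through a decompressor:
    // the Writable only ever sees Data{Output|Input}.
    static int[] roundTrip(int a, int b) throws IOException {
        IntPair p = new IntPair();
        p.first = a;
        p.second = b;
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DeflaterOutputStream def = new DeflaterOutputStream(buf);
        p.write(new DataOutputStream(def));   // bridge: stream -> DataOutput
        def.finish();
        IntPair q = new IntPair();
        q.readFields(new DataInputStream(new InflaterInputStream(
            new ByteArrayInputStream(buf.toByteArray()))));
        return new int[] { q.first, q.second };
    }
    public static void main(String[] args) throws IOException {
        int[] r = roundTrip(7, 11);
        System.out.println(r[0] + "," + r[1]);  // 7,11
    }
}
```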
Thoughts?
> SequenceFile should support 'custom compressors'
> ------------------------------------------------
>
> Key: HADOOP-441
> URL: http://issues.apache.org/jira/browse/HADOOP-441
> Project: Hadoop
> Issue Type: New Feature
> Components: io
> Reporter: Arun C Murthy
> Assigned To: Arun C Murthy
> Fix For: 0.6.0
>
>
> SequenceFiles should support 'custom compressors' which can be specified by
> the user on creation of the file.
> Readily available packages for gzip and zip (java.util.zip) are among obvious
> choices to support. Of course there will be hooks so that other compressors
> can be added in future as long as there is a way to construct (input/output)
> streams on top of the compressor/decompressor.
> The 'classname' of the 'custom compressor/decompressor' could be stored in
> the header of the SequenceFile which can then be used by SequenceFile.Reader
> to figure out the appropriate 'decompressor'. Thus I propose we add
> constructors to SequenceFile.Writer which take in the 'classname' of the
> compressor's input/output stream classes (e.g.
> DeflaterOutputStream/InflaterInputStream or
> GZIPOutputStream/GZIPInputStream), which acts as the hook for future
> compressors/decompressors.
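The hook described above, reconstructing the compressor/decompressor streams reflectively from classnames stored in the SequenceFile header, could look roughly like this. The factory class and method names are hypothetical; only the java.util.zip classnames are real:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.lang.reflect.Constructor;

// Hypothetical sketch: the SequenceFile header stores the classnames of the
// compressor/decompressor streams, and the reader/writer reconstruct them
// reflectively, assuming each has a one-stream-argument constructor.
public class CodecByName {
    static OutputStream wrapOut(String className, OutputStream raw) throws Exception {
        Constructor<? extends OutputStream> c = Class.forName(className)
            .asSubclass(OutputStream.class).getConstructor(OutputStream.class);
        return c.newInstance(raw);
    }
    static InputStream wrapIn(String className, InputStream raw) throws Exception {
        Constructor<? extends InputStream> c = Class.forName(className)
            .asSubclass(InputStream.class).getConstructor(InputStream.class);
        return c.newInstance(raw);
    }

    // Round-trip some bytes through a codec chosen purely by classname.
    static String roundTrip(String data) throws Exception {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        OutputStream out = wrapOut("java.util.zip.GZIPOutputStream", buf);
        out.write(data.getBytes("UTF-8"));
        out.close();
        InputStream in = wrapIn("java.util.zip.GZIPInputStream",
                                new ByteArrayInputStream(buf.toByteArray()));
        ByteArrayOutputStream plain = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1) plain.write(b);
        return plain.toString("UTF-8");
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip("custom compressors"));  // custom compressors
    }
}
```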
--
This message is automatically generated by JIRA.