[jira] Commented: (HADOOP-441) SequenceFile should support 'custom compressors'

Arun C Murthy (JIRA) Mon, 28 Aug 2006 04:35:53 -0700

    [ http://issues.apache.org/jira/browse/HADOOP-441?page=comments#action_12430967 ]

Arun C Murthy commented on HADOOP-441:
--------------------------------------


With respect to the new Compression{Input|Output}Stream interfaces proposed by 
Owen, here are some thoughts and alternatives...

(A newer idea since then is to have the above 'compression streams' implement 
the Data{Input|Output} interfaces, so that they can be passed directly to the 
write/readFields methods of Writable objects, i.e. bridge a 'stream' with 
Data{Input|Output}.)
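
To make the 'bridge' idea concrete, here is a minimal sketch. It assumes a 
stand-in interface (SketchWritable, a hypothetical name) with the same 
write(DataOutput)/readFields(DataInput) shape as Writable, and uses 
java.util.zip's Deflater/Inflater streams as the compressor. SketchRecord and 
WritableBridgeDemo are illustrative names only, not part of any patch.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

// Stand-in for Writable (hypothetical name), so the sketch is self-contained.
interface SketchWritable {
  void write(DataOutput out) throws IOException;
  void readFields(DataInput in) throws IOException;
}

// Illustrative record type with two fields.
class SketchRecord implements SketchWritable {
  int id;
  String name = "";

  public void write(DataOutput out) throws IOException {
    out.writeInt(id);
    out.writeUTF(name);
  }

  public void readFields(DataInput in) throws IOException {
    id = in.readInt();
    name = in.readUTF();
  }
}

class WritableBridgeDemo {
  // Serializes 'w' through a DataOutput whose bytes are deflated.
  static byte[] compress(SketchWritable w) throws IOException {
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    DataOutputStream out =
        new DataOutputStream(new DeflaterOutputStream(sink));
    w.write(out);
    out.close(); // closing finishes the deflater and flushes to 'sink'
    return sink.toByteArray();
  }

  // Fills 'w' from a DataInput whose bytes are inflated on the fly.
  static void decompressInto(byte[] bytes, SketchWritable w)
      throws IOException {
    DataInputStream in = new DataInputStream(
        new InflaterInputStream(new ByteArrayInputStream(bytes)));
    w.readFields(in);
  }
}
```

The object being serialized never sees the compressor; it only sees 
Data{Input|Output}, which is the point of the bridge.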

a) 

public abstract class CompressionOutputStream extends DataOutputStream {
  protected CompressionOutputStream(OutputStream out) { super(out); }
  public abstract void write(int b) throws IOException;
  public abstract void write(byte[] b, int off, int len) throws IOException;
  public abstract void resetCompressionState() throws IOException; // 'reset'
  // Finishes writing compressed data to the output stream without closing
  // the underlying stream.
  public abstract void finish() throws IOException;
}

Here we leave DataOutputStream's other public methods (writeBoolean, writeInt 
etc.) as-is, based on the assumption that they all _will_ internally call the 
two abstract 'write' methods, which correctly 'compress'. (Is that a valid 
assumption on all JVMs, on all platforms, across versions?)
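
One way to check the assumption above on a given JVM is a small probe. This is 
only a sketch: WriteRoutingProbe is a hypothetical class that overrides the two 
'write' methods to count the bytes they see, then compares that with the bytes 
that actually reached the underlying stream after a writeInt.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical probe class; names are illustrative only.
class WriteRoutingProbe extends DataOutputStream {
  int bytesSeenByOverrides = 0;

  WriteRoutingProbe(OutputStream out) {
    super(out);
  }

  @Override
  public synchronized void write(int b) throws IOException {
    bytesSeenByOverrides++;
    super.write(b);
  }

  @Override
  public synchronized void write(byte[] b, int off, int len)
      throws IOException {
    bytesSeenByOverrides += len;
    super.write(b, off, len);
  }

  // Returns {bytes seen by the overrides, bytes that reached the sink}.
  static int[] probeWriteInt() throws IOException {
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    WriteRoutingProbe probe = new WriteRoutingProbe(sink);
    probe.writeInt(0xCAFEBABE);
    probe.flush();
    return new int[] { probe.bytesSeenByOverrides, sink.size() };
  }
}
```

If the first number comes out smaller than the second, the typed methods on 
that JVM write to the wrapped stream without going through the overrides, and 
a subclass as in (a) would let those bytes through uncompressed. Wrapping a 
compressing OutputStream inside a plain DataOutputStream does not depend on 
this behaviour.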

Since DataInputStream's read(byte[], int, int) is marked 'final', the input 
side cannot extend DataInputStream in the same way:

public abstract class CompressionInputStream implements DataInput {
  public abstract int read() throws IOException;
  public abstract int read(byte[] b, int off, int len) throws IOException;
  public abstract void resetCompressionState() throws IOException; // 'reset'

  // The other methods of DataInput are given concrete implementations
}
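
A rough sketch of how those concrete implementations could be provided without 
rewriting them by hand: delegate to a private DataInputStream that reads back 
through the two abstract 'read' methods. Assumptions here that are not in the 
proposal: the class also extends InputStream (so it can be wrapped), and the 
names SketchCompressionInputStream and InflaterBackedInputStream are 
hypothetical.

```java
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical sketch of the input side of (a).
abstract class SketchCompressionInputStream extends InputStream
    implements DataInput {
  // The delegate pulls its bytes from this object's own read methods.
  private final DataInputStream data = new DataInputStream(this);

  @Override public abstract int read() throws IOException;
  @Override public abstract int read(byte[] b, int off, int len)
      throws IOException;
  public abstract void resetCompressionState() throws IOException;

  public void readFully(byte[] b) throws IOException { data.readFully(b); }
  public void readFully(byte[] b, int off, int len) throws IOException {
    data.readFully(b, off, len);
  }
  public int skipBytes(int n) throws IOException { return data.skipBytes(n); }
  public boolean readBoolean() throws IOException { return data.readBoolean(); }
  public byte readByte() throws IOException { return data.readByte(); }
  public int readUnsignedByte() throws IOException {
    return data.readUnsignedByte();
  }
  public short readShort() throws IOException { return data.readShort(); }
  public int readUnsignedShort() throws IOException {
    return data.readUnsignedShort();
  }
  public char readChar() throws IOException { return data.readChar(); }
  public int readInt() throws IOException { return data.readInt(); }
  public long readLong() throws IOException { return data.readLong(); }
  public float readFloat() throws IOException { return data.readFloat(); }
  public double readDouble() throws IOException { return data.readDouble(); }
  @SuppressWarnings("deprecation")
  public String readLine() throws IOException { return data.readLine(); }
  public String readUTF() throws IOException { return data.readUTF(); }
}

// Illustrative subclass backed by java.util.zip's InflaterInputStream.
class InflaterBackedInputStream extends SketchCompressionInputStream {
  private final InflaterInputStream in;

  InflaterBackedInputStream(InputStream raw) {
    this.in = new InflaterInputStream(raw);
  }

  @Override public int read() throws IOException { return in.read(); }
  @Override public int read(byte[] b, int off, int len) throws IOException {
    return in.read(b, off, len);
  }
  @Override public void resetCompressionState() {
    // Resetting mid-stream is out of scope for this sketch.
    throw new UnsupportedOperationException("not sketched");
  }
}
```

With this shape a subclass only has to supply the two 'read' methods and the 
reset; everything else in DataInput comes for free.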


b) 

Same CompressionInputStream, but have CompressionOutputStream implement 
DataOutput instead of extending DataOutputStream, to maintain symmetry. The 
drawback of this approach is that we would need to provide concrete 
implementations of the other methods of DataOutput, with symmetry as the only 
gain.


c)

 To provide a 'true bridge' between streams and Writables we can create other 
classes:

// implements DataOutput (or, alternatively, extends DataOutputStream)
public abstract class WritableOutputStream implements DataOutput {
}

public abstract class WritableInputStream implements DataInput {
}

public class CompressionOutputStream extends WritableOutputStream {
}

public class CompressionInputStream extends WritableInputStream {
}

 This would provide a more general bridge between streams and Writables and 
enable other {In|Out}putStream implementations in the future. (This could also 
be a separate issue...)


Thoughts?

> SequenceFile should support 'custom compressors'
> ------------------------------------------------
>
>                 Key: HADOOP-441
>                 URL: http://issues.apache.org/jira/browse/HADOOP-441
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: io
>            Reporter: Arun C Murthy
>         Assigned To: Arun C Murthy
>             Fix For: 0.6.0
>
>
> SequenceFiles should support 'custom compressors' which can be specified by 
> the user on creation of the file. 
> Readily available packages for gzip and zip (java.util.zip) are among obvious 
> choices to support. Of course there will be hooks so that other compressors 
> can be added in future as long as there is a way to construct (input/output) 
> streams on top of the compressor/decompressor.
> The 'classname' of the 'custom compressor/decompressor' could be stored in 
> the header of the SequenceFile which can then be used by SequenceFile.Reader 
> to figure out the appropriate 'decompressor'. Thus I propose we add 
> constructors to SequenceFile.Writer which take in the 'classname' of the 
> compressor's input/output stream classes (e.g. 
> DeflaterOutputStream/InflaterInputStream or 
> GZIPOutputStream/GZIPInputStream), which acts as the hook for future 
> compressors/decompressors.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
