[ http://issues.apache.org/jira/browse/HADOOP-441?page=comments#action_12430068 ]
            
Arun C Murthy commented on HADOOP-441:
--------------------------------------

Doug,

  Conceptually this makes sense, I'm all for it.
  
  There is one irritant with the create methods for 
DeflaterOutputStream/InflaterInputStream in CompressionCodec: it has to do 
with the streams' Deflater/Inflater fields. Specifically, we need to pass in 
our own Deflater/Inflater objects so that we can 'reset' the internal 
compressor/decompressor of the respective streams. That ability to 'reset' is 
otherwise absent (since we can't access those 'protected' fields), and it is 
crucial for SequenceFile's block compression (in the case where 
next(key,val) is followed by next(key) and a block boundary is hit). 
We probably need to define our own interface for compressor/decompressor 
streams, or at least have the CompressionCodec work with an extended class of 
DeflaterOutputStream/InflaterInputStream (which will have the 'reset' 
functionality we need).
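
  To make the second option concrete, here is a rough sketch of what such 
extended classes might look like. All the names here (ResettableStreams, 
ResettableDeflaterOutputStream, ResettableInflaterInputStream, resetState, 
writeBlocks, readBlocks) are made up for illustration and are not existing 
Hadoop or CompressionCodec API. The sketch only assumes the protected 'def', 
'inf', 'buf' and 'len' fields that java.util.zip exposes to subclasses.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

// Illustrative only: every class and method name below is hypothetical.
final class ResettableStreams {

    /** Subclass that can reach the protected 'def' field and reset it. */
    static final class ResettableDeflaterOutputStream extends DeflaterOutputStream {
        ResettableDeflaterOutputStream(OutputStream out) {
            super(out);
        }

        /** Ends the current compressed block and prepares for a new one. */
        void resetState() throws IOException {
            finish();     // write out whatever is left of the current block
            def.reset();  // the same Deflater can now start a fresh block
        }
    }

    /** Subclass that can reach the protected 'inf', 'buf' and 'len' fields. */
    static final class ResettableInflaterInputStream extends InflaterInputStream {
        ResettableInflaterInputStream(InputStream in) {
            super(in);
        }

        /** Call after read() returned -1 to continue with the next block. */
        void resetState() {
            // Bytes already buffered but not consumed belong to the next block.
            int remaining = inf.getRemaining();
            inf.reset();
            if (remaining > 0) {
                inf.setInput(buf, len - remaining, remaining);
            }
        }
    }

    /** Compresses each string as its own block into a single byte array. */
    static byte[] writeBlocks(String... blocks) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (ResettableDeflaterOutputStream out = new ResettableDeflaterOutputStream(sink)) {
            for (String block : blocks) {
                out.write(block.getBytes(StandardCharsets.UTF_8));
                out.resetState();
            }
            // Snapshot before close(), which would append one more (empty) block.
            return sink.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads back 'count' blocks, resetting the inflater at each boundary. */
    static String[] readBlocks(byte[] data, int count) {
        String[] result = new String[count];
        try (ResettableInflaterInputStream in =
                 new ResettableInflaterInputStream(new ByteArrayInputStream(data))) {
            byte[] chunk = new byte[256];
            for (int i = 0; i < count; i++) {
                ByteArrayOutputStream block = new ByteArrayOutputStream();
                int n;
                while ((n = in.read(chunk, 0, chunk.length)) != -1) {
                    block.write(chunk, 0, n);
                }
                result[i] = new String(block.toByteArray(), StandardCharsets.UTF_8);
                in.resetState();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }
}
```

  The reader has to know how many blocks to expect, since reading past the 
last one fails with an EOFException. For SequenceFile that count would 
presumably come from the block framing, which this sketch does not model.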

 Thoughts?

> SequenceFile should support 'custom compressors'
> ------------------------------------------------
>
>                 Key: HADOOP-441
>                 URL: http://issues.apache.org/jira/browse/HADOOP-441
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: io
>            Reporter: Arun C Murthy
>         Assigned To: Arun C Murthy
>             Fix For: 0.6.0
>
>
> SequenceFiles should support 'custom compressors' which can be specified by 
> the user on creation of the file. 
> Readily available packages for gzip and zip (java.util.zip) are among obvious 
> choices to support. Of course there will be hooks so that other compressors 
> can be added in future as long as there is a way to construct (input/output) 
> streams on top of the compressor/decompressor.
> The 'classname' of the 'custom compressor/decompressor' could be stored in 
> the header of the SequenceFile which can then be used by SequenceFile.Reader 
> to figure out the appropriate 'decompressor'. Thus I propose we add 
> constructors to SequenceFile.Writer which take in the 'classname' of the 
> compressor's input/output stream classes (e.g. 
> DeflaterOutputStream/InflaterInputStream or 
> GZIPOutputStream/GZIPInputStream), which acts as the hook for future 
> compressors/decompressors.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
