[ 
https://issues.apache.org/jira/browse/HADOOP-8003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13196283#comment-13196283
 ] 

Tim Broberg commented on HADOOP-8003:
-------------------------------------

Agreed: the best outcome stays as compatible as possible while adding as 
little complexity to the interface as we can.

My simplest and least invasive idea is this:
 1 - Have SplittableCompressionCodec's createInputStream() return a 
CompressionInputStream instead of a SplitCompressionInputStream.
 2 - Redefine SplitCompressionInputStream to be an interface instead of an 
abstract class.
 3 - Require that all CompressionInputStreams returned by this 
createInputStream() method implement SplitCompressionInputStream.
 4 - Modify the BZip2 codec to conform to the above.
 5 - (optional) applications may check that #3 is obeyed.
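
To make steps 1-4 concrete, here is a rough, self-contained sketch. The types 
below are simplified stand-ins that only borrow the Hadoop names; the real 
classes have more methods and different signatures (the real 
createInputStream() also takes a Decompressor and a read mode). ExampleCodec 
and ExampleSplitStream are hypothetical names made up for illustration.

```java
import java.io.IOException;
import java.io.InputStream;

// Simplified stand-in for the lowest-level compression stream class.
abstract class CompressionInputStream extends InputStream {
    protected final InputStream in;
    protected CompressionInputStream(InputStream in) { this.in = in; }
}

// Simplified stand-in for a reusable base such as DecompressorStream.
// It only passes bytes through; no real decompression happens here.
class DecompressorStream extends CompressionInputStream {
    DecompressorStream(InputStream in) { super(in); }
    @Override public int read() throws IOException { return in.read(); }
}

// Step 2: an interface instead of an abstract class.
interface SplitCompressionInputStream {
    long getAdjustedStart();
    long getAdjustedEnd();
}

// Step 1: the declared return type is CompressionInputStream.
// Step 3 is a documented contract: the returned stream must also
// implement SplitCompressionInputStream.
interface SplittableCompressionCodec {
    CompressionInputStream createInputStream(InputStream seekableIn,
                                             long start, long end)
            throws IOException;
}

// Hypothetical splittable stream that reuses DecompressorStream
// and adds the split methods through the interface.
class ExampleSplitStream extends DecompressorStream
        implements SplitCompressionInputStream {
    private final long start;
    private final long end;

    ExampleSplitStream(InputStream in, long start, long end) {
        super(in);
        this.start = start;
        this.end = end;
    }

    @Override public long getAdjustedStart() { return start; }
    @Override public long getAdjustedEnd() { return end; }
}

// Hypothetical codec; step 4 would change the BZip2 codec along these lines.
class ExampleCodec implements SplittableCompressionCodec {
    @Override
    public CompressionInputStream createInputStream(InputStream seekableIn,
                                                    long start, long end) {
        return new ExampleSplitStream(seekableIn, start, end);
    }
}
```

The point of the sketch is only the shape of the hierarchy: the split stream 
inherits its read path from an existing base class and picks up the split 
methods from the interface.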

Benefits:
 1 - The application doesn't have to change at all. If a codec is an instance 
of SplittableCompressionCodec, call the appropriate createInputStream function 
and use the resulting stream as before.
 2 - No duplicate classes or interfaces are introduced to confuse hapless 
developers.
 3 - New splittable codecs can extend any CompressionInputStream they like.
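
For the caller's side, a minimal sketch of benefit 1 together with the 
optional check from step 5 might look like the following. Again, the types 
are simplified stand-ins that reuse the Hadoop names, and SplitAwareReader 
and its open() method are hypothetical; this is not LineRecordReader's 
actual code.

```java
import java.io.IOException;
import java.io.InputStream;

// Simplified stand-ins; the real Hadoop signatures differ.
abstract class CompressionInputStream extends InputStream {
    protected final InputStream in;
    protected CompressionInputStream(InputStream in) { this.in = in; }
    @Override public int read() throws IOException { return in.read(); }
}

interface SplitCompressionInputStream {
    long getAdjustedStart();
    long getAdjustedEnd();
}

interface CompressionCodec {
    CompressionInputStream createInputStream(InputStream in) throws IOException;
}

interface SplittableCompressionCodec extends CompressionCodec {
    CompressionInputStream createInputStream(InputStream in, long start, long end)
            throws IOException;
}

// Hypothetical caller: picks the split-aware factory method when the codec
// is splittable, and optionally verifies the contract from step 3.
final class SplitAwareReader {
    private SplitAwareReader() {}

    static CompressionInputStream open(CompressionCodec codec, InputStream raw,
                                       long start, long end) throws IOException {
        if (!(codec instanceof SplittableCompressionCodec)) {
            // Non-splittable codec: use the plain factory method.
            return codec.createInputStream(raw);
        }
        CompressionInputStream in =
                ((SplittableCompressionCodec) codec).createInputStream(raw, start, end);
        // Optional step 5: reject a codec that breaks the contract.
        if (!(in instanceof SplitCompressionInputStream)) {
            throw new IOException(codec.getClass().getName()
                    + " returned a stream that is not a SplitCompressionInputStream");
        }
        return in;
    }
}
```

A caller that needs the adjusted offsets would cast the returned stream to 
SplitCompressionInputStream after this check.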

Can anybody describe an approach (or improvement to this one) that is less 
disruptive and/or simpler?
                
> Make SplitCompressionInputStream an interface instead of an abstract class
> --------------------------------------------------------------------------
>
>                 Key: HADOOP-8003
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8003
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: io
>    Affects Versions: 0.21.0, 0.22.0, 0.23.0, 1.0.0
>            Reporter: Tim Broberg
>
> To be splittable, a codec must implement SplittableCompressionCodec, which 
> has a function returning a SplitCompressionInputStream.
> SplitCompressionInputStream is an abstract class which extends 
> CompressionInputStream, the lowest level compression stream class.
> So, no codec that wants to be splittable can reuse any code from 
> DecompressorStream or BlockDecompressorStream.
> You either have to duplicate that code, or not be splittable.
> SplitCompressionInputStream adds just a few very thin functions. Can we make 
> this an interface rather than an abstract class to allow splittable 
> decompression streams to extend DecompressorStream, BlockDecompressorStream, 
> or whatever else we might dream up in the future?
> To my knowledge, this would impact only the BZip2 codec. None of the others 
> implement this form of splittability yet.
> LineRecordReader looks only at whether the codec is an instance of 
> SplittableCompressionCodec, and then calls the appropriate version of 
> createInputStream. This would not change, so the application code should not 
> have to change, just BZip2 and SplitCompressionInputStream.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
