[
https://issues.apache.org/jira/browse/HADOOP-8003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13196283#comment-13196283
]
Tim Broberg commented on HADOOP-8003:
-------------------------------------
Agreed: the best approach is to stay as compatible as possible while keeping
any added complexity in the interface to a minimum.
My simplest and least invasive idea is this:
1 - Have SplittableCompressionCodec's createInputStream() return a
CompressionInputStream instead of a SplitCompressionInputStream.
2 - Redefine SplitCompressionInputStream to be an interface instead of an
abstract class.
3 - Require that all CompressionInputStreams returned by this
createInputStream() method implement SplitCompressionInputStream.
4 - Modify the BZip2 codec to conform to the above.
5 - (Optional) Applications may check that #3 is obeyed.
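
Steps 1 through 3 might look roughly like the sketch below. It uses simplified
stand-in types, not the real org.apache.hadoop.io.compress classes: the
createInputStream() parameter list is shortened, the two accessors on the
interface are illustrative, and ExampleSplitStream and ExampleCodec are
hypothetical names invented for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Simplified stand-ins, NOT the real org.apache.hadoop.io.compress classes.
class SplitProposalSketch {

    // Stand-in for CompressionInputStream, the lowest-level compression stream class.
    static abstract class CompressionInputStream extends InputStream {
        protected final InputStream in;
        CompressionInputStream(InputStream in) { this.in = in; }
    }

    // Stand-in for DecompressorStream: the reusable code a splittable codec
    // would like to inherit. Here it only passes bytes through.
    static class DecompressorStream extends CompressionInputStream {
        DecompressorStream(InputStream in) { super(in); }
        @Override public int read() throws IOException { return in.read(); }
    }

    // Step 2: an interface instead of an abstract class.
    // The accessors are illustrative, not a statement of the real API.
    interface SplitCompressionInputStream {
        long getAdjustedStart();
        long getAdjustedEnd();
    }

    // Step 1: the return type is CompressionInputStream.
    // The parameter list is a simplification assumed for this sketch.
    interface SplittableCompressionCodec {
        CompressionInputStream createInputStream(InputStream raw, long start, long end);
    }

    // Step 3: a hypothetical splittable stream that reuses DecompressorStream
    // and still implements SplitCompressionInputStream.
    static class ExampleSplitStream extends DecompressorStream
            implements SplitCompressionInputStream {
        private final long start;
        private final long end;
        ExampleSplitStream(InputStream in, long start, long end) {
            super(in);
            this.start = start;
            this.end = end;
        }
        @Override public long getAdjustedStart() { return start; }
        @Override public long getAdjustedEnd() { return end; }
    }

    // Hypothetical codec whose split-aware factory obeys step 3.
    static class ExampleCodec implements SplittableCompressionCodec {
        @Override
        public CompressionInputStream createInputStream(InputStream raw, long start, long end) {
            return new ExampleSplitStream(raw, start, end);
        }
    }

    public static void main(String[] args) throws IOException {
        SplittableCompressionCodec codec = new ExampleCodec();
        CompressionInputStream s =
            codec.createInputStream(new ByteArrayInputStream(new byte[] {42}), 0L, 1L);
        System.out.println("first byte: " + s.read());
        System.out.println("splittable: " + (s instanceof SplitCompressionInputStream));
    }
}
```

The point of the sketch is the class declaration in step 3: because
SplitCompressionInputStream is an interface here, the stream is free to pick
its own superclass.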
Benefits:
1 - The application doesn't have to change at all. If a codec is an instance
of SplittableCompressionCodec, call the appropriate createInputStream function
and use the resulting stream as before.
2 - No duplicate classes or interfaces are introduced to confuse hapless
developers.
3 - New splittable codecs can extend any CompressionInputStream they like.
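
On the application side, benefit 1 and the optional check in step 5 could be
sketched as follows. Again, every type here is a simplified stand-in rather
than the real Hadoop class, and open(), GoodCodec, and BadCodec are
hypothetical names used only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.InputStream;

// Simplified stand-ins, NOT the real org.apache.hadoop.io.compress classes.
class SplitCallerSketch {

    static class CompressionInputStream extends FilterInputStream {
        CompressionInputStream(InputStream in) { super(in); }
    }

    // Illustrative accessors; assumed for this sketch.
    interface SplitCompressionInputStream {
        long getAdjustedStart();
        long getAdjustedEnd();
    }

    interface CompressionCodec {
        CompressionInputStream createInputStream(InputStream raw);
    }

    // Simplified parameter list, assumed for this sketch.
    interface SplittableCompressionCodec extends CompressionCodec {
        CompressionInputStream createInputStream(InputStream raw, long start, long end);
    }

    // Application code: branch on the codec type, as before.
    static CompressionInputStream open(CompressionCodec codec, InputStream raw,
                                       long start, long end) {
        if (codec instanceof SplittableCompressionCodec) {
            CompressionInputStream s =
                ((SplittableCompressionCodec) codec).createInputStream(raw, start, end);
            // Optional step 5: verify that the codec obeys requirement #3.
            if (!(s instanceof SplitCompressionInputStream)) {
                throw new IllegalStateException(
                    "splittable codec returned a stream that is not a SplitCompressionInputStream");
            }
            return s;
        }
        return codec.createInputStream(raw);
    }

    // Hypothetical stream and codec that follow the rules.
    static class GoodStream extends CompressionInputStream
            implements SplitCompressionInputStream {
        private final long start;
        private final long end;
        GoodStream(InputStream in, long start, long end) {
            super(in);
            this.start = start;
            this.end = end;
        }
        @Override public long getAdjustedStart() { return start; }
        @Override public long getAdjustedEnd() { return end; }
    }

    static class GoodCodec implements SplittableCompressionCodec {
        @Override public CompressionInputStream createInputStream(InputStream raw) {
            return new CompressionInputStream(raw);
        }
        @Override public CompressionInputStream createInputStream(InputStream raw,
                                                                  long start, long end) {
            return new GoodStream(raw, start, end);
        }
    }

    // Hypothetical codec that violates requirement #3.
    static class BadCodec extends GoodCodec {
        @Override public CompressionInputStream createInputStream(InputStream raw,
                                                                  long start, long end) {
            return new CompressionInputStream(raw);
        }
    }

    public static void main(String[] args) {
        InputStream raw = new ByteArrayInputStream(new byte[0]);
        System.out.println(open(new GoodCodec(), raw, 0L, 10L).getClass().getSimpleName());
        try {
            open(new BadCodec(), raw, 0L, 10L);
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The runtime check stands in for the compile-time guarantee that the abstract
class return type used to give, which is the trade-off this approach accepts.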
Can anybody describe an approach (or improvement to this one) that is less
disruptive and/or simpler?
> Make SplitCompressionInputStream an interface instead of an abstract class
> --------------------------------------------------------------------------
>
> Key: HADOOP-8003
> URL: https://issues.apache.org/jira/browse/HADOOP-8003
> Project: Hadoop Common
> Issue Type: New Feature
> Components: io
> Affects Versions: 0.21.0, 0.22.0, 0.23.0, 1.0.0
> Reporter: Tim Broberg
>
> To be splittable, a codec must implement SplittableCompressionCodec, which
> has a function returning a SplitCompressionInputStream.
> SplitCompressionInputStream is an abstract class which extends
> CompressionInputStream, the lowest level compression stream class.
> So, no codec that wants to be splittable can reuse any code from
> DecompressorStream or BlockDecompressorStream.
> You either have to duplicate that code, or not be splittable.
> SplitCompressionInputStream adds just a few very thin functions. Can we make
> this an interface rather than an abstract class to allow splittable
> decompression streams to extend DecompressorStream, BlockDecompressorStream,
> or whatever else we might dream up in the future?
> To my knowledge, this would impact only the BZip2 codec. None of the other
> codecs implement this form of splittability yet.
> LineRecordReader looks only at whether the codec is an instance of
> SplittableCompressionCodec, and then calls the appropriate version of
> createInputStream. This would not change, so the application code should not
> have to change, just BZip and SplitCompressionInputStream.