[ I'll reply a little bit and leave the details to Dan. ]

First, Frederick, welcome! We look forward to your contributions to Beam.

At first glance, BEAM-64 was a little under-specified. Let me try to
clarify what was intended:
* Add a pipeline-level registry of compression formats with corresponding
logic to compress/decompress. This is perhaps somewhat similar in design to
CoderRegistry.
* Remove the current logic from CompressedSource, but keep the ability to
override the registry.
* Propagate the ability to override the registry to the users of
CompressedSource, one of which is TextIO.

From the user's perspective, the experience would be as follows:
* Add custom compressed formats to the registry, just after creating the
pipeline.
* Use any (applicable) IO without any special considerations. Compression
is handled automatically by the filename extension.
* Alternatively, override the compression format at any source / sink.
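
To make the flow above concrete, here is a minimal sketch of an
extension-keyed registry with a per-source override. It is illustrative only
and does not use the Beam API: Decompressor, CompressionRegistry,
withDefaults, forFilename, and open are hypothetical names, and only the JDK's
GZIPInputStream is a real class. Unknown extensions fall back to reading the
stream uncompressed.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.zip.GZIPInputStream;

/** Hypothetical: wraps a raw stream in a decompressing stream. Lambda-friendly. */
@FunctionalInterface
interface Decompressor {
  InputStream wrap(InputStream raw) throws IOException;
}

/** Hypothetical pipeline-level registry, keyed by filename extension. */
final class CompressionRegistry {
  private final Map<String, Decompressor> byExtension = new LinkedHashMap<>();

  /** Builds a registry pre-populated with a built-in format (gzip only, for brevity). */
  static CompressionRegistry withDefaults() {
    CompressionRegistry registry = new CompressionRegistry();
    registry.register(".gz", GZIPInputStream::new);
    return registry;
  }

  /** Adds a format, or replaces the one already registered for this extension. */
  void register(String extension, Decompressor decompressor) {
    byExtension.put(extension, decompressor);
  }

  /** Looks up a decompressor by the filename's extension. */
  Optional<Decompressor> forFilename(String filename) {
    for (Map.Entry<String, Decompressor> entry : byExtension.entrySet()) {
      if (filename.endsWith(entry.getKey())) {
        return Optional.of(entry.getValue());
      }
    }
    return Optional.empty();
  }

  /**
   * Opens a stream for reading. A non-null override wins over the registry;
   * otherwise the extension decides, and unknown extensions pass through as-is.
   */
  InputStream open(String filename, InputStream raw, Decompressor override)
      throws IOException {
    Decompressor chosen =
        override != null ? override : forFilename(filename).orElse(in -> in);
    return chosen.wrap(raw);
  }
}
```

In this sketch, a user would call register() once right after creating the
pipeline, and a source would call open() with a null override unless the user
asked for a specific format at that source.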

Does this make sense?

On Sun, May 22, 2016 at 3:01 AM, Jean-Baptiste Onofré <[email protected]>
wrote:

> Hi Frederick,
>
> thanks for the update. We're going to take a look.
>
> Thanks !
> Regards
> JB
>
>
> On 05/21/2016 08:21 PM, Frederick Kautz wrote:
>
>> I implemented a potential solution to "[BEAM-64] General decompression
>> registry". It still needs a bit more attention with some of the finer
>> details, e.g. better error handling, better javadocs, adding unit tests.
>>
>> However, before I spend more time on it, I would like a review of the
>> general design.
>>
>>
>> https://github.com/apache/incubator-beam/compare/master...fkautz:beam-64?expand=1
>>
>> Design:
>>
>> I attempted to implement an approach that would require no code changes to
>> the users. There is an SDK interface change, but it should be backwards
>> compatible with existing code.
>>
>> TextIO.withCompression() is now capable of receiving a generic compressor
>> operator which includes all of the enums from before (AUTO, UNCOMPRESSED,
>> GZIP, BZIP2) but now can also receive a user or library implemented
>> compressor.
>>
>> CompressionType also receives a new getRegistry() which allows the user to
>> customize the behavior of AUTO. It allows the user to add, replace or
>> remove registered compressors as necessary.
>>
>> Here's a short list of changes:
>>
>> * Create a new CompressorOperator, compatible with Java 8 lambdas
>> * CompressionType enum now implements CompressorType
>> * withCompression now takes a CompressorOperator
>> * Compression wrapper implementations moved from in-line code to
>> CompressionType enum
>> * Compression registry created
>> * AUTO now supports compressors registered with the registry
>>
>> Can someone review the design and give me feedback? If the design looks
>> good, I'll move forward on implementing tests, improving exception error
>> messages, and improving the javadocs.
>>
>> Thanks,
>> Frederick
>>
>>
> --
> Jean-Baptiste Onofré
> [email protected]
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>
