[ I'll reply a little bit and leave the details to Dan. ]

First, Frederick, welcome! We look forward to your contributions to Beam.
At first glance, BEAM-64 was a little under-specified. Let me try to clarify what was intended:

* Add a pipeline-level registry of compression formats, with corresponding logic to compress and decompress. This is perhaps a design somewhat similar to CoderRegistry.
* Remove the current logic from CompressedSource, but keep the ability to override the registry.
* Propagate the ability to override the registry to the users of CompressedSource, one of which is TextIO.

From the user's perspective, the experience would be as follows:

* Add custom compression formats to the registry, just after creating the pipeline.
* Use any (applicable) IO without any special considerations. Compression is handled automatically, based on the filename extension.
* Alternatively, override the compression format at any source or sink.

Does this make sense?

On Sun, May 22, 2016 at 3:01 AM, Jean-Baptiste Onofré <[email protected]> wrote:

> Hi Frederick,
>
> thanks for the update. We're going to take a look.
>
> Thanks!
> Regards
> JB
>
>
> On 05/21/2016 08:21 PM, Frederick Kautz wrote:
>
>> I implemented a potential solution to "[BEAM-64] General decompression
>> registry". It still needs a bit more attention to some of the finer
>> details, e.g. better error handling, better javadocs, and unit tests.
>>
>> However, before I spend more time on it, I would like a review of the
>> general design.
>>
>>
>> https://github.com/apache/incubator-beam/compare/master...fkautz:beam-64?expand=1
>>
>> Design:
>>
>> I attempted to implement an approach that would require no code changes
>> from users. There is an SDK interface change, but it should be backwards
>> compatible with existing code.
>>
>> TextIO.withCompression() is now capable of receiving a generic compressor
>> operator, which includes all of the enums from before (AUTO, UNCOMPRESSED,
>> GZIP, BZIP2), but it can now also receive a compressor implemented by a
>> user or a library.
>>
>> CompressionType also receives a new getRegistry(), which allows the user
>> to customize the behavior of AUTO. It allows the user to add, replace or
>> remove registered compressors as necessary.
>>
>> Here's a short list of changes:
>>
>> * Create a new CompressorOperator, compatible with Java 8 lambdas
>> * CompressionType enum now implements CompressorType
>> * withCompression now takes a CompressorOperator
>> * Compression wrapper implementations moved from in-line code to the
>>   CompressionType enum
>> * Compression registry created
>> * AUTO now supports compressors registered with the registry
>>
>> Can someone review the design and give me feedback? If the design looks
>> good, I'll move forward on implementing tests, improving the exception
>> error messages, and improving the javadocs.
>>
>> Thanks,
>> Frederick
>>
>>
> --
> Jean-Baptiste Onofré
> [email protected]
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>
