damccorm opened a new issue, #20560:
URL: https://github.com/apache/beam/issues/20560

   CompressedFile._initialize_compressor hardcodes the compression level used 
when writing:
   
    
   self._compressor = zlib.compressobj(
       zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED, self._gzip_mask)
    
   It would be good to be able to control this: I have a large set of 
GZIP-compressed files, and writing the same data back produces output 10x 
larger than the input.
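    
   To illustrate why the level matters, here is a standard-library-only 
sketch comparing output sizes at different zlib compression levels. The 
`zlib.MAX_WBITS | 16` argument asks zlib for a gzip-compatible container, 
which is assumed to mirror the `_gzip_mask` that CompressedFile passes in:
    
    import zlib
    
    data = b"the same data written back " * 1000
    
    def gzip_size(level):
        # Build a gzip-framed compressor at the given level, mimicking
        # what CompressedFile._initialize_compressor does with a
        # hardcoded Z_DEFAULT_COMPRESSION.
        compressor = zlib.compressobj(level, zlib.DEFLATED, zlib.MAX_WBITS | 16)
        return len(compressor.compress(data) + compressor.flush())
    
    # Z_DEFAULT_COMPRESSION is -1, which zlib maps to level 6;
    # Z_BEST_COMPRESSION is level 9, Z_NO_COMPRESSION is level 0.
    print(gzip_size(zlib.Z_DEFAULT_COMPRESSION))
    print(gzip_size(zlib.Z_BEST_COMPRESSION))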
    
   I've tried various monkeypatching approaches: these seem to work with the 
local runner, but fail when using the DataflowRunner. For example:
    
   import zlib
    
   import apache_beam as beam
   from apache_beam.io import WriteToText
   from apache_beam.io.filesystem import CompressedFile
    
   class WriteData(beam.PTransform):
       def __init__(self, dst):
           self._dst = dst
    
           # Attempted monkeypatch: swap CompressedFile's compressor
           # factory for one that uses the best compression level.
           def _initialize_compressor(self):
               self._compressor = zlib.compressobj(
                   zlib.Z_BEST_COMPRESSION, zlib.DEFLATED, self._gzip_mask
               )
    
           CompressedFile._initialize_compressor = _initialize_compressor
    
       def expand(self, p):
           return p | WriteToText(
               file_path_prefix=self._dst,
               file_name_suffix=".tsv.gz",
               compression_type="gzip",
           )
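    
   A likely explanation for the local-vs-Dataflow difference, sketched with 
only the standard library (the class names below are illustrative, not 
Beam's): Beam pickles the constructed transform and unpickles it on remote 
workers, and unpickling an instance does not re-run `__init__`, so a patch 
applied there never executes in the worker process:
    
    import pickle
    
    class Target:
        def method(self):
            return "original"
    
    class Patcher:
        def __init__(self):
            # Side effect at construction time: patch Target.
            Target.method = lambda self: "patched"
    
    p = Patcher()          # patch applies in this process
    blob = pickle.dumps(p)
    
    # Simulate a fresh worker process: Target starts out unpatched there.
    Target.method = lambda self: "original"
    
    restored = pickle.loads(blob)  # __init__ is NOT re-run on unpickling
    print(Target().method())       # still "original" on the "worker"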
   
   Imported from Jira 
[BEAM-11282](https://issues.apache.org/jira/browse/BEAM-11282). Original Jira 
may contain additional context.
   Reported by: JackWhelpton.

