dzamo opened a new pull request #2321:
URL: https://github.com/apache/drill/pull/2321


   # [DRILL-7969](https://issues.apache.org/jira/browse/DRILL-7969): Read and write Parquet with brotli, lzo, lz4, zstd
   
   ## Description
   
   Adds support for all the standardised Parquet compression codecs beyond GZip
   and Snappy by making use of the airlift/aircompressor library, with a fallback
   to parquet-mr for codecs not implemented in aircompressor.
   
   A new, delegating CompressionCodecFactory implementation is included.  This
   handles the routing of (de)compression to the correct lib while having a
   minimal impact on the calling code in the Parquet reading and writing parts
   of the Drill codebase.
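
To illustrate the delegation idea described above, here is a minimal sketch of routing a codec to a primary library with a fallback. It is not the code in this PR: `Codec`, `CodecBackend`, `FixedSetBackend` and `DelegatingCodecRouter` are hypothetical names invented for the example, and the set of codecs each backend claims is supplied by the caller rather than taken from either library.

```java
import java.util.EnumSet;

// Hypothetical codec names, for illustration only.
enum Codec { SNAPPY, GZIP, BROTLI, LZO, LZ4, ZSTD }

// Hypothetical stand-in for a library that can (de)compress some codecs.
interface CodecBackend {
  String name();
  boolean supports(Codec codec);
}

// A backend that supports a fixed, caller-supplied set of codecs.
final class FixedSetBackend implements CodecBackend {
  private final String name;
  private final EnumSet<Codec> supported;

  FixedSetBackend(String name, EnumSet<Codec> supported) {
    this.name = name;
    this.supported = EnumSet.copyOf(supported); // defensive copy
  }

  @Override
  public String name() {
    return name;
  }

  @Override
  public boolean supports(Codec codec) {
    return supported.contains(codec);
  }
}

// Routes each codec to the primary backend when it supports the codec,
// otherwise to the fallback backend. Callers only ever talk to the router.
final class DelegatingCodecRouter {
  private final CodecBackend primary;
  private final CodecBackend fallback;

  DelegatingCodecRouter(CodecBackend primary, CodecBackend fallback) {
    this.primary = primary;
    this.fallback = fallback;
  }

  CodecBackend backendFor(Codec codec) {
    if (primary.supports(codec)) {
      return primary;
    }
    if (fallback.supports(codec)) {
      return fallback;
    }
    throw new IllegalArgumentException("No backend supports codec " + codec);
  }
}
```

With a primary backend constructed over, say, `EnumSet.of(Codec.LZ4, Codec.LZO, Codec.ZSTD)` and a fallback over `EnumSet.of(Codec.BROTLI, Codec.GZIP)`, `backendFor(Codec.ZSTD)` returns the primary and `backendFor(Codec.BROTLI)` returns the fallback. Keeping the routing decision in one place is what lets the calling reader and writer code stay largely unchanged.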
   
   ## Documentation
   
   New codec options are available for selection by users in
`store.parquet.compression`.  I'll look at the Drill docs to see if there are
any pages discussing Parquet compression codecs that can be updated.
   
   ## Testing
   
   New unit tests for each codec in 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/writer/TestParquetWriter.java
   
   Test CTAS then SELECT with each new codec in drill-embedded.
   
   Use pyarrow to read output Parquet metadata and check reported codec is 
correct.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

