[GitHub] [arrow-datafusion] devinjdangelo opened a new pull request, #7466: Support Configuring Parquet Column Specific Options via SQL Statement Options

via GitHub Sat, 02 Sep 2023 17:51:18 -0700


devinjdangelo opened a new pull request, #7466:
URL: https://github.com/apache/arrow-datafusion/pull/7466


   ## Which issue does this PR close?
   
   Closes #7442 
   Closes #7463 (this PR implements a different syntax from #7463 that turned 
out to be much easier to implement; the #7463 draft is left open to show a 
possible alternative)
   
   ## Rationale for this change
   
   Extends the syntax and allowed options for SQL statement options so that 
parquet column-level options can be configured (e.g. a different compression 
setting for each column, including nested columns).
   
   ## What changes are included in this PR?
   
   Implements new parsing utils and options for parquet column specific 
options. Example:
   
   ```sql
   copy my_table
   to my_file.parquet
   (compression snappy,
   'compression::col1' 'zstd(5)',
   'compression::col2.nested' 'zstd(10)')
   ```
   
   The example defaults all columns to snappy compression and sets col1 to zstd 
level 5 and the nested column (col2.nested) to zstd level 10.
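
To illustrate how a key such as `compression::col2.nested` can be read, here is a minimal sketch in Rust. It is not the code from this PR: `split_column_option` and `compression_for` are hypothetical helpers, and the sketch assumes `::` separates the option name from the column path and that a column-specific entry overrides the table-wide default.

```rust
/// Hypothetical helper (not DataFusion's API): splits an option key such as
/// `compression::col2.nested` into the option name and an optional column path.
/// Assumes `::` is the separator between the two parts.
fn split_column_option(key: &str) -> (&str, Option<&str>) {
    match key.split_once("::") {
        // `name::column` with a non-empty column path
        Some((name, column)) if !column.is_empty() => (name, Some(column)),
        // `name::` with nothing after the separator is treated as table-wide
        Some((name, _)) => (name, None),
        // no separator: a plain table-wide option
        None => (key, None),
    }
}

/// Hypothetical helper: picks the compression setting for `column`,
/// preferring a column-specific entry over the table-wide default.
fn compression_for<'a>(options: &[(&'a str, &'a str)], column: &str) -> Option<&'a str> {
    let mut default = None;
    for &(key, value) in options {
        match split_column_option(key) {
            // exact match on the column path wins immediately
            ("compression", Some(col)) if col == column => return Some(value),
            // remember the table-wide value as a fallback
            ("compression", None) => default = Some(value),
            // other options and other columns are ignored here
            _ => {}
        }
    }
    default
}
```

With the three options from the example above, this sketch resolves `col1` to `zstd(5)`, `col2.nested` to `zstd(10)`, and any other column to the `snappy` default.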
   
   ## Are these changes tested?
   
   Yes, added a new unit test to verify settings are actually set as expected.
   
   ## Are there any user-facing changes?
   
   New syntax for specifying column-level options. Unlike #7463, this PR is 
backward compatible and introduces no breaking changes.


