[GitHub] [arrow-datafusion] alamb opened a new pull request, #4427: Expose remaining parquet config options into ConfigOptions (try 2)

GitBox Tue, 29 Nov 2022 14:20:10 -0800


alamb opened a new pull request, #4427:
URL: https://github.com/apache/arrow-datafusion/pull/4427


   This is a reworked version of https://github.com/apache/arrow-datafusion/pull/3885.
   
   # Which issue does this PR close?
   
   Closes https://github.com/apache/arrow-datafusion/issues/3821
   
   This also helps towards #3887 
   
   # Rationale for this change
   1. Making it easier for people to see which parquet config options are available makes it more likely that those options will be used.
   2. The more mechanisms through which configuration can be supplied, the more likely it is to confuse people.
   
   It turns out that options for reading parquet files could be set (and possibly overridden) by no fewer than three different structures! This is confusing, to say the least.
   
   
   # What changes are included in this PR?
   1. Move `metadata_size_hint`, `enable_pruning`, and `merge_schema_metadata` to new config options.
   2. Make clear the precedence of the parquet options that are passed down to `ParquetExec`.
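The description above does not spell out the precedence order itself. The following is therefore only a minimal Rust sketch of what "one clear precedence rule" can look like, under an assumed rule: a per-scan override wins, otherwise the session-wide value applies. The types `ParquetSessionOptions` and `ParquetScanOverrides` and the function `resolve` are hypothetical and are not DataFusion's API. The field names follow the list above, but their types and defaults are assumptions.

```rust
// HYPOTHETICAL session-wide parquet settings. Field names follow the PR
// description; the types and default values are assumptions for illustration.
#[derive(Debug, Clone, PartialEq)]
struct ParquetSessionOptions {
    metadata_size_hint: Option<usize>,
    enable_pruning: bool,
    merge_schema_metadata: bool,
}

impl Default for ParquetSessionOptions {
    fn default() -> Self {
        Self {
            metadata_size_hint: None,
            enable_pruning: true,
            merge_schema_metadata: true,
        }
    }
}

// HYPOTHETICAL per-scan overrides: `None` means "not set here, defer to the session".
#[derive(Debug, Default, Clone)]
struct ParquetScanOverrides {
    metadata_size_hint: Option<usize>,
    enable_pruning: Option<bool>,
    merge_schema_metadata: Option<bool>,
}

// Resolve the effective settings with a single, explicit (assumed) rule:
// a per-scan override takes precedence; otherwise the session value is used.
fn resolve(session: &ParquetSessionOptions, scan: &ParquetScanOverrides) -> ParquetSessionOptions {
    ParquetSessionOptions {
        metadata_size_hint: scan.metadata_size_hint.or(session.metadata_size_hint),
        enable_pruning: scan.enable_pruning.unwrap_or(session.enable_pruning),
        merge_schema_metadata: scan
            .merge_schema_metadata
            .unwrap_or(session.merge_schema_metadata),
    }
}

fn main() {
    let session = ParquetSessionOptions::default();
    // This scan overrides only pruning; everything else comes from the session.
    let scan = ParquetScanOverrides {
        enable_pruning: Some(false),
        ..Default::default()
    };
    let effective = resolve(&session, &scan);
    println!("{:?}", effective);
}
```

Keeping the resolution in one function means there is exactly one place to read when asking "which value wins?". In this simplified sketch, a scan cannot explicitly clear a session-level `metadata_size_hint`. It can only replace it.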
   
   # Are there any user-facing changes?
   The main change is that all parquet reader settings are now visible session-wide.
   
   Previously, depending on which API was used to create, register, or run a parquet scan, the settings might follow later changes to the session config, or they might be a snapshot taken when the reader was registered.
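The difference between a snapshot taken at registration and a setting read from the live session can be illustrated with a small, self-contained Rust sketch. `SessionConfig`, `SnapshotReader`, `LiveReader` and the `parquet_enable_pruning` field are all hypothetical names chosen for this example; they do not correspond to DataFusion types. The sketch simply contrasts copying a config value with sharing it through `Arc<RwLock<_>>`.

```rust
use std::sync::{Arc, RwLock};

// HYPOTHETICAL session config holding a single parquet setting.
#[derive(Debug, Clone)]
struct SessionConfig {
    parquet_enable_pruning: bool,
}

// Copies the config at registration time: later session changes are invisible.
struct SnapshotReader {
    config: SessionConfig,
}

// Holds a handle to the shared session config and reads it on every call.
struct LiveReader {
    config: Arc<RwLock<SessionConfig>>,
}

impl SnapshotReader {
    fn enable_pruning(&self) -> bool {
        self.config.parquet_enable_pruning
    }
}

impl LiveReader {
    fn enable_pruning(&self) -> bool {
        self.config.read().unwrap().parquet_enable_pruning
    }
}

fn main() {
    let session = Arc::new(RwLock::new(SessionConfig { parquet_enable_pruning: true }));

    // "Register" both readers while pruning is enabled.
    let snapshot = SnapshotReader { config: session.read().unwrap().clone() };
    let live = LiveReader { config: Arc::clone(&session) };

    // Change the session-wide setting after registration.
    session.write().unwrap().parquet_enable_pruning = false;

    // The snapshot still reports `true`; the live reader reports `false`.
    println!("snapshot reader: enable_pruning = {}", snapshot.enable_pruning());
    println!("live reader: enable_pruning = {}", live.enable_pruning());
}
```

When both styles coexist behind different APIs, the same session change affects some scans and not others. That inconsistency is the user-visible confusion described above.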

