rdblue commented on pull request #3810:
URL: https://github.com/apache/iceberg/pull/3810#issuecomment-1061990622


   For Parquet, the Hadoop configuration is used to pass options into the data file. It is not used as the source of table configuration. Table configuration properties should never come from the Hadoop Configuration.
   
   The steps should be:
   1. Get the Hadoop configuration if it is present in the InputFile or OutputFile; if not, fall back to a default Configuration
   2. Get config from the builder and the table properties
   3. Set configuration from step 2 on the Hadoop config
   4. Create the reader or writer with the Hadoop config (see the sketch below)
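
   A minimal sketch of those four steps, assuming a helper that only derives the 
   Hadoop `Configuration` (creating the actual Parquet reader or writer is left to 
   the caller). Apart from `Configuration` and `HadoopOutputFile#getConf`, the names 
   here are illustrative, not Iceberg's API, and the builder-over-table-properties 
   precedence is an assumption:

   ```java
   import java.util.Map;

   import org.apache.hadoop.conf.Configuration;

   import org.apache.iceberg.hadoop.HadoopOutputFile;
   import org.apache.iceberg.io.OutputFile;

   class WriterConfSketch {
     static Configuration writerConf(OutputFile file,
                                     Map<String, String> tableProperties,
                                     Map<String, String> builderConfig) {
       // Step 1: take the Hadoop conf carried by the file if there is one,
       // otherwise default it; this is the only place environment config enters
       Configuration conf = (file instanceof HadoopOutputFile)
           ? new Configuration(((HadoopOutputFile) file).getConf())
           : new Configuration();

       // Steps 2 and 3: config from the table properties and the builder is set
       // on the Hadoop conf, overriding whatever came from the environment
       tableProperties.forEach(conf::set);
       builderConfig.forEach(conf::set);

       // Step 4: the caller creates the Parquet reader or writer with this conf
       return conf;
     }
   }
   ```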
   
   The only configuration coming from the Hadoop Configuration itself is 
whatever was in the environment.
   
   One possibly confusing thing is that we also set config values directly in 
the Hadoop configuration. That handles cases where the user wants to pass 
config properties that are not standardized in Iceberg. So you could use 
`set("parquet.bloom.filter.enabled#id", "true")` for example. Standardized 
table settings should override these.
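
   A minimal sketch of that precedence, assuming the writer applies standardized 
   table settings after anything set directly on the conf; the 
   `write.parquet.page-size-bytes` / `parquet.page.size` pair is just an 
   illustrative mapping, and the class is not Iceberg's actual writer code:

   ```java
   import java.util.Map;

   import org.apache.hadoop.conf.Configuration;

   class PassThroughConfSketch {
     static Configuration example(Map<String, String> tableProperties) {
       Configuration conf = new Configuration();

       // A Parquet option that Iceberg does not standardize, passed straight
       // through to the data file via the Hadoop conf
       conf.set("parquet.bloom.filter.enabled#id", "true");

       // Standardized table settings are applied afterwards, so they win when
       // both configure the same underlying Parquet key
       String pageSize = tableProperties.get("write.parquet.page-size-bytes");
       if (pageSize != null) {
         conf.set("parquet.page.size", pageSize);
       }

       return conf;
     }
   }
   ```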

