rdblue commented on pull request #3810:
URL: https://github.com/apache/iceberg/pull/3810#issuecomment-1061990622
For Parquet, the Hadoop configuration is used to pass options into the data
file. It is not used as a source of table configuration; table configuration
properties should never come from the Hadoop Configuration.
The steps should be:
1. Get the Hadoop configuration if it is present in the InputFile or
OutputFile; if not, use a default one
2. Get config from the builder and the table properties
3. Set the configuration from step 2 on the Hadoop config
4. Create the reader or writer with the Hadoop config
The only configuration coming from the Hadoop Configuration itself is
whatever was in the environment; see the sketch of that ordering below.
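
A minimal sketch of that ordering, assuming a file wrapper that may expose a Hadoop `Configuration` — `ConfigurableFile` and `newReadWriteConfig` are hypothetical names for illustration, not Iceberg API:

```java
import java.util.Map;
import org.apache.hadoop.conf.Configuration;

// Hypothetical sketch only: ConfigurableFile and newReadWriteConfig are
// illustrative names, not Iceberg API.
class ParquetConfigOrdering {

  /** Stand-in for an InputFile/OutputFile that may carry a Hadoop Configuration. */
  interface ConfigurableFile {
    Configuration getConf(); // may return null if the file carries no conf
  }

  static Configuration newReadWriteConfig(
      ConfigurableFile file,
      Map<String, String> builderConfig,      // free-form values passed via set(...)
      Map<String, String> tableProperties) {  // standardized Iceberg table settings

    // Step 1: start from the file's Hadoop configuration, or a default one.
    Configuration conf =
        (file != null && file.getConf() != null)
            ? new Configuration(file.getConf())
            : new Configuration();

    // Steps 2-3: apply config from the builder and the table properties on top
    // of whatever came from the environment. Later calls win, so standardized
    // table settings override the free-form builder values.
    builderConfig.forEach(conf::set);
    tableProperties.forEach(conf::set);

    // Step 4: this conf is what the Parquet reader or writer is created with.
    return conf;
  }
}
```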
One possibly confusing thing is that we also set config values directly in
the Hadoop configuration. That handles cases where the user wants to pass
config properties that are not standardized in Iceberg. For example, you
could use `set("parquet.bloom.filter.enabled#id", "true")`. Standardized
table settings should override these.
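
As a rough illustration of passing such a non-standard property through the write path — this sketch assumes the generic Parquet write builder (`Parquet.write`, `createWriterFunc`, `set`); the file path, schema, and class name are made up:

```java
import java.io.File;
import org.apache.iceberg.Files;
import org.apache.iceberg.Schema;
import org.apache.iceberg.data.Record;
import org.apache.iceberg.data.parquet.GenericParquetWriter;
import org.apache.iceberg.io.FileAppender;
import org.apache.iceberg.parquet.Parquet;
import org.apache.iceberg.types.Types;

public class BloomFilterWriteExample {
  public static void main(String[] args) throws Exception {
    Schema schema = new Schema(
        Types.NestedField.required(1, "id", Types.LongType.get()),
        Types.NestedField.required(2, "data", Types.StringType.get()));

    // "parquet.bloom.filter.enabled#id" is a raw parquet-mr property, not a
    // standardized Iceberg table property, so it is passed straight through
    // to the Hadoop configuration used to create the Parquet writer.
    try (FileAppender<Record> writer =
        Parquet.write(Files.localOutput(new File("/tmp/example.parquet")))
            .schema(schema)
            .createWriterFunc(GenericParquetWriter::buildWriter)
            .set("parquet.bloom.filter.enabled#id", "true")
            .build()) {
      // writer.add(record) calls would go here
    }
  }
}
```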