Baunsgaard commented on PR #16018:
URL: https://github.com/apache/iceberg/pull/16018#issuecomment-4509648848
Hi @lrsb , thanks for your first PR to Iceberg!
I have some concerns about the PR though.
The prefix-strip, Hadoop conf mechanism has a clear precedent in
Spark's `spark.sql.catalog.<name>.hadoop.*`
(SparkUtil.hadoopConfCatalogOverrides),
which is also an unbounded receiver it sets whatever key/value pairs match
the prefix without filtering.
However to your credit: There is a structural difference on the source side.
Spark's mechanism reads from SQLConf, populated only from local sources
(spark-defaults.conf, CLI flags, programmatic config, SQL SET). No remote
party can push into it, making it harder to maintain a consistent
coordinated.
This PR reads from `FileIO.initialize(properties)`, which in the
`RESTSessionCatalog` path is the result of `ConfigResponse.merge()`,
so it can include the server's `/v1/config` `overrides` map.
**However, introducing this feature is a meaningful design choice
that probably deserves explicit discussion on the dev list before
locking it in as the de facto remote convention for
catalog-property to Hadoop-conf propagation:**
1. It becomes a default convention once shipped: a REST catalog that
starts pushing `iceberg.hadoop.conf.fs.s3a.endpoint=...` is
compatible with any client that uses this feature, but
reverting the feature later is hard.
2. Closing it down (e.g. via allowlist) after the fact is a
compat break; opening it up later is not.
Three options worth discussing, ordered from most-permissive to least:
a) Ship as-is, but document the open-ended nature in the Javadoc
on the new constant and in the docs page. (Not recommended from my side)
b) Mirror S3FileIOProperties: curate an explicit list of safe
keys (`fs.s3a.endpoint`, `fs.azure.account.key.*`, etc.) as
named constants, accept those, ignore others.
c) Keep the prefix open-ended but make it opt-in via a client-set
flag (e.g. `iceberg.hadoop.conf.enabled=true`).
I lean toward (c) for shipping if you have a real use case plus a follow-up
to discuss (b).
**But before any of that I think this PR really needs a design discussion on
the dev list.**
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]