Baunsgaard commented on PR #16018:
URL: https://github.com/apache/iceberg/pull/16018#issuecomment-4509648848

   Hi @lrsb , thanks for your first PR to Iceberg!
   
   I have some concerns about the PR though.
   
   The prefix-strip, Hadoop conf mechanism has a clear precedent in
   Spark's `spark.sql.catalog.<name>.hadoop.*` 
(SparkUtil.hadoopConfCatalogOverrides), 
   which is also an unbounded receiver it sets whatever key/value pairs match 
the prefix without filtering.
   
   However to your credit: There is a structural difference on the source side.
   Spark's mechanism reads from SQLConf, populated only from local sources 
   (spark-defaults.conf, CLI flags, programmatic config, SQL SET). No remote
   party can push into it, making it harder to maintain a consistent 
coordinated.
   
   This PR reads from `FileIO.initialize(properties)`, which in the
   `RESTSessionCatalog` path is the result of `ConfigResponse.merge()`,
   so it can include the server's `/v1/config` `overrides` map.
   
   **However, introducing this feature is a meaningful design choice
   that probably deserves explicit discussion on the dev list before
   locking it in as the de facto remote convention for
   catalog-property to Hadoop-conf propagation:**
   
     1. It becomes a default convention once shipped: a REST catalog that
        starts pushing `iceberg.hadoop.conf.fs.s3a.endpoint=...` is
        compatible with any client that uses this feature, but
        reverting the feature later is hard.
     2. Closing it down (e.g. via allowlist) after the fact is a
        compat break; opening it up later is not.
   
   Three options worth discussing, ordered from most-permissive to least:
   
     a) Ship as-is, but document the open-ended nature in the Javadoc
        on the new constant and in the docs page. (Not recommended from my side)
     b) Mirror S3FileIOProperties: curate an explicit list of safe
        keys (`fs.s3a.endpoint`, `fs.azure.account.key.*`, etc.) as
        named constants, accept those, ignore others.
     c) Keep the prefix open-ended but make it opt-in via a client-set
        flag (e.g. `iceberg.hadoop.conf.enabled=true`).
   
   I lean toward (c) for shipping if you have a real use case plus a follow-up 
to discuss (b).
   
   **But before any of that I think this PR really needs a design discussion on 
the dev list.**


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to