wombatu-kun commented on PR #16154:
URL: https://github.com/apache/iceberg/pull/16154#issuecomment-4628042882

   Context for reviewers - this capability has prior history worth reconciling 
before a deep review.
   
   The same feature was proposed twice before and closed without merging 
(#10336 and its revival #13677). On #10336, @szehon-ho objected to a bare 
global `spark.sql.iceberg.split-size` key on design grounds: "The problem with 
this is that it does lead to some ambiguity as to what table the config is 
applying to (many queries read from several tables, for example)" 
(https://github.com/apache/iceberg/pull/10336#issuecomment-2182038718). He 
pointed at per-scan SQL OPTIONS on the Spark side as the alternative 
(https://github.com/apache/spark/pull/46707) and later noted it had become 
possible there 
(https://github.com/apache/iceberg/pull/10336#issuecomment-2219566750).
   
   This PR's table-scoped key (`spark.sql.iceberg.split-size.<table-name>`) 
directly answers that multi-table ambiguity. The global tier 
(`spark.sql.iceberg.split-size`) it also adds reintroduces the same ambiguity 
for queries reading several tables, so a maintainer call on whether the global 
tier should stay would help before deeper review.
   
   There is also a competing open PR #16185 proposing the same capability with 
different naming (`spark.sql.iceberg.split.target-size`, plus lookback / 
open-file-cost / adaptive knobs, mirroring the table-property names). It is 
currently stale. Worth deciding which naming and scope to converge on before 
either is reviewed in depth.
   
   Two related notes for whoever picks this up:
   - whether per-scan SQL OPTIONS (now available in Spark) already cover the 
original use case is worth re-confirming.
   - the resolution change here is generic - it adds the `.<table-name>` 
table-scoped suffix to every `spark.sql.iceberg.*` session conf, not only 
split-size.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to