Re: [PR] Spark: Add session-level split size override [iceberg]

via GitHub Thu, 11 Jun 2026 13:24:04 -0700


liucao-dd commented on code in PR #16154:
URL: https://github.com/apache/iceberg/pull/16154#discussion_r3398915313



##########
docs/docs/spark-configuration.md:
##########
@@ -199,6 +199,8 @@ val spark = SparkSession.builder()
 | spark.sql.iceberg.data-planning-mode                   | AUTO                   | Scan planning mode for data files (`AUTO`, `LOCAL`, `DISTRIBUTED`) |
 | spark.sql.iceberg.delete-planning-mode                 | AUTO                   | Scan planning mode for delete files (`AUTO`, `LOCAL`, `DISTRIBUTED`) |
 | spark.sql.iceberg.advisory-partition-size              | Table default          | Advisory size (bytes) used for writing to the Table when Spark's Adaptive Query Execution is enabled. Used to size output files |
+| spark.sql.iceberg.split-size                           | Table default          | Overrides `read.split.target-size` for scan planning. Session values are honored like read options and disable adaptive split-size adjustment |
+| spark.sql.iceberg.split-size.&lt;table-name&gt;        | Global session default | Table-scoped split size override using the fully qualified table name as a suffix |

Review Comment:
   Could we consider making table-scoped session configs a generic 
identity-first pattern instead of a split-size-specific suffix?
   
   The current shape, `spark.sql.iceberg.split-size.<table-name>`, puts the 
setting before the table identity. That works for this one config, but it is 
harder to generalize and can be confusing once more session-backed settings 
become table-scoped.
   
   An alternative is to resolve table-scoped session configs as:
   
   `spark.sql.iceberg.<catalog>.<namespace...>.<table>.<setting-suffix>`
   
   For this config, if the global key becomes 
`spark.sql.iceberg.read.split-size`, the table-scoped key would be:
   
   `spark.sql.iceberg.<catalog>.<namespace...>.<table>.read.split-size`
   
   That keeps the table identity together, then applies the same setting suffix 
used by the global session config. It should also avoid reverse-parsing 
ambiguity, provided the key is constructed from the resolved catalog and Spark 
`Identifier` and looked up exactly, since the table identity never has to be 
parsed back out of a config key.
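
To make the proposed shape concrete, here is a minimal sketch of how the identity-first key could be built and resolved. It is only an illustration under assumptions: the class `TableScopedConfKeys` and its methods are hypothetical and not part of Iceberg or this PR, a plain `Map` stands in for the Spark session conf, and the fallback order (table-scoped key first, then the global key) is my reading of the intended precedence rather than something the PR defines.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper (not an Iceberg API): identity-first session config keys.
final class TableScopedConfKeys {
  private static final String PREFIX = "spark.sql.iceberg.";

  private TableScopedConfKeys() {}

  // Global session key, e.g. spark.sql.iceberg.read.split-size
  static String globalKey(String settingSuffix) {
    return PREFIX + settingSuffix;
  }

  // Table-scoped key: prefix, catalog, each namespace level, table, then the setting suffix.
  static String tableScopedKey(
      String catalog, String[] namespace, String table, String settingSuffix) {
    StringBuilder key = new StringBuilder(PREFIX).append(catalog);
    for (String level : namespace) {
      key.append('.').append(level);
    }
    return key.append('.').append(table).append('.').append(settingSuffix).toString();
  }

  // Exact lookup only: the key is built forward from the table identity and never parsed back.
  // Assumed precedence: the table-scoped value wins, otherwise the global session value.
  static Optional<String> resolve(
      Map<String, String> conf,
      String catalog,
      String[] namespace,
      String table,
      String settingSuffix) {
    String scoped = conf.get(tableScopedKey(catalog, namespace, table, settingSuffix));
    if (scoped != null) {
      return Optional.of(scoped);
    }
    return Optional.ofNullable(conf.get(globalKey(settingSuffix)));
  }

  public static void main(String[] args) {
    // Stand-in for the session conf; the catalog, namespace, and table names are made up.
    Map<String, String> conf = new HashMap<>();
    conf.put("spark.sql.iceberg.read.split-size", "134217728");
    conf.put("spark.sql.iceberg.prod.db.events.read.split-size", "268435456");

    String[] namespace = {"db"};
    System.out.println(resolve(conf, "prod", namespace, "events", "read.split-size"));
    System.out.println(resolve(conf, "prod", namespace, "other", "read.split-size"));
  }
}
```

Because the lookup goes from identity to key and compares exactly, the code never has to decide where the table name ends and the setting name begins.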



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

