Re: [PR] Docs: Clarify Spark `spark.sql.adaptive.advisoryPartitionSizeInBytes` [iceberg]

via GitHub Mon, 08 Jun 2026 02:03:54 -0700


pan3793 commented on code in PR #16721:
URL: https://github.com/apache/iceberg/pull/16721#discussion_r3371936555



##########
docs/docs/spark-writes.md:
##########
@@ -458,8 +458,9 @@ or manually repartition the data.
 To adjust Spark's task size it is important to become familiar with Spark's 
various Adaptive Query Execution (AQE)
 parameters. When the `write.distribution-mode` is not `none`, AQE will control 
the coalescing and splitting of Spark
 tasks during the exchange to try to create tasks of 
`spark.sql.adaptive.advisoryPartitionSizeInBytes` size. These
-settings will also affect any user performed re-partitions or sorts.
-It is important again to note that this is the in-memory Spark row size and 
not the on disk
-columnar-compressed size, so a larger value than the target file size will 
need to be specified. The ratio of
-in-memory size to on disk size is data dependent. Future work in Spark should 
allow Iceberg to automatically adjust this
-parameter at write time to match the `write.target-file-size-bytes`.
+settings will also affect other non-writing stages.

Review Comment:
   It also affects other operators, such as aggregations and joins, so I think 
"also affect other non-writing stages." is the more suitable wording here.
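
For context on the sizing note removed in this hunk: the advisory size refers to in-memory Spark rows, not on-disk columnar-compressed bytes, so it generally needs to be larger than `write.target-file-size-bytes`. A rough sketch of that arithmetic follows. The helper names and the ratio of 4.0 are hypothetical, chosen only for illustration; the real ratio is data dependent and would have to be measured for each table.

```python
import math


def advisory_partition_size_bytes(target_file_size_bytes: int,
                                  memory_to_disk_ratio: float) -> int:
    """Estimate an advisory partition size from a target file size.

    memory_to_disk_ratio is an ASSUMED, data-dependent ratio of
    in-memory Spark row size to on-disk compressed size.
    """
    if target_file_size_bytes <= 0:
        raise ValueError("target_file_size_bytes must be positive")
    if memory_to_disk_ratio <= 0:
        raise ValueError("memory_to_disk_ratio must be positive")
    return math.ceil(target_file_size_bytes * memory_to_disk_ratio)


def spark_conf_line(size_bytes: int) -> str:
    """Format the value as a key=value pair for the Spark setting."""
    return f"spark.sql.adaptive.advisoryPartitionSizeInBytes={size_bytes}"


# Example: a 512 MiB target file size with an assumed 4x ratio.
target = 512 * 1024 * 1024
advisory = advisory_partition_size_bytes(target, 4.0)
print(spark_conf_line(advisory))
```

Because the same setting also drives coalescing in joins and aggregations, a value tuned only for writes may be a poor fit for other stages in the same job.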



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


