pan3793 commented on code in PR #16721: URL: https://github.com/apache/iceberg/pull/16721#discussion_r3371936555
##########
docs/docs/spark-writes.md:
##########
@@ -458,8 +458,9 @@ or manually repartition the data.
 To adjust Spark's task size it is important to become familiar with Spark's various Adaptive Query Execution (AQE) parameters.
 When the `write.distribution-mode` is not `none`, AQE will control the coalescing and splitting of Spark tasks during the exchange
 to try to create tasks of `spark.sql.adaptive.advisoryPartitionSizeInBytes` size. These
-settings will also affect any user performed re-partitions or sorts.
-It is important again to note that this is the in-memory Spark row size and not the on disk
-columnar-compressed size, so a larger value than the target file size will need to be specified. The ratio of
-in-memory size to on disk size is data dependent. Future work in Spark should allow Iceberg to automatically adjust this
-parameter at write time to match the `write.target-file-size-bytes`.
+settings will also affect other non-writing stages.

Review Comment:
   It also affects other operators, such as aggregations and joins. I think emphasizing "also affect other non-writing stages" is a more suitable expression here.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
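
For readers following the thread, here is a minimal sketch of the sizing relationship described in the removed lines of the hunk: the advisory partition size is an in-memory size, so it has to be larger than the on-disk target file size by some data-dependent ratio. The helper names and the ratio of 4.0 are hypothetical and chosen only for illustration; the configuration keys are the ones quoted in the diff, plus `spark.sql.adaptive.enabled`.

```python
# Hypothetical helpers for illustration; not part of Iceberg or Spark.

# Iceberg's default for the `write.target-file-size-bytes` table property (512 MiB).
DEFAULT_TARGET_FILE_SIZE_BYTES = 512 * 1024 * 1024


def advisory_partition_size(target_file_size_bytes, memory_to_disk_ratio):
    """Estimate the in-memory partition size that should yield files near the target.

    memory_to_disk_ratio is an assumed, data-dependent factor: in-memory Spark row
    size divided by on-disk columnar-compressed size. It has to be measured per table.
    """
    if target_file_size_bytes <= 0:
        raise ValueError("target_file_size_bytes must be positive")
    if memory_to_disk_ratio <= 0:
        raise ValueError("memory_to_disk_ratio must be positive")
    return int(target_file_size_bytes * memory_to_disk_ratio)


def aqe_settings(target_file_size_bytes, memory_to_disk_ratio):
    """Build the session-level Spark settings as a plain dict of strings."""
    size = advisory_partition_size(target_file_size_bytes, memory_to_disk_ratio)
    return {
        "spark.sql.adaptive.enabled": "true",
        "spark.sql.adaptive.advisoryPartitionSizeInBytes": str(size),
    }


if __name__ == "__main__":
    # Assumed ratio of 4.0: the data is taken to be 4x larger in memory than on disk.
    for key, value in aqe_settings(DEFAULT_TARGET_FILE_SIZE_BYTES, 4.0).items():
        print(f"{key}={value}")
```

On an existing `SparkSession`, each pair could be applied with `spark.conf.set(key, value)`. Because these are session-wide AQE settings, they change partition sizing for every shuffle in the session, including joins and aggregations, which is the point of the review comment.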
