pan3793 commented on code in PR #16721:
URL: https://github.com/apache/iceberg/pull/16721#discussion_r3371960626
##########
docs/docs/spark-writes.md:
##########
@@ -458,8 +458,9 @@ or manually repartition the data.
 To adjust Spark's task size it is important to become familiar with Spark's various Adaptive Query Execution (AQE)
 parameters. When the `write.distribution-mode` is not `none`, AQE will control the coalescing and splitting of Spark
 tasks during the exchange to try to create tasks of `spark.sql.adaptive.advisoryPartitionSizeInBytes` size. These
-settings will also affect any user performed re-partitions or sorts.
-It is important again to note that this is the in-memory Spark row size and not the on disk
-columnar-compressed size, so a larger value than the target file size will need to be specified. The ratio of
-in-memory size to on disk size is data dependent. Future work in Spark should allow Iceberg to automatically adjust this
-parameter at write time to match the `write.target-file-size-bytes`.
+settings will also affect other non-writing stages.
+It is important again to note that this is the estimated Spark input shuffle data size (typically, is row-based and
+compressed with a lower ratio) and not the write file size (typically, is columnar and compressed with a higher

Review Comment:
   AQE decisions are made based on `MapStatus`, which records the estimated shuffle block sizes. For vanilla Spark with the default config, the shuffle data is row-based and LZ4-compressed. Users may change this by using a columnar shuffle format (e.g., Gluten/Comet) or a Remote Shuffle Service (e.g., Celeborn/Uniffle).

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
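
To illustrate the sizing relationship discussed in the review comment above, here is a minimal sketch of how the advisory partition size might be derived from the target file size. The helper `advisory_partition_size` and the ratio value are hypothetical and not part of Spark or Iceberg. The ratio of estimated shuffle size to written file size is an assumption that depends on the data and on the shuffle format in use, so it would need to be measured.

```python
# Hypothetical helper: this function is illustrative, not a Spark or Iceberg API.
def advisory_partition_size(target_file_size_bytes: int, shuffle_to_file_ratio: float) -> int:
    """Estimate the shuffle partition size that would yield one file of the target size.

    shuffle_to_file_ratio is an assumed, data-dependent ratio of the estimated
    shuffle data size (as recorded in MapStatus) to the written file size.
    """
    if target_file_size_bytes <= 0 or shuffle_to_file_ratio <= 0:
        raise ValueError("size and ratio must be positive")
    return int(target_file_size_bytes * shuffle_to_file_ratio)


# Value chosen for write.target-file-size-bytes in this example (512 MiB).
target = 512 * 1024 * 1024
# Assumed ratio; measure it on your own data and shuffle setup.
ratio = 2.0

conf = {
    "spark.sql.adaptive.advisoryPartitionSizeInBytes": str(advisory_partition_size(target, ratio)),
}
print(conf)
```

Each key and value in `conf` could then be passed to `SparkSession.builder.config(key, value)` when creating the session. With a columnar shuffle format or a Remote Shuffle Service, the ratio may be quite different from the vanilla row-based, LZ4-compressed case.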
