Kontinuation commented on issue #887: URL: https://github.com/apache/datafusion-comet/issues/887#issuecomment-2317124398
I found that the memory reserved by the native shuffle writer is based on [guesses of data sizes](https://github.com/apache/datafusion-comet/blob/0.2.0/native/core/src/execution/datafusion/shuffle_writer.rs#L226-L236) and [proportional to the number of partitions](https://github.com/apache/datafusion-comet/blob/0.2.0/native/core/src/execution/datafusion/shuffle_writer.rs#L746-L759). For TPC-H query 10, the schema of shuffled record batches is: | Field | Type | Slot Size | |--|--|--| |col0| Int64 | 8 * len | | col1 | Utf8 | 104 * len | | col2 | Decimal128(12, 2) | 16 * len | | col3 | Utf8 | 104 * len | | col4 | Utf8 | 104 * len | | col5 | Utf8 | 104 * len | | col6 | Utf8 | 104 * len | | col7 | Decimal128(36, 4) | 16 * len | | col8 | Boolean | len / 8 | The estimated size of each record batch is 4588544 bytes given the batch size of 8192. The shuffle repartitioner reserves a batch for each partition, given the partition number of 200 (the default value of `spark.sql.shuffle.partitions`), the total amount of reserved memory is 917708800 bytes (nearly 900 MB). We have multiple cores on each executor, each core runs a shuffle writer and reserves its own memory. On a 6-core worker instance, the amount of reserved memory will be 5 GB. If we tune the number of shuffle partitions for large ETL, the native shuffle writer has to reserve more memory. For partition number = 2000, the total reserved memory will be 50 GB. This is usually more than the total memory of the worker instance. Maybe a better approach is to always reserve `comet execution memory * shuffle fraction` amount of memory, and check if the reserved memory was exceeded each time we insert a record batch? We can estimate the amount of allocated memory using `batch.get_array_memory_size()`, and trigger a spill if we've already ingested too many batches. This would make the native shuffle writer adaptive to the actual size of record batches and avoid over-reservation. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
