mattcuento commented on PR #1451:
URL:
https://github.com/apache/datafusion-ballista/pull/1451#issuecomment-3912089028
Shuffle Benchmark Configuration:
Rows: 1000000
Input partitions: 4
Output partitions: 16
Batch size: 8192
Iterations: 3
Sort shuffle memory limit: 256 MB
Sort shuffle buffer size: 1 MB
Generating test data...
Generated 124 batches across 4 input partitions
=== Hash-Based Shuffle ===
Warmup iteration completed (not timed)
Iteration 1: 72.468208ms (44 files, 13077 KB)
Iteration 2: 75.524666ms (44 files, 13077 KB)
Iteration 3: 65.193208ms (44 files, 13077 KB)
Hash Shuffle Results:
Average time: 71.062027ms
Min time: 65.193208ms
Max time: 75.524666ms
Files created: 44
Total size: 13077 KB
Throughput: 402.61 MB/s
=== Sort-Based Shuffle ===
Warmup iteration completed (not timed)
Iteration 1: 59.167375ms (8 files, 12107 KB)
Iteration 2: 58.956625ms (8 files, 12107 KB)
Iteration 3: 65.508041ms (8 files, 12107 KB)
Sort Shuffle Results:
Average time: 61.21068ms
Min time: 58.956625ms
Max time: 65.508041ms
Files created: 8
Total size: 12107 KB
Throughput: 467.41 MB/s
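To make the comparison between the two runs easier to read, the following is a minimal sketch that recomputes the averages from the per-iteration timings above and derives a few ratios. All variable names are illustrative and not part of the benchmark tool; the figures are copied verbatim from the output. The "implied volume" is only an arithmetic cross-check (throughput times average time), an assumption about how the throughput figure was derived rather than something the output states.

```python
# Figures copied from the benchmark output above (times in ms, sizes in KB).
hash_times = [72.468208, 75.524666, 65.193208]
sort_times = [59.167375, 58.956625, 65.508041]
hash_files, sort_files = 44, 8
hash_kb, sort_kb = 13077, 12107
hash_mb_s, sort_mb_s = 402.61, 467.41


def mean(xs):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(xs) / len(xs)


hash_avg = mean(hash_times)
sort_avg = mean(sort_times)

# Relative comparison of sort-based vs. hash-based shuffle.
speedup = hash_avg / sort_avg          # how many times faster on average
time_saved = 1 - sort_avg / hash_avg   # fraction of wall time saved
file_ratio = hash_files / sort_files   # how many times fewer files
size_saved = 1 - sort_kb / hash_kb     # fraction of output bytes saved

# Cross-check (assumption): throughput * average time should give the same
# data volume for both runs if both processed the same input.
hash_volume_mb = hash_mb_s * hash_avg / 1000
sort_volume_mb = sort_mb_s * sort_avg / 1000

print(f"average time: hash {hash_avg:.3f} ms, sort {sort_avg:.3f} ms")
print(f"speedup: {speedup:.2f}x ({time_saved:.1%} less time)")
print(f"files: {file_ratio:.1f}x fewer, output size: {size_saved:.1%} smaller")
print(f"implied volume: hash {hash_volume_mb:.2f} MB, sort {sort_volume_mb:.2f} MB")
```

With only three timed iterations per mode, and the slowest sort iteration (65.508041ms) slightly above the fastest hash iteration (65.193208ms), these ratios are best read as indicative rather than conclusive.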