zanmato1984 commented on PR #45918: URL: https://github.com/apache/arrow/pull/45918#issuecomment-2750739362
Thank you. After some math I think I can explain the perf boost in your setup. Your thread count is `120`, so there are `120` `materialize_` s. You modified the `kNumRowsPerScanTask` to `4k` - let's call each `4k` rows a "batch" for short. You have `512 * (1 << 15)` unmatched rows, aka `4096` batches in the build side. Because `4096` > thread count so the parallelism of the scan will be `120`. Each scan thread accumulates `4096 / 120 = 34` batches into a `materialize_` with most of the rows output to the downstream (the agg) in parallel and `32k` rows pending to flush in `materialize_`. Finally when the scan is finished, the serial flushing will need to flush `120 * 32k` rows aka `960` batches in sequence and output to downstream (the agg), whereas the parallel flushing (this PR) will use `120` thread, each flushing `32k` rows aka `4` batches. Summarizing the comparison: 1. In the scan phase, each thread processes `34` batches, this is the same for both serial and parallel flushing. 2. In the flushing phase, serial execution processes `960` batches but parallel execution processes `4` batches (in each thread). 3. After roughly solving some equations, it takes about `0.5`s to process a batch (mostly by the downstream agg). The serial flushing takes about `(/*scan*/34 + /*flush*/960) * 0.5 = 497`s, and the parallel flushing takes about `(/*scan*/34 + /*flush*/4) * 0.5 = 19`s. 4. The numbers all add up. In terms of the improvement of this PR for `kNumRowsPerScanTask = 512k` (the original value), though we don't have numbers, we can infer that due to the less parallelism (`512 * (1 << 15) / 512k = 32` threads), the scan phase won't be as fast. And although there are still `120` `materialize_` s (the probe parallelism is still `120`), only `32` of them will have data (because of the scan parallelism being `32`), so the flushing will only boost for `32`x using parallel flushing. But anyway, I think the effectiveness of this PR is independent of `kNumRowsPerScanTask` and fully justified. I will now proceed with reviewing the code. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
