HuaHuaY commented on PR #50281: URL: https://github.com/apache/arrow/pull/50281#issuecomment-4858175714
I've only modified `BitRunReader`, not `SetBitRunReader`, and my goal is to narrow the performance gap between the two. I therefore think the change in the ratio of `VisitBitRunsSum` to `VisitSetBitRunsSum` is a better indicator of whether the modification is effective. I've reorganized the data and listed the ratio difference in the last column.

## Group 1

A: 4ab7d672 / amd64-c6a-4xlarge-linux / c0d6bf7

B: d164bbbe / amd64-c6a-4xlarge-linux / 3b5b2d0

Ratio: `VisitBitRunsSum` / `VisitSetBitRunsSum`

| params | A `VisitBitRunsSum` | A `VisitSetBitRunsSum` | A ratio | B `VisitBitRunsSum` | B `VisitSetBitRunsSum` | B ratio | Δ ratio (B-A) |
|---:|---:|---:|---:|---:|---:|---:|---:|
| 2 | 3.10e+8 | 3.59e+8 | 0.8643 | 3.13e+8 | 3.39e+8 | 0.9232 | +0.0589 |
| 8 | 5.70e+8 | 6.40e+8 | 0.8903 | 5.75e+8 | 6.07e+8 | 0.9471 | +0.0568 |
| 64 | 2.07e+9 | 2.20e+9 | 0.9415 | 2.10e+9 | 2.17e+9 | 0.9672 | +0.0257 |
| 512 | 4.30e+9 | 4.58e+9 | 0.9384 | 4.38e+9 | 4.50e+9 | 0.9729 | +0.0345 |
| 4096 | 4.68e+9 | 4.88e+9 | 0.9582 | 4.70e+9 | 4.80e+9 | 0.9797 | +0.0214 |
| 32768 | 4.73e+9 | 4.93e+9 | 0.9579 | 4.76e+9 | 4.86e+9 | 0.9809 | +0.0230 |
| 65536 | 4.73e+9 | 4.94e+9 | 0.9584 | 4.76e+9 | 4.87e+9 | 0.9768 | +0.0184 |

## Group 2

A: fd5a984c / amd64-m5-4xlarge-linux / c0d6bf7

B: e74c6927 / amd64-m5-4xlarge-linux / 3b5b2d0

Ratio: `VisitBitRunsSum` / `VisitSetBitRunsSum`

| params | A `VisitBitRunsSum` | A `VisitSetBitRunsSum` | A ratio | B `VisitBitRunsSum` | B `VisitSetBitRunsSum` | B ratio | Δ ratio (B-A) |
|---:|---:|---:|---:|---:|---:|---:|---:|
| 2 | 2.07e+8 | 2.34e+8 | 0.8871 | 2.18e+8 | 2.30e+8 | 0.9494 | +0.0624 |
| 8 | 3.73e+8 | 4.01e+8 | 0.9297 | 3.84e+8 | 3.99e+8 | 0.9608 | +0.0311 |
| 64 | 1.15e+9 | 1.15e+9 | 1.0013 | 1.15e+9 | 1.15e+9 | 1.0049 | +0.0035 |
| 512 | 1.98e+9 | 2.03e+9 | 0.9734 | 1.97e+9 | 2.03e+9 | 0.9715 | -0.0019 |
| 4096 | 2.17e+9 | 2.24e+9 | 0.9706 | 2.15e+9 | 2.24e+9 | 0.9604 | -0.0102 |
| 32768 | 2.19e+9 | 2.26e+9 | 0.9693 | 2.17e+9 | 2.26e+9 | 0.9594 | -0.0099 |
| 65536 | 2.19e+9 | 2.26e+9 | 0.9694 | 2.17e+9 | 2.26e+9 | 0.9592 | -0.0102 |

## Group 3

A: 457fe991 / test-mac-arm / c0d6bf7

B: 6701543e / test-mac-arm / 3b5b2d0

Ratio: `VisitBitRunsSum` / `VisitSetBitRunsSum`

| params | A `VisitBitRunsSum` | A `VisitSetBitRunsSum` | A ratio | B `VisitBitRunsSum` | B `VisitSetBitRunsSum` | B ratio | Δ ratio (B-A) |
|---:|---:|---:|---:|---:|---:|---:|---:|
| 2 | 3.41e+8 | 4.00e+8 | 0.8526 | 3.58e+8 | 4.07e+8 | 0.8812 | +0.0286 |
| 8 | 6.44e+8 | 6.36e+8 | 1.0130 | 5.84e+8 | 6.49e+8 | 0.8995 | -0.1135 |
| 64 | 1.57e+9 | 2.64e+9 | 0.5946 | 2.27e+9 | 2.59e+9 | 0.8757 | +0.2810 |
| 512 | 2.33e+9 | 9.07e+9 | 0.2571 | 8.53e+9 | 9.00e+9 | 0.9473 | +0.6902 |
| 4096 | 2.57e+9 | 1.22e+10 | 0.2110 | 1.23e+10 | 1.22e+10 | 1.0099 | +0.7988 |
| 32768 | 2.59e+9 | 1.29e+10 | 0.2014 | 1.31e+10 | 1.28e+10 | 1.0180 | +0.8165 |
| 65536 | 2.59e+9 | 1.29e+10 | 0.2012 | 1.31e+10 | 1.29e+10 | 1.0176 | +0.8164 |

However, an improvement in the bitmap benchmark results doesn't necessarily translate into improvements in other benchmark cases. I looked at the three test cases in the benchmark report above. `ArrayArrayKernel` seems unrelated to the path I modified, and I couldn't reproduce the remaining two in my own x86 environment. I'll rerun the benchmark, and if it still shows a performance regression, I plan to roll back the `BitRunReader` changes. Optimizing `BitRunReader` wasn't the original purpose of this PR.
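For anyone who wants to re-derive the ratio and Δ ratio columns, here is a minimal sketch of the arithmetic. The names `Row`, `ratio`, and `delta_ratio` are hypothetical helpers made up for illustration and are not part of Arrow or the benchmark tooling. The inputs are the 3-significant-digit values printed in the Group 3 table, so the recomputed ratios can differ from the listed ones in the third or fourth decimal place (the listed ratios were presumably computed from unrounded measurements).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    """One table row: both benchmark results for run A and run B (hypothetical type)."""
    params: int
    a_bit: float  # A `VisitBitRunsSum`
    a_set: float  # A `VisitSetBitRunsSum`
    b_bit: float  # B `VisitBitRunsSum`
    b_set: float  # B `VisitSetBitRunsSum`


def ratio(bit_runs: float, set_bit_runs: float) -> float:
    """Ratio as defined in the tables: VisitBitRunsSum / VisitSetBitRunsSum."""
    return bit_runs / set_bit_runs


def delta_ratio(row: Row) -> float:
    """Delta ratio (B - A): positive when B's ratio is larger than A's."""
    return ratio(row.b_bit, row.b_set) - ratio(row.a_bit, row.a_set)


# Two rows from Group 3 (test-mac-arm), copied from the table as printed.
rows = [
    Row(512, 2.33e9, 9.07e9, 8.53e9, 9.00e9),
    Row(4096, 2.57e9, 1.22e10, 1.23e10, 1.22e10),
]

for row in rows:
    a = ratio(row.a_bit, row.a_set)
    b = ratio(row.b_bit, row.b_set)
    print(f"params={row.params:>5}  A ratio={a:.4f}  B ratio={b:.4f}  delta={delta_ratio(row):+.4f}")
```

A ratio near 1.0 means the two benchmarks report nearly the same figure, so a positive Δ ratio on a row whose A ratio is below 1.0 indicates the gap narrowed on that row.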
