sdf-jkl commented on PR #9118: URL: https://github.com/apache/arrow-rs/pull/9118#issuecomment-4699635254
<html><head></head><body>
<p>I rebased #9118 on top of #9659 and benchmarked three configurations (<code>main</code> plus two experimental builds) to see whether the deferral heuristic addresses the row-filter regressions.</p>
<p>ClickBench <code>hits_1</code>, interleaved, median of 3 runs. Times in ms, Δ vs <code>main</code> in parentheses (lower = better).</p>
<table>
<thead>
<tr><th>query</th><th>main</th><th>bitmask-only</th><th>bitmask + #9659</th><th>outcome</th></tr>
</thead>
<tbody>
<tr><td>async/Q20</td><td>75.1</td><td>97.4 (+30%)</td><td>97.6 (+30%)</td><td>regressed. Deferral can't help; needs #8846</td></tr>
<tr><td>sync/Q37</td><td>6.09</td><td>6.04 (-1%)</td><td>4.85 (-20%)</td><td>deferral win</td></tr>
<tr><td>async/Q37</td><td>4.86</td><td>4.92 (+1%)</td><td>4.15 (-15%)</td><td>deferral win</td></tr>
<tr><td>sync/Q40</td><td>5.49</td><td>5.42 (-1%)</td><td>4.93 (-10%)</td><td>deferral win</td></tr>
<tr><td>async/Q40</td><td>6.08</td><td>6.24 (+3%)</td><td>5.55 (-9%)</td><td>deferral win</td></tr>
<tr><td>sync/Q41</td><td>5.34</td><td>5.22 (-2%)</td><td>4.20 (-21%)</td><td>deferral win</td></tr>
<tr><td>async/Q41</td><td>5.11</td><td>5.20 (+2%)</td><td>4.66 (-9%)</td><td>deferral win</td></tr>
<tr><td>sync/Q20</td><td>112.2</td><td>109.8 (-2%)</td><td>106.4 (-5%)</td><td>neutral</td></tr>
<tr><td>sync/Q12</td><td>18.2</td><td>18.1 (-1%)</td><td>18.0 (-1%)</td><td>neutral</td></tr>
<tr><td>sync/Q24</td><td>26.8</td><td>27.2 (+1%)</td><td>27.3 (+2%)</td><td>neutral</td></tr>
<tr><td>sync/Q30</td><td>18.6</td><td>17.2 (-7%)</td><td>18.1 (-2%)</td><td>neutral</td></tr>
</tbody>
</table>
<p><code>bitmask-only</code> = #9118 alone (deferral off). <code>bitmask + #9659</code> = #9118 with the deferral heuristic on.</p>
<p><strong>Takeaways:</strong></p>
<ul>
<li>In the <code>bitmask-only</code> column, <strong>async/Q20 is the only significant regression vs <code>main</code> (+30%)</strong>. Everything else is within noise. (The other previously reported #9118 regressions don't reproduce on the current branch.)</li>
<li>Deferral <strong>doesn't help Q20</strong>: it's a single-predicate query, so there's nothing to defer (deferring the only predicate just re-merges at build time). That's why the two experimental columns are effectively identical there (97.4 vs 97.6 ms).</li>
<li>So Q20 is a <strong>Mask-vs-RowSelection representation problem, not a pushdown-ordering one</strong>.
It needs the better filter-representation heuristic in <strong>#8846</strong>, not a selectivity threshold.</li>
<li>Separately, deferral gives solid wins on the multi-predicate queries (Q37/Q40/Q41: ~9 to 21% faster than <code>main</code>).</li>
</ul>
<p>Also fixed the unclosed bold in the async/Q20 row (<code>+30%</code> was missing its closing <code>)</code> and <code>**</code>).</p>
</body></html>
cc @alamb @hhhizzz

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
