sdf-jkl commented on PR #9118:
URL: https://github.com/apache/arrow-rs/pull/9118#issuecomment-4699635254

   <html><head></head><body><p>I rebased #9118 on top of #9659 and benchmarked 
two configurations (deferral off and deferral on) against <code>main</code> to 
see whether the deferral heuristic addresses the row-filter regressions.</p>
   <p>ClickBench <code>hits_1</code>, interleaved, median of 3 runs. Times in 
ms, Δ vs <code>main</code> in parens (lower = better).</p>
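   <p>For anyone re-deriving the percentages, here is a minimal sketch of how the 
Δ values can be computed. It assumes Δ is the relative difference between the 
per-config medians, rounded to a whole percent. The helper names 
<code>median</code> and <code>delta_pct</code> are illustrative and not part of 
the benchmark harness.</p>

```rust
/// Median of a small set of timings in ms. Returns `None` for an empty slice.
fn median(samples: &[f64]) -> Option<f64> {
    if samples.is_empty() {
        return None;
    }
    let mut sorted = samples.to_vec();
    // `total_cmp` gives a total order over f64, so sorting cannot panic on NaN.
    sorted.sort_by(|a, b| a.total_cmp(b));
    let mid = sorted.len() / 2;
    if sorted.len() % 2 == 1 {
        Some(sorted[mid])
    } else {
        Some((sorted[mid - 1] + sorted[mid]) / 2.0)
    }
}

/// Relative change of `candidate` against `baseline`, in percent.
/// Negative means the candidate is faster (assumed definition of Δ).
fn delta_pct(baseline_ms: f64, candidate_ms: f64) -> f64 {
    (candidate_ms - baseline_ms) / baseline_ms * 100.0
}
```

   <p>With the async/Q20 medians, <code>delta_pct(75.1, 97.4)</code> is about 
29.7, which rounds to the reported +30%.</p>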
   
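   <table>
   <thead>
   <tr><th>query</th><th>main (ms)</th><th>bitmask-only (ms)</th><th>bitmask + #9659 (ms)</th><th>outcome</th></tr>
   </thead>
   <tbody>
   <tr><td>async/Q20</td><td>75.1</td><td>97.4 (+30%)</td><td>97.6 (+30%)</td><td>regressed. Deferral can't help; needs #8846</td></tr>
   <tr><td>sync/Q37</td><td>6.09</td><td>6.04 (-1%)</td><td>4.85 (-20%)</td><td>deferral win</td></tr>
   <tr><td>async/Q37</td><td>4.86</td><td>4.92 (+1%)</td><td>4.15 (-15%)</td><td>deferral win</td></tr>
   <tr><td>sync/Q40</td><td>5.49</td><td>5.42 (-1%)</td><td>4.93 (-10%)</td><td>deferral win</td></tr>
   <tr><td>async/Q40</td><td>6.08</td><td>6.24 (+3%)</td><td>5.55 (-9%)</td><td>deferral win</td></tr>
   <tr><td>sync/Q41</td><td>5.34</td><td>5.22 (-2%)</td><td>4.20 (-21%)</td><td>deferral win</td></tr>
   <tr><td>async/Q41</td><td>5.11</td><td>5.20 (+2%)</td><td>4.66 (-9%)</td><td>deferral win</td></tr>
   <tr><td>sync/Q20</td><td>112.2</td><td>109.8 (-2%)</td><td>106.4 (-5%)</td><td>neutral</td></tr>
   <tr><td>sync/Q12</td><td>18.2</td><td>18.1 (-1%)</td><td>18.0 (-1%)</td><td>neutral</td></tr>
   <tr><td>sync/Q24</td><td>26.8</td><td>27.2 (+1%)</td><td>27.3 (+2%)</td><td>neutral</td></tr>
   <tr><td>sync/Q30</td><td>18.6</td><td>17.2 (-7%)</td><td>18.1 (-2%)</td><td>neutral</td></tr>
   </tbody>
   </table>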
   
   
   <p><code>bitmask-only</code> = #9118 alone (deferral off). <code>bitmask + 
#9659</code> = with the deferral heuristic on.</p>
   <p><strong>Takeaways:</strong></p>
   <ul>
   <li>In the <code>bitmask-only</code> column, <strong>async/Q20 is the only 
significant regression vs <code>main</code> (+30%)</strong>. Everything else is 
within noise. (The other historically-reported #9118 regressions don't 
reproduce on the current branch.)</li>
   <li>Deferral <strong>doesn't help Q20</strong>: it's a single-predicate 
query, so there's nothing to defer (deferring the only predicate just re-merges 
at build time). That's why the two experimental columns are essentially 
identical there (97.4 vs 97.6 ms).</li>
   <li>So Q20 is a <strong>Mask-vs-RowSelection representation problem, not a 
pushdown-ordering one</strong>. It needs the better filter-representation 
heuristic in <strong>#8846</strong>, not a selectivity threshold.</li>
   <li>Separately, deferral gives solid wins on the multi-predicate queries 
(Q37/Q40/Q41: ~9 to 21% faster than <code>main</code>).</li>
   </ul>
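   <p>To illustrate the representation point, here is a rough sketch (not the 
code in #9118 or #8846) of why the choice between a boolean mask and a 
run-length selection can matter. <code>Run</code> is a hypothetical stand-in 
that is only similar in spirit to the parquet crate's <code>RowSelector</code>, 
and <code>mask_to_runs</code> is an illustrative helper.</p>

```rust
/// Hypothetical run-length selector: select or skip `row_count` consecutive rows.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Run {
    row_count: usize,
    select: bool,
}

/// Collapse a per-row boolean mask into runs of consecutive equal values.
fn mask_to_runs(mask: &[bool]) -> Vec<Run> {
    let mut runs: Vec<Run> = Vec::new();
    for &bit in mask {
        // Extend the current run when the value repeats.
        if let Some(last) = runs.last_mut() {
            if last.select == bit {
                last.row_count += 1;
                continue;
            }
        }
        // Otherwise the value flipped (or this is the first row): start a new run.
        runs.push(Run { row_count: 1, select: bit });
    }
    runs
}
```

   <p>A clustered filter result such as four selected rows followed by four 
skipped rows collapses to two runs, while an alternating result over the same 
eight rows needs eight runs. A mask costs one bit per row either way, so which 
representation is cheaper depends on how fragmented the filter output is.</p>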
   <p>Also fixed the unclosed bold in the async/Q20 row (<code>+30%</code> was 
missing its closing <code>)</code> and <code>**</code>).</p></body></html>
   
   cc @alamb @hhhizzz 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
