ClSlaid commented on PR #9755:
URL: https://github.com/apache/arrow-rs/pull/9755#issuecomment-4605929240

   Follow-up on the apparent regressions from the short full-matrix smoke run.
   
   I reran the suspicious cases with a focused Criterion run (`--sample-size 
100 --warm-up-time 3 --measurement-time 5`). The large `mixed_utf8view 
max_len=128, nulls=0, sel=0.8` case was not a real regression in the focused 
run: baseline 1.2943 ms, this PR 1.3044 ms, about 1.01x. The `take: mixed_utf8, 
nulls=0.1, sel=0.001` control case was also basically unchanged: 21.120 ms vs 
21.154 ms.
   
   The real issue was the fused path being enabled too broadly. For 
`single_*view max_len=8, nulls=0.1, sel=0.1`, the previous 25% threshold made 
the fused path run, and the focused run showed real regressions around 18-25%.
   
   I narrowed the fused path to very sparse filters only (`selected_count * 16 
<= filter.len()`, roughly <=6.25% selected). After that change, the focused 
results are:
   
   | case | baseline | this PR | ratio |
   |---|---:|---:|---:|
   | `single_utf8view max_len=8, nulls=0, sel=0.01` | 2.1939 ms | 1.6408 ms | 0.75x |
   | `single_binaryview max_len=8, nulls=0, sel=0.01` | 2.1397 ms | 1.6100 ms | 0.75x |
   | `mixed_utf8view max_len=8, nulls=0, sel=0.01` | 1.7155 ms | 858.26 us | 0.50x |
   | `mixed_binaryview max_len=8, nulls=0, sel=0.01` | 1.7194 ms | 855.99 us | 0.50x |
   | `mixed_binaryview max_len=20, nulls=0, sel=0.01` | 2.2631 ms | 2.3320 ms | 1.03x |
   | `mixed_utf8view max_len=128, nulls=0, sel=0.8` | 1.2907 ms | 1.3077 ms | 1.01x |
   | `single_utf8view max_len=8, nulls=0.1, sel=0.1` | 1.0420 ms | 1.0199 ms | 0.98x |
   | `single_binaryview max_len=8, nulls=0.1, sel=0.1` | 1.0367 ms | 1.0313 ms | 1.00x |
   
   So the low-selectivity inline View wins remain, and the mid-selectivity/null 
regression is avoided by falling back to the generic filter kernel sooner. 
Pushed this as the new single-commit head: `5e0bb918b`.
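
   For reference, the sparse-filter gate described above can be sketched as a standalone check. This is only an illustration of the `selected_count * 16 <= filter.len()` condition: the function name `use_fused_path`, the `saturating_mul` guard, and the batch length of 8192 in the comments are assumptions made for the example, not the code in the PR.

```rust
/// Hypothetical helper (name assumed for illustration): returns true when
/// the filter is sparse enough to take the fused path, i.e. when at most
/// 1/16 (6.25%) of the rows are selected.
pub fn use_fused_path(selected_count: usize, filter_len: usize) -> bool {
    // saturating_mul is an added guard against overflow on huge counts;
    // the condition itself is `selected_count * 16 <= filter.len()`.
    selected_count.saturating_mul(16) <= filter_len
}

// With an assumed batch of 8192 rows:
//   sel ~ 0.01 ->  81 selected ->  81 * 16 =  1296 <= 8192 -> fused path
//   sel ~ 0.1  -> 819 selected -> 819 * 16 = 13104 >  8192 -> generic kernel
//   boundary   -> 512 selected -> 512 * 16 =  8192 <= 8192 -> fused path
```

   Under this gate, the `sel=0.01` cases in the table still qualify for the fused path, while the `sel=0.1` cases that regressed fall back to the generic filter kernel.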

