ClSlaid commented on PR #9755: URL: https://github.com/apache/arrow-rs/pull/9755#issuecomment-4605929240
Follow-up on the apparent regressions from the short full-matrix smoke run.

I reran the suspicious cases with a focused Criterion run (`--sample-size 100 --warm-up-time 3 --measurement-time 5`).

The large `mixed_utf8view max_len=128, nulls=0, sel=0.8` case was not a real regression in the focused run: baseline 1.2943 ms, this PR 1.3044 ms, about 1.01x. The `take: mixed_utf8, nulls=0.1, sel=0.001` control case was also basically unchanged: 21.120 ms vs 21.154 ms.

The real issue was the fused path being enabled too broadly. For `single_*view max_len=8, nulls=0.1, sel=0.1`, the previous 25% threshold made the fused path run, and the focused run showed real regressions around 18-25%.

I narrowed the fused path to very sparse filters only (`selected_count * 16 <= filter.len()`, roughly <=6.25% selected). After that change, the focused results are:

| case | baseline | this PR | ratio |
|---|---:|---:|---:|
| `single_utf8view max_len=8, nulls=0, sel=0.01` | 2.1939 ms | 1.6408 ms | 0.75x |
| `single_binaryview max_len=8, nulls=0, sel=0.01` | 2.1397 ms | 1.6100 ms | 0.75x |
| `mixed_utf8view max_len=8, nulls=0, sel=0.01` | 1.7155 ms | 858.26 us | 0.50x |
| `mixed_binaryview max_len=8, nulls=0, sel=0.01` | 1.7194 ms | 855.99 us | 0.50x |
| `mixed_binaryview max_len=20, nulls=0, sel=0.01` | 2.2631 ms | 2.3320 ms | 1.03x |
| `mixed_utf8view max_len=128, nulls=0, sel=0.8` | 1.2907 ms | 1.3077 ms | 1.01x |
| `single_utf8view max_len=8, nulls=0.1, sel=0.1` | 1.0420 ms | 1.0199 ms | 0.98x |
| `single_binaryview max_len=8, nulls=0.1, sel=0.1` | 1.0367 ms | 1.0313 ms | 1.00x |

So the low-selectivity inline View wins remain, and the mid-selectivity/null regression is avoided by falling back to the generic filter kernel sooner.

Pushed this as the new single-commit head: `5e0bb918b`.
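For readers who want the sparsity gate spelled out, here is a minimal sketch of the `selected_count * 16 <= filter.len()` condition as a standalone predicate. The function name `use_fused_path`, the constant `SPARSE_DIVISOR`, and the use of `saturating_mul` are illustrative assumptions, not the code in the PR; only the inequality itself comes from the comment above.

```rust
/// Assumed name for the divisor in the quoted condition; 1/16 = 6.25%.
const SPARSE_DIVISOR: usize = 16;

/// Hypothetical helper: returns true when the filter is sparse enough
/// that the fused path would be chosen under the quoted condition.
/// Integer multiplication avoids floating-point selectivity math;
/// `saturating_mul` is an added guard against overflow, not taken from the PR.
fn use_fused_path(selected_count: usize, filter_len: usize) -> bool {
    selected_count.saturating_mul(SPARSE_DIVISOR) <= filter_len
}
```

Under this sketch, a filter selecting 1% of rows takes the fused path, while filters selecting 10% or 80% fall through to the generic filter kernel; exactly 6.25% (for example 625 of 10,000) still qualifies because the comparison is inclusive.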
