ClSlaid commented on PR #9755:
URL: https://github.com/apache/arrow-rs/pull/9755#issuecomment-4272907326

   I reran the relevant `coalesce_kernels` cases on the same machine against 
both the clean baseline worktree and this patch, so the numbers below are 
direct baseline-vs-patch comparisons.
   
   Results:
   
   - `single_binaryview (max_string_len=8), 8192, nulls: 0, selectivity: 0.01`
     - baseline: `3.3982-3.4646 ms`
     - patch: `1.9005-1.9241 ms`
     - result: substantial improvement (about 1.8x faster at the range midpoints)
   
   - `mixed_binaryview (max_string_len=8), 8192, nulls: 0, selectivity: 0.01`
     - baseline: `2.3677-2.3976 ms`
     - patch: `1.1806-1.1959 ms`
     - result: substantial improvement (about 2.0x faster at the range midpoints)
   
   - `mixed_binaryview (max_string_len=20), 8192, nulls: 0, selectivity: 0.01`
     - baseline: `2.8777-2.9365 ms`
     - patch: `2.9366-2.9687 ms`
     - result: slight regression (about 1.6% slower at the range midpoints; the two ranges are nearly adjacent)
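
For readers who want to turn the ranges above into a single ratio, here is a minimal sketch that divides the baseline midpoint by the patch midpoint. The helper names `midpoint` and `speedup` are made up for this illustration, and using the midpoint of each reported range is an assumption of the sketch, not how the benchmark harness summarizes its results.

```rust
/// Midpoint of a reported [low, high] timing range, in milliseconds.
/// (Hypothetical helper; the midpoint is a simplification of the range.)
fn midpoint(low: f64, high: f64) -> f64 {
    (low + high) / 2.0
}

/// Baseline midpoint divided by patch midpoint.
/// A value above 1.0 means the patch is faster; below 1.0 means it is slower.
fn speedup(baseline: (f64, f64), patch: (f64, f64)) -> f64 {
    midpoint(baseline.0, baseline.1) / midpoint(patch.0, patch.1)
}

/// The three cases listed above, paired with their midpoint ratios.
fn reported_ratios() -> Vec<(&'static str, f64)> {
    vec![
        (
            "single_binaryview (max_string_len=8)",
            speedup((3.3982, 3.4646), (1.9005, 1.9241)),
        ),
        (
            "mixed_binaryview (max_string_len=8)",
            speedup((2.3677, 2.3976), (1.1806, 1.1959)),
        ),
        (
            "mixed_binaryview (max_string_len=20)",
            speedup((2.8777, 2.9365), (2.9366, 2.9687)),
        ),
    ]
}
```

With these inputs the first two ratios come out near 1.8 and 2.0, and the third is just under 1.0, which matches the "slight regression" reading.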
   
   I also checked which path the benchmark is actually taking.
   
   - `mixed_binaryview (max_string_len=8), 8192, nulls: 0, selectivity: 0.01`
     - routes to the fused direct-copy path
   - `single_binaryview (max_string_len=8), 8192, nulls: 0, selectivity: 0.01`
     - routes to the fused direct-copy path
   - `mixed_binaryview (max_string_len=20), 8192, nulls: 0, selectivity: 0.01`
     - is not supported by the fused path, so it stays on the vanilla path
   
   So the main improvement is in the fully inline `BinaryView` cases that this 
patch targets. The remaining regression on `mixed_binaryview 
(max_string_len=20)` is not coming from the fused fast path, because that 
benchmark still does not use the fused path at all: once non-inline 
`BinaryView` values are present, it remains on the existing 
`filter_record_batch` plus `push_batch` path.
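
To illustrate the routing described above, here is a minimal standalone sketch of an "all views are inline" check. It relies on the Arrow view layout, where each view is 16 bytes, the low 32 bits hold the value length, and values of at most 12 bytes are stored entirely inside the view. The names `make_inline_view`, `make_buffer_view`, `all_inline`, `choose_path`, and the `Path` enum are hypothetical and exist only for this example; the actual check in this patch may be structured differently.

```rust
/// Values of at most 12 bytes fit entirely inside a 16-byte view.
const MAX_INLINE_VIEW_LEN: u32 = 12;

/// Build a view for a short value (hypothetical helper for this sketch).
/// Layout: 4-byte little-endian length, then up to 12 bytes of data.
fn make_inline_view(data: &[u8]) -> u128 {
    assert!(data.len() <= MAX_INLINE_VIEW_LEN as usize);
    let mut bytes = [0u8; 16];
    bytes[..4].copy_from_slice(&(data.len() as u32).to_le_bytes());
    bytes[4..4 + data.len()].copy_from_slice(data);
    u128::from_le_bytes(bytes)
}

/// Build a view for a long value (hypothetical helper for this sketch).
/// Layout: length, 4-byte prefix, buffer index, offset into that buffer.
fn make_buffer_view(data: &[u8], buffer_index: u32, offset: u32) -> u128 {
    assert!(data.len() > MAX_INLINE_VIEW_LEN as usize);
    let mut bytes = [0u8; 16];
    bytes[..4].copy_from_slice(&(data.len() as u32).to_le_bytes());
    bytes[4..8].copy_from_slice(&data[..4]);
    bytes[8..12].copy_from_slice(&buffer_index.to_le_bytes());
    bytes[12..16].copy_from_slice(&offset.to_le_bytes());
    u128::from_le_bytes(bytes)
}

/// The value length lives in the low 32 bits of the view.
fn view_len(view: u128) -> u32 {
    view as u32
}

/// True when no view points into a separate data buffer.
fn all_inline(views: &[u128]) -> bool {
    views.iter().all(|&v| view_len(v) <= MAX_INLINE_VIEW_LEN)
}

/// Assumed names for the two paths discussed in this comment.
#[derive(Debug, PartialEq)]
enum Path {
    FusedDirectCopy,
    Vanilla,
}

/// Route to the fused path only when every value is inline (sketch only).
fn choose_path(views: &[u128]) -> Path {
    if all_inline(views) {
        Path::FusedDirectCopy
    } else {
        Path::Vanilla
    }
}
```

Under this sketch, a batch generated with `max_string_len=8` always passes the check, while a batch with `max_string_len=20` fails it as soon as one value exceeds 12 bytes, which is consistent with the routing observed in the benchmarks.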

