Re: [PR] perf: optimize arrays_zip perfect list zips [datafusion]

via GitHub Mon, 25 May 2026 15:44:02 -0700


puneetdixit200 commented on PR #22285:
URL: https://github.com/apache/datafusion/pull/22285#issuecomment-4537748133


   I re-ran the same microbenchmark against both PR heads on the same machine.
   
   Command:
   
   ```bash
   CARGO_TARGET_DIR=/tmp/datafusion-pr-target cargo bench -p datafusion-functions-nested --bench arrays_zip -- --warm-up-time 1 --measurement-time 2 --sample-size 10
   ```
   
   Environment: macOS arm64, `rustc 1.95.0`.
   
   | branch | head | perfect/no-null case | 10% null case |
   | --- | --- | ---: | ---: |
   | this PR | `035689147f61` | `5.0013 us 5.0086 us 5.0147 us` (`arrays_zip_perfect_zip_8192`) | `1.0152 ms 1.0324 ms 1.0416 ms` |
   | #22245 | `9e63ba262601` | `20.870 us 20.928 us 20.987 us` (`arrays_zip_no_nulls_8192`) | `994.76 us 1.0028 ms 1.0148 ms` |
   
   So this branch is about 4.2x faster for the perfect-list fast path on this run (`20.928 us` vs. `5.0086 us`). The 10% null case is effectively in the same range (`1.0324 ms` vs. `1.0028 ms`, about 3% apart), which is expected because this branch falls back to the general path there.
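   
   For anyone who wants to re-derive those ratios, here is a minimal sketch of the arithmetic. It uses the middle value of each reported triple from the table above; the `speedup` helper is a hypothetical name introduced only for this illustration and is not part of either PR.
   
   ```rust
   /// Hypothetical helper: ratio of two timings expressed in the same unit.
   /// A result above 1.0 means `candidate` is faster than `baseline`.
   fn speedup(baseline: f64, candidate: f64) -> f64 {
       baseline / candidate
   }
   
   /// Perfect/no-null case, both values in microseconds (from the table above).
   fn perfect_case_speedup() -> f64 {
       speedup(20.928, 5.0086)
   }
   
   /// 10% null case, both values converted to microseconds (from the table above).
   fn null_case_speedup() -> f64 {
       speedup(1002.8, 1032.4)
   }
   ```
   
   The first ratio comes out near 4.18 and the second near 0.97, which matches the "about 4.2x" and "same range" readings.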
   
   The implementation difference I see is that #22245 currently reuses the null bitmap of the first concrete input for the output. This branch builds the output null bitmap using the existing semantics: a row is null only when all concrete inputs are null. It also rejects null rows with hidden values before taking the fast path. That should avoid the mixed null/empty-list issue called out on #22245.
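   
   To make the null semantics concrete, here is a rough standalone sketch. It is not the code from either PR: `ListColumn`, `zip_validity`, and `can_take_fast_path` are hypothetical names, validity is modelled as a plain `Vec<bool>` rather than an Arrow null buffer, and treating "perfect" as "all inputs share identical offsets" is my assumption for the illustration.
   
   ```rust
   /// Hypothetical, simplified stand-in for a list column.
   /// `offsets` has one more entry than there are rows;
   /// `validity[i] == false` marks row `i` as null.
   struct ListColumn {
       offsets: Vec<usize>,
       validity: Vec<bool>,
   }
   
   impl ListColumn {
       fn len(&self) -> usize {
           self.validity.len()
       }
   
       /// Number of child values that row `row` spans in the offsets buffer.
       fn row_len(&self, row: usize) -> usize {
           self.offsets[row + 1] - self.offsets[row]
       }
   }
   
   /// Output validity: a row is null only when every input is null at that row.
   /// Assumes all inputs have the same number of rows.
   fn zip_validity(inputs: &[ListColumn]) -> Vec<bool> {
       let rows = inputs.first().map_or(0, |c| c.len());
       (0..rows)
           .map(|row| inputs.iter().any(|c| c.validity[row]))
           .collect()
   }
   
   /// Fast-path check (assumed definition): every input shares the same offsets,
   /// and no null row spans hidden child values.
   fn can_take_fast_path(inputs: &[ListColumn]) -> bool {
       let Some(first) = inputs.first() else {
           return false;
       };
       let same_offsets = inputs.iter().all(|c| c.offsets == first.offsets);
       let no_hidden_values = inputs.iter().all(|c| {
           (0..c.len()).all(|row| c.validity[row] || c.row_len(row) == 0)
       });
       same_offsets && no_hidden_values
   }
   ```
   
   With one input that is null at a row and another that holds an empty list at the same row, `zip_validity` keeps the output row non-null, whereas copying the first input's bitmap would mark it null.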

