Dandandan opened a new pull request, #23020:
URL: https://github.com/apache/datafusion/pull/23020

   ## Which issue does this PR close?
   
   - N/A.
   
   ## Rationale for this change
   
   FIRST_VALUE and NTH_VALUE can return the same input row for many consecutive window frames. The evaluator currently rebuilds the same ScalarValue from the same array index for each row, which adds avoidable overhead in tight window-function microbenchmarks.
   
   ## What changes are included in this PR?
   
   - Cache the most recent ScalarValue by exact input ArrayRef and row index in the nth_value evaluator.
   - Reuse the cached scalar when subsequent frame evaluations resolve to the same array/index pair.
   - Add a regression test proving cached scalar values are not reused across different arrays.
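
The caching idea in the list above can be sketched roughly as follows. This is a minimal, standard-library-only illustration and not the code in this PR: `ColumnRef`, `Scalar`, `ScalarCache`, and the `rebuilds` counter are hypothetical stand-ins for Arrow's `ArrayRef`, DataFusion's `ScalarValue`, and whatever state the evaluator actually keeps. The sketch assumes that "exact input ArrayRef" means pointer identity rather than equal contents.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for Arrow's `ArrayRef`: a shared, nullable i64 column.
type ColumnRef = Arc<Vec<Option<i64>>>;

/// Hypothetical stand-in for DataFusion's `ScalarValue`.
#[derive(Debug, Clone, PartialEq)]
enum Scalar {
    Null,
    Int64(i64),
}

/// Remembers the most recently built scalar, keyed by array identity and row index.
#[derive(Default)]
struct ScalarCache {
    entry: Option<(ColumnRef, usize, Scalar)>,
    /// Counts how often a scalar was actually rebuilt (illustration only).
    rebuilds: usize,
}

impl ScalarCache {
    fn get(&mut self, column: &ColumnRef, index: usize) -> Scalar {
        if let Some((cached_column, cached_index, value)) = &self.entry {
            // Compare by allocation identity, not by contents, so two
            // different arrays never share a cached value.
            if Arc::ptr_eq(cached_column, column) && *cached_index == index {
                return value.clone();
            }
        }
        // Cache miss: build the scalar from the array element.
        let value = match column[index] {
            Some(v) => Scalar::Int64(v),
            None => Scalar::Null,
        };
        self.rebuilds += 1;
        // Holding a clone of the Arc keeps the array alive, so its address
        // cannot be recycled by another array while the entry is cached.
        self.entry = Some((Arc::clone(column), index, value.clone()));
        value
    }
}
```

Calling `get` repeatedly with the same column and index rebuilds the scalar only once. Passing a different column, even one with identical contents, misses the cache, which is the property the regression test in the last bullet is described as checking.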
   
   ## Are these changes tested?
   
   - cargo fmt --all --check
   - cargo clippy --all-targets --all-features -- -D warnings
   - cargo test -p datafusion-functions-window nth_value --lib
   - cargo bench -p datafusion-functions-window --bench nth_value -- nth_value_nulls_comparison/first_value/respect_nulls --sample-size 10 --warm-up-time 1 --measurement-time 2
   
   Benchmark median for nth_value_nulls_comparison/first_value/respect_nulls improved from 81.736 us to 47.595 us, about a 42% reduction in run time (roughly 1.72x faster).
   
   ## Are there any user-facing changes?
   
   No API or behavior changes; performance improvement only.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

