Dandandan opened a new pull request, #23020: URL: https://github.com/apache/datafusion/pull/23020
## Which issue does this PR close?

- N/A.

## Rationale for this change

`FIRST_VALUE` and `NTH_VALUE` can return the same input row for many consecutive window frames. The evaluator currently rebuilds the same `ScalarValue` from the same array index for each row, which adds avoidable overhead in tight window-function microbenchmarks.

## What changes are included in this PR?

- Cache the most recent `ScalarValue` by exact input `ArrayRef` and row index in the `nth_value` evaluator.
- Reuse the cached scalar when subsequent frame evaluations resolve to the same array/index pair.
- Add a regression test proving cached scalar values are not reused across different arrays.

## Are these changes tested?

- `cargo fmt --all --check`
- `cargo clippy --all-targets --all-features -- -D warnings`
- `cargo test -p datafusion-functions-window nth_value --lib`
- `cargo bench -p datafusion-functions-window --bench nth_value -- nth_value_nulls_comparison/first_value/respect_nulls --sample-size 10 --warm-up-time 1 --measurement-time 2`

The benchmark median for `nth_value_nulls_comparison/first_value/respect_nulls` improved from 81.736 us to 47.595 us, a reduction of about 42% (roughly 1.7x faster).

## Are there any user-facing changes?

No API or behavior changes; performance improvement only.
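The caching idea described above can be sketched roughly as follows. This is not the PR's code: `ColumnRef`, `Scalar`, and `ScalarCache` are hypothetical stand-ins for Arrow's `ArrayRef`, DataFusion's `ScalarValue`, and whatever state the `nth_value` evaluator actually holds, and the sketch assumes that "exact input `ArrayRef`" means pointer identity rather than value equality.

```rust
use std::sync::Arc;

// Hypothetical stand-in for an Arrow `ArrayRef`: a shared, nullable i64 column.
type ColumnRef = Arc<Vec<Option<i64>>>;

// Hypothetical stand-in for DataFusion's `ScalarValue`.
#[derive(Debug, Clone, PartialEq)]
enum Scalar {
    Null,
    Int64(i64),
}

// Remembers only the most recently materialized scalar.
#[derive(Default)]
struct ScalarCache {
    // (array the value came from, row index, materialized scalar)
    last: Option<(ColumnRef, usize, Scalar)>,
    // Number of times a scalar was actually rebuilt (for illustration only).
    builds: usize,
}

impl ScalarCache {
    fn get(&mut self, column: &ColumnRef, index: usize) -> Scalar {
        if let Some((cached_col, cached_idx, value)) = &self.last {
            // Reuse only when both the row index and the array allocation match.
            // Two arrays with equal contents are deliberately NOT treated as equal.
            if *cached_idx == index && Arc::ptr_eq(cached_col, column) {
                return value.clone();
            }
        }
        // Cache miss: build the scalar from the array and remember it.
        let value = match column[index] {
            Some(v) => Scalar::Int64(v),
            None => Scalar::Null,
        };
        self.builds += 1;
        self.last = Some((Arc::clone(column), index, value.clone()));
        value
    }
}
```

Storing a clone of the `Arc` in the cache, rather than a raw address, keeps the cached array alive, so a freed allocation cannot be reused by a different array and produce a false hit. Whether the PR does the same is an assumption of this sketch.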
