geoffreyclaude opened a new pull request, #19376: URL: https://github.com/apache/datafusion/pull/19376
## Which issue does this PR close?

- Related to #19241

## Rationale for this change

This PR enhances the `in_list` benchmark suite to provide more comprehensive performance measurements across a wider range of data types and list sizes. These improvements are necessary groundwork for evaluating the optimizations proposed in #19241. The current benchmarks are limited in scope, which makes it difficult to assess the performance impact of potential `in_list` optimizations across different data types and scenarios.

## What changes are included in this PR?

- Added benchmarks for `UInt8Array`, `Int16Array`, and `TimestampNanosecondArray`
- Added `28` to `IN_LIST_LENGTHS` (now `[3, 8, 28, 100]`) to better cover the range between small and large lists
- Increased `ARRAY_LENGTH` from `1024` to `8192` to match the default DataFusion batch size
- Configured criterion with a shorter warm-up time (100ms) and measurement time (500ms) for faster iteration

## Are these changes tested?

Yes, this PR adds benchmark coverage. The benchmarks can be run with:

```bash
cargo bench --bench in_list
```

The benchmarks verify that the `in_list` expression evaluates correctly for all the new data types.

## Are there any user-facing changes?

No user-facing changes. This PR only affects the benchmark suite used for performance testing and development.
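For readers without the diff at hand, the following is a minimal, std-only sketch of the measurement shape described above: one 8192-value array, one case per entry of `IN_LIST_LENGTHS`, a 100ms warm-up, then a 500ms measurement window. It is not the PR's code and does not use Criterion or DataFusion. The helpers `in_list_mask`, `run_for`, and `bench_case` are hypothetical names, the `HashSet` lookup is only an assumed stand-in for the real `in_list` kernel, and `i16` is chosen arbitrarily from the newly covered types.

```rust
use std::collections::HashSet;
use std::hint::black_box;
use std::time::{Duration, Instant};

// Values quoted in the PR description.
const IN_LIST_LENGTHS: [usize; 4] = [3, 8, 28, 100];
const ARRAY_LENGTH: usize = 8192;
const WARM_UP: Duration = Duration::from_millis(100);
const MEASUREMENT: Duration = Duration::from_millis(500);

/// Hypothetical stand-in for `in_list`: one boolean per input value,
/// true when the value appears in `list`.
fn in_list_mask(values: &[i16], list: &[i16]) -> Vec<bool> {
    let set: HashSet<i16> = list.iter().copied().collect();
    values.iter().map(|v| set.contains(v)).collect()
}

/// Calls `f` repeatedly until at least `budget` has elapsed.
/// Returns the number of calls and the total elapsed time.
fn run_for<F: FnMut()>(budget: Duration, mut f: F) -> (u32, Duration) {
    let start = Instant::now();
    let mut iterations = 0u32;
    loop {
        f();
        iterations += 1;
        let elapsed = start.elapsed();
        if elapsed >= budget {
            return (iterations, elapsed);
        }
    }
}

/// Warms up, then measures one list length; returns the mean time per batch.
fn bench_case(list_len: usize) -> Duration {
    // Synthetic data (an assumption): values cycle through 0..256,
    // the list holds 0..list_len.
    let values: Vec<i16> = (0..ARRAY_LENGTH).map(|i| (i % 256) as i16).collect();
    let list: Vec<i16> = (0..list_len).map(|i| i as i16).collect();

    // Warm-up pass: results are discarded.
    run_for(WARM_UP, || {
        black_box(in_list_mask(black_box(&values), black_box(&list)));
    });

    // Measurement pass.
    let (iterations, elapsed) = run_for(MEASUREMENT, || {
        black_box(in_list_mask(black_box(&values), black_box(&list)));
    });
    elapsed / iterations
}
```

Calling `bench_case(len)` for each `len` in `IN_LIST_LENGTHS` takes roughly 0.6 seconds per case, which illustrates why shorter warm-up and measurement windows keep the full type-by-length grid quick to iterate on.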
