Brijesh-Thakkar opened a new pull request, #19598:
URL: https://github.com/apache/datafusion/pull/19598
## Which issue does this PR close?
This PR contributes to the performance epic tracking expressions that are
slower with Comet enabled:
- Related to apache/datafusion-comet#2986
## Rationale for this change
Benchmarks in the Comet performance epic show that the `bit_length` string
expression is slower when executed via Comet. The existing implementation
relies on the generic Arrow `bit_length` kernel, which introduces additional
overhead for common string array types.
This change specializes the implementation for `StringArray` and
`LargeStringArray` to reduce per-row overhead while preserving existing
behavior for all other array types.
## What changes are included in this PR?
- Added a specialized implementation of `bit_length` for `StringArray` and
`LargeStringArray`
- Avoided the generic Arrow `bit_length` kernel for these array types
- Retained the existing Arrow kernel as a fallback for other array types
(e.g. Utf8View, Dictionary, Binary)
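
As a rough illustration of why a specialization for offset-based string arrays can be cheap, the sketch below models the idea in plain Rust. It is not the code from this PR and does not use the Arrow crate: `Utf8Column`, `from_options`, and this `bit_length` function are hypothetical stand-ins that assume an Arrow-like layout (an offsets buffer with `len + 1` entries plus a validity mask). Under that assumption, the bit length of each row is the difference between adjacent offsets multiplied by 8, so the string bytes are never read.

```rust
/// Hypothetical, simplified stand-in for an offset-based string array.
/// `offsets` has `len + 1` entries; row `i` spans `offsets[i]..offsets[i + 1]`
/// in `values`. `validity[i]` is `false` for NULL rows.
#[allow(dead_code)] // `values` is stored but deliberately never read below
struct Utf8Column {
    offsets: Vec<i32>,
    values: Vec<u8>,
    validity: Vec<bool>,
}

impl Utf8Column {
    /// Build the column from optional string slices (None = NULL).
    fn from_options(rows: &[Option<&str>]) -> Self {
        let mut offsets = Vec::with_capacity(rows.len() + 1);
        let mut values = Vec::new();
        let mut validity = Vec::with_capacity(rows.len());
        offsets.push(0);
        for row in rows {
            if let Some(s) = row {
                values.extend_from_slice(s.as_bytes());
            }
            validity.push(row.is_some());
            offsets.push(values.len() as i32);
        }
        Self { offsets, values, validity }
    }
}

/// Bit length per row, computed only from adjacent offsets.
/// NULL rows stay NULL; the string bytes themselves are not touched.
fn bit_length(col: &Utf8Column) -> Vec<Option<i32>> {
    col.offsets
        .windows(2)
        .zip(&col.validity)
        .map(|(w, &valid)| valid.then(|| (w[1] - w[0]) * 8))
        .collect()
}
```

Calling `bit_length` on a column built from `[Some("a"), None, Some("héllo"), Some("")]` yields 8 bits for `"a"`, NULL for the missing row, 48 bits for `"héllo"` (six UTF-8 bytes), and 0 for the empty string. A `LargeStringArray` equivalent would presumably follow the same pattern with 64-bit offsets.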
## Are these changes tested?
Yes. Existing unit tests and SQL logic tests already cover `bit_length`
behavior across string types, including UTF-8, LargeUtf8, Utf8View, and
dictionary-encoded strings. All existing tests pass without modification.
## Are there any user-facing changes?
No. This change is an internal performance optimization and does not affect
user-facing behavior or semantics.