kosiew opened a new pull request, #23074:
URL: https://github.com/apache/datafusion/pull/23074
## Which issue does this PR close?
* Part of #22688
## Rationale for this change
This PR continues the migration to fallible string builder APIs for string
UDFs that previously relied on infallible `append_value` and
`append_placeholder` calls.
The non-ASCII case conversion paths in `string/common.rs` and the generic
implementations in `unicode/substrindex.rs` could panic when string view
metadata exceeds ByteView's `i32::MAX` limits. Converting these call sites to
use fallible builder APIs allows overflow conditions to be propagated as
`DataFusionError`s instead of causing panics.
This work is part of the broader effort in #22688 to eliminate panic-based
overflow handling in string builders.
## What changes are included in this PR?
* Migrated non-ASCII case conversion paths in `string/common.rs` to use:
* `try_append_value`
* `try_append_placeholder`
* Migrated `substr_index_general` and `map_strings` in
`unicode/substrindex.rs` to use:
* `try_append_value`
* `try_append_placeholder`
* Added fallible APIs to `StringViewArrayBuilder`:
* `try_append_value`
* `try_append_placeholder`
* `try_ensure_long_capacity`
* Added overflow error generation for StringView-specific limits:
* `string_view_overflow_error`
* Refactored StringView view construction to validate:
* value length
* buffer offsets
* completed buffer count
* Preserved existing infallible APIs (`append_value`, `append_placeholder`,
`ensure_long_capacity`) as wrappers around the fallible implementations,
maintaining existing behavior for callers that still use them.
## Are these changes tested?
Yes.
Added the following unit tests:
* `string_view_builder_try_append_success_path`
* `test_substr_index_all_nulls`
These tests verify:
* Successful operation of the new fallible StringView builder APIs.
* Correct null propagation behavior in `substr_index_general`.
Existing case conversion and `substr_index` tests continue to validate the
normal execution paths. No dedicated overflow-triggering tests were added in
this PR.
## Are there any user-facing changes?
No user-facing functionality changes are expected.
The primary behavioral change is that certain internal string-builder
overflow conditions can now be returned as `DataFusionError`s rather than
triggering panics, improving robustness in extreme cases.
## LLM-generated code disclosure
This PR includes LLM-generated code and comments. All LLM-generated content
has been manually reviewed.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]