kosiew opened a new pull request, #23074:
URL: https://github.com/apache/datafusion/pull/23074

   ## Which issue does this PR close?
   
   * Part of #22688
   
   ## Rationale for this change
   
   This PR continues the migration to fallible string builder APIs for string 
UDFs that previously relied on infallible `append_value` and 
`append_placeholder` calls.
   
   The non-ASCII case conversion paths in `string/common.rs` and the generic 
implementations in `unicode/substrindex.rs` could panic when string view 
metadata exceeds ByteView's `i32::MAX` limits. Converting these call sites to 
use fallible builder APIs allows overflow conditions to be propagated as 
`DataFusionError`s instead of causing panics.
   
   This work is part of the broader effort in #22688 to eliminate panic-based 
overflow handling in string builders. 
   
   ## What changes are included in this PR?
   
   * Migrated non-ASCII case conversion paths in `string/common.rs` to use:
   
     * `try_append_value`
     * `try_append_placeholder`
   * Migrated `substr_index_general` and `map_strings` in 
`unicode/substrindex.rs` to use:
   
     * `try_append_value`
     * `try_append_placeholder`
   * Added fallible APIs to `StringViewArrayBuilder`:
   
     * `try_append_value`
     * `try_append_placeholder`
     * `try_ensure_long_capacity`
   * Added overflow error generation for StringView-specific limits:
   
     * `string_view_overflow_error`
   * Refactored StringView view construction to validate:
   
     * value length
     * buffer offsets
     * completed buffer count
   * Preserved existing infallible APIs (`append_value`, `append_placeholder`, 
`ensure_long_capacity`) as wrappers around the fallible implementations, 
maintaining existing behavior for callers that still use them.
   
   ## Are these changes tested?
   
   Yes.
   
   Added the following unit tests:
   
   * `string_view_builder_try_append_success_path`
   * `test_substr_index_all_nulls`
   
   These tests verify:
   
   * Successful operation of the new fallible StringView builder APIs.
   * Correct null propagation behavior in `substr_index_general`.
   
   Existing case conversion and `substr_index` tests continue to validate the 
normal execution paths. No dedicated overflow-triggering tests were added in 
this PR. 
   
   ## Are there any user-facing changes?
   
   No user-facing functionality changes are expected.
   
   The primary behavioral change is that certain internal string-builder 
overflow conditions can now be returned as `DataFusionError`s rather than 
triggering panics, improving robustness in extreme cases. 
   
   ## LLM-generated code disclosure
   
   This PR includes LLM-generated code and comments. All LLM-generated content 
has been manually reviewed.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to