kosiew opened a new pull request, #23223:
URL: https://github.com/apache/datafusion/pull/23223
## Which issue does this PR close?
* Part of #22688
## Rationale for this change
This change moves the shared string builder infrastructure to fallible
`try_*` APIs so overflow conditions can be propagated as `DataFusionError`
instead of requiring panic-based handling. This provides a consistent error
propagation path for downstream UDF migrations while preserving the existing
infallible APIs as compatibility wrappers where appropriate.
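
As a rough illustration of the fallible-plus-shim pattern described above, the sketch below uses a toy builder rather than the real DataFusion types. `BuilderError`, `TinyStringBuilder`, and the configurable `max_bytes` limit are hypothetical names invented for this example; the actual builders return `DataFusionError` and may be structured differently.

```rust
/// Hypothetical stand-in for `DataFusionError` (not the real type).
#[derive(Debug, PartialEq)]
pub enum BuilderError {
    OffsetOverflow { needed: usize, max: usize },
}

/// Toy string builder with `i32` offsets, for illustration only.
pub struct TinyStringBuilder {
    offsets: Vec<i32>,
    values: Vec<u8>,
    /// Byte limit; configurable so overflow can be triggered cheaply in tests.
    max_bytes: usize,
}

impl TinyStringBuilder {
    /// Creates a builder whose limit is clamped to what an `i32` offset can hold.
    pub fn with_limit(max_bytes: usize) -> Self {
        Self {
            offsets: vec![0],
            values: Vec::new(),
            max_bytes: max_bytes.min(i32::MAX as usize),
        }
    }

    /// Fallible append: reports overflow as an error and leaves the builder untouched.
    pub fn try_append_value(&mut self, s: &str) -> Result<(), BuilderError> {
        let needed = self.values.len().saturating_add(s.len());
        if needed > self.max_bytes {
            return Err(BuilderError::OffsetOverflow { needed, max: self.max_bytes });
        }
        self.values.extend_from_slice(s.as_bytes());
        // Safe cast: `needed <= max_bytes <= i32::MAX`.
        self.offsets.push(needed as i32);
        Ok(())
    }

    /// Infallible compatibility shim: delegates to `try_append_value` and panics on overflow.
    pub fn append_value(&mut self, s: &str) {
        self.try_append_value(s)
            .expect("string array offset overflow");
    }

    /// Number of rows appended so far.
    pub fn len(&self) -> usize {
        self.offsets.len() - 1
    }
}
```

With this shape, existing callers of `append_value` keep their behavior, while a migrated caller can write `builder.try_append_value(s)?` and let the error surface to the query instead of aborting.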
This PR is preparatory work for migrating downstream string UDFs onto
fallible append/write APIs end-to-end. The follow-up work will cover direct row
emitters such as `chr`, `uuid`, `initcap`, and `substr`; helper-driven writers
such as `overlay`, `reverse`, and `translate`; the larger `split_part`
migration, including its index-normalization helpers; and output-amplifying
functions such as `repeat`, `lpad`, and `rpad`, where oversized output must
return `DataFusionError` rather than panic.
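
For the output-amplifying case, the check might look roughly like the following. This is a sketch only: `try_repeat`, `OutputTooLarge`, and the explicit `limit` parameter are hypothetical, and the follow-up PRs may implement the guard differently.

```rust
/// Hypothetical error describing an oversized result (not a DataFusion type).
#[derive(Debug, PartialEq)]
pub struct OutputTooLarge {
    /// Requested size in bytes, or `None` if the multiplication itself overflowed.
    pub requested: Option<usize>,
    pub limit: usize,
}

/// Repeats `s` `n` times, returning an error instead of panicking or
/// over-allocating when the result would exceed `limit` bytes.
pub fn try_repeat(s: &str, n: usize, limit: usize) -> Result<String, OutputTooLarge> {
    match s.len().checked_mul(n) {
        Some(total) if total <= limit => {
            // The size is known and within bounds, so allocate once and fill.
            let mut out = String::with_capacity(total);
            for _ in 0..n {
                out.push_str(s);
            }
            Ok(out)
        }
        // Either the product overflowed `usize` or it exceeds the limit.
        other => Err(OutputTooLarge { requested: other, limit }),
    }
}
```

The size is computed with `checked_mul` before any allocation, so a hostile repeat count fails fast rather than exhausting memory first.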
## What changes are included in this PR?
* Add shared helpers for validating `StringView` length, offset, and buffer
index values against Arrow's `i32::MAX` limits.
* Introduce fallible `try_*` methods throughout `BulkNullStringArrayBuilder`:
  * `try_append_value`
  * `try_append_placeholder`
  * `try_append_with`
  * `try_append_byte_map`
* Keep the existing infallible `append_*` methods as compatibility shims
that delegate to the corresponding `try_*` methods and panic on overflow.
* Convert `GenericStringArrayBuilder::append_with` to reuse the fallible
implementation instead of duplicating logic.
* Refactor `StringViewArrayBuilder` to:
  * validate long-view metadata through shared helpers,
  * add fallible `try_append_with` and `try_append_byte_map`,
  * improve spill-path error handling and rollback so intermediate state is
    restored on failure.
* Add shared test-only utilities (`FailingBulkNullStringArrayBuilder` and
`FailingStringWriter`) to support overflow propagation tests in this and
downstream modules.
* Prepare shared string builder APIs for follow-up UDF migrations covering
direct append call sites, helper-driven row writers, `split_part` index
handling, and output-amplifying functions such as `repeat`, `lpad`, and `rpad`.
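
The validation and rollback ideas in the list above can be sketched in isolation as follows. Everything here is illustrative: `Overflow`, `checked_i32`, and `RollbackBuffer` are hypothetical names, and the real `StringViewArrayBuilder` tracks more state (views, in-progress and completed buffers) than this toy does.

```rust
/// Hypothetical error naming the value that did not fit in an `i32`.
#[derive(Debug, PartialEq)]
pub struct Overflow {
    pub what: &'static str,
    pub value: usize,
}

/// Shared-helper sketch: validates a length, offset, or buffer index against `i32::MAX`.
pub fn checked_i32(what: &'static str, value: usize) -> Result<i32, Overflow> {
    i32::try_from(value).map_err(|_| Overflow { what, value })
}

/// Toy buffer that restores its previous state when a row writer fails.
pub struct RollbackBuffer {
    data: Vec<u8>,
    offsets: Vec<i32>,
}

impl RollbackBuffer {
    pub fn new() -> Self {
        Self { data: Vec::new(), offsets: vec![0] }
    }

    /// Runs `write` against the byte buffer. On any error, bytes written by
    /// the closure are discarded so no partial row remains visible.
    /// Assumes the closure only appends to the buffer.
    pub fn try_append_with<F>(&mut self, write: F) -> Result<(), Overflow>
    where
        F: FnOnce(&mut Vec<u8>) -> Result<(), Overflow>,
    {
        let start = self.data.len();
        let outcome = write(&mut self.data)
            .and_then(|()| checked_i32("offset", self.data.len()));
        match outcome {
            Ok(end) => {
                self.offsets.push(end);
                Ok(())
            }
            Err(e) => {
                // Roll back to the length recorded before the write began.
                self.data.truncate(start);
                Err(e)
            }
        }
    }

    pub fn bytes(&self) -> &[u8] {
        &self.data
    }

    pub fn len(&self) -> usize {
        self.offsets.len() - 1
    }
}
```

Recording the starting length before the closure runs is what makes the rollback cheap: a failed write only needs a `truncate`, and the offsets vector is never touched unless the row succeeds.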
## Are these changes tested?
Yes.
This PR adds the following tests:
* `bulk_try_append_methods`
* `string_view_builder_try_append_with_and_byte_map_success_path`
* `string_view_builder_rejects_long_view_part_overflow`
* `failing_bulk_builder_propagates_try_append_errors`
It also continues to exercise existing string builder tests.
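
To give a sense of how a failing test double can exercise error propagation without allocating gigabytes, here is a minimal sketch built on `std::fmt::Write`. It is not the `FailingStringWriter` from this PR, whose definition is not shown here; `FailingWriter`, its `budget` field, and `write_reversed` are hypothetical.

```rust
use std::fmt::{self, Write};

/// Hypothetical test double: accepts writes until `budget` bytes, then fails.
pub struct FailingWriter {
    written: String,
    budget: usize,
}

impl FailingWriter {
    pub fn new(budget: usize) -> Self {
        Self { written: String::new(), budget }
    }

    pub fn written(&self) -> &str {
        &self.written
    }
}

impl Write for FailingWriter {
    fn write_str(&mut self, s: &str) -> fmt::Result {
        // Simulate an overflow once the byte budget would be exceeded.
        if self.written.len() + s.len() > self.budget {
            return Err(fmt::Error);
        }
        self.written.push_str(s);
        Ok(())
    }
}

/// Example row writer under test: emits `s` reversed and propagates
/// writer errors with `?` instead of unwrapping them.
pub fn write_reversed<W: Write>(out: &mut W, s: &str) -> fmt::Result {
    for ch in s.chars().rev() {
        out.write_char(ch)?;
    }
    Ok(())
}
```

A test can then assert that the row writer returns `Err` when given a writer with a small budget, which checks the propagation path without constructing real oversized input.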
## Are there any user-facing changes?
No user-facing behavior change is intended. This is shared internal infrastructure
that enables downstream code to propagate overflow errors through fallible APIs
while preserving the existing infallible compatibility methods.
## LLM-generated code disclosure
This PR includes LLM-generated code and comments. All LLM-generated content
has been manually reviewed.