kosiew opened a new pull request, #23223:
URL: https://github.com/apache/datafusion/pull/23223

   ## Which issue does this PR close?
   
   * Part of #22688
   
   ## Rationale for this change
   
   This change moves the shared string builder infrastructure to fallible 
`try_*` APIs so overflow conditions can be propagated as `DataFusionError` 
instead of requiring panic-based handling. This provides a consistent error 
propagation path for downstream UDF migrations while preserving the existing 
infallible APIs as compatibility wrappers where appropriate.
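
The pairing of a fallible method with an infallible compatibility wrapper can be sketched as below. This is a minimal illustration, not the PR's code: `ToyStringBuilder`, `SketchError`, and the configurable `limit` are hypothetical stand-ins (the real builders report `DataFusionError` and are bounded by Arrow's `i32::MAX` limits).

```rust
use std::convert::TryFrom;

/// Hypothetical stand-in for `DataFusionError`; not the real type.
#[derive(Debug, PartialEq)]
pub enum SketchError {
    Overflow(String),
}

/// Hypothetical toy builder: one contiguous byte buffer plus i32 end offsets.
pub struct ToyStringBuilder {
    data: Vec<u8>,
    offsets: Vec<i32>,
    /// Assumed knob so the overflow path can be exercised with small inputs.
    limit: usize,
}

impl ToyStringBuilder {
    pub fn with_limit(limit: usize) -> Self {
        Self {
            data: Vec::new(),
            offsets: vec![0],
            // Never allow more bytes than an i32 offset can address.
            limit: limit.min(i32::MAX as usize),
        }
    }

    /// Fallible path: reports overflow as an error and leaves the builder untouched.
    pub fn try_append_value(&mut self, value: &str) -> Result<(), SketchError> {
        let new_len = self
            .data
            .len()
            .checked_add(value.len())
            .filter(|len| *len <= self.limit)
            .ok_or_else(|| {
                SketchError::Overflow(format!(
                    "appending {} bytes would exceed the {}-byte limit",
                    value.len(),
                    self.limit
                ))
            })?;
        let end = i32::try_from(new_len)
            .map_err(|_| SketchError::Overflow("offset exceeds i32::MAX".to_string()))?;
        // Only mutate once every check has passed.
        self.data.extend_from_slice(value.as_bytes());
        self.offsets.push(end);
        Ok(())
    }

    /// Compatibility shim: delegates to the fallible path and panics on overflow.
    pub fn append_value(&mut self, value: &str) {
        self.try_append_value(value)
            .expect("string builder overflow");
    }

    /// Number of appended values.
    pub fn len(&self) -> usize {
        self.offsets.len() - 1
    }
}
```

Keeping all checks ahead of the first mutation means a failed `try_append_value` needs no cleanup, and the shim stays a one-line delegation.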
   
   This PR is preparatory work for migrating downstream string UDFs onto 
fallible append/write APIs end-to-end. The follow-up work will cover direct row 
emitters such as `chr`, `uuid`, `initcap`, and `substr`; helper-driven writers 
such as `overlay`, `reverse`, and `translate`; the larger `split_part` 
migration, including its index-normalization helpers; and output-amplifying 
functions such as `repeat`, `lpad`, and `rpad`, where oversized output must 
return `DataFusionError` rather than panic.
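
For the output-amplifying case, the idea is to compute the result size with checked arithmetic before allocating. The sketch below assumes a hypothetical `try_repeat` function, an `OutputTooLarge` error, and an explicit `limit` parameter; none of these names come from the PR, and the real functions would return `DataFusionError`.

```rust
/// Hypothetical error describing an oversized result.
/// `requested` is `None` when the size computation itself overflowed `usize`.
#[derive(Debug, PartialEq)]
pub struct OutputTooLarge {
    pub requested: Option<usize>,
    pub limit: usize,
}

/// Repeats `input` `count` times, or returns an error if the output would
/// exceed `limit` bytes. The size is checked before any allocation happens.
pub fn try_repeat(input: &str, count: usize, limit: usize) -> Result<String, OutputTooLarge> {
    let total = input.len().checked_mul(count);
    match total {
        Some(n) if n <= limit => Ok(input.repeat(count)),
        _ => Err(OutputTooLarge {
            requested: total,
            limit,
        }),
    }
}
```

In a real migration the limit would presumably be derived from the array type's offset width rather than passed in by the caller.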
   
   ## What changes are included in this PR?
   
   * Add shared helpers for validating `StringView` length, offset, and buffer 
index values against Arrow's `i32::MAX` limits.
   * Introduce fallible `try_*` methods on `BulkNullStringArrayBuilder`:
   
     * `try_append_value`
     * `try_append_placeholder`
     * `try_append_with`
     * `try_append_byte_map`
   * Keep the existing infallible `append_*` methods as compatibility shims 
that delegate to the corresponding `try_*` methods and panic on overflow.
   * Convert `GenericStringArrayBuilder::append_with` to reuse the fallible 
implementation instead of duplicating logic.
   * Refactor `StringViewArrayBuilder` to:
   
     * validate long-view metadata through shared helpers,
     * add fallible `try_append_with` and `try_append_byte_map`,
     * improve spill-path error handling and rollback so intermediate state is 
restored on failure.
   * Add shared test-only utilities (`FailingBulkNullStringArrayBuilder` and 
`FailingStringWriter`) to support overflow propagation tests in this and 
downstream modules.
   * Prepare shared string builder APIs for follow-up UDF migrations covering 
direct append call sites, helper-driven row writers, `split_part` index 
handling, and output-amplifying functions such as `repeat`, `lpad`, and `rpad`.
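
Two of the items above, range validation against `i32::MAX` and rollback of intermediate state on failure, can be illustrated together. This is a sketch under assumptions: `to_i32_checked` and `RollbackBuffer` are hypothetical names, errors are plain `String`s instead of `DataFusionError`, and the real `StringViewArrayBuilder` spill path is more involved than a single buffer.

```rust
use std::convert::TryFrom;

/// Hypothetical helper: converts a `usize` to `i32`, naming the field in the error.
pub fn to_i32_checked(value: usize, field: &str) -> Result<i32, String> {
    i32::try_from(value).map_err(|_| format!("{field} {value} exceeds i32::MAX"))
}

/// Hypothetical single-buffer builder used only to show the rollback pattern.
pub struct RollbackBuffer {
    bytes: Vec<u8>,
    row_ends: Vec<i32>,
}

impl RollbackBuffer {
    pub fn new() -> Self {
        Self {
            bytes: Vec::new(),
            row_ends: Vec::new(),
        }
    }

    /// Lets `write` append one row's bytes. If the writer fails, or the new
    /// end offset does not fit in `i32`, the buffer is truncated back to its
    /// length before the call, so no partial row is left behind.
    pub fn try_append_with<F>(&mut self, write: F) -> Result<(), String>
    where
        F: FnOnce(&mut Vec<u8>) -> Result<(), String>,
    {
        let checkpoint = self.bytes.len();
        let outcome = match write(&mut self.bytes) {
            Ok(()) => to_i32_checked(self.bytes.len(), "offset"),
            Err(e) => Err(e),
        };
        match outcome {
            Ok(end) => {
                self.row_ends.push(end);
                Ok(())
            }
            Err(e) => {
                // Roll back whatever the writer managed to append.
                self.bytes.truncate(checkpoint);
                Err(e)
            }
        }
    }

    pub fn bytes(&self) -> &[u8] {
        &self.bytes
    }

    pub fn rows(&self) -> usize {
        self.row_ends.len()
    }
}
```

Recording a checkpoint before handing the buffer to the writer keeps the rollback independent of how much the writer appended before it failed.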
   
   ## Are these changes tested?
   
   Yes.
   
   This PR adds the following tests:
   
   * `bulk_try_append_methods`
   * `string_view_builder_try_append_with_and_byte_map_success_path`
   * `string_view_builder_rejects_long_view_part_overflow`
   * `failing_bulk_builder_propagates_try_append_errors`
   
   It also continues to exercise existing string builder tests.
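
As a rough illustration of how a failing test double supports propagation tests, the sketch below uses a hypothetical `BudgetedWriter` that rejects writes once a byte budget is exhausted. It is not the PR's `FailingStringWriter`, whose actual shape is not shown here; the sketch assumes a writer built on `std::fmt::Write`, and `emit_row` is likewise an invented stand-in for a row emitter.

```rust
use std::fmt::{self, Write};

/// Hypothetical test double: accepts writes until `budget` bytes are used,
/// then returns an error without recording the rejected write.
pub struct BudgetedWriter {
    written: String,
    budget: usize,
}

impl BudgetedWriter {
    pub fn new(budget: usize) -> Self {
        Self {
            written: String::new(),
            budget,
        }
    }

    pub fn written(&self) -> &str {
        &self.written
    }
}

impl Write for BudgetedWriter {
    fn write_str(&mut self, s: &str) -> fmt::Result {
        if self.written.len() + s.len() > self.budget {
            return Err(fmt::Error);
        }
        self.written.push_str(s);
        Ok(())
    }
}

/// Hypothetical row emitter under test: forwards the writer's error to the caller.
pub fn emit_row<W: Write>(out: &mut W, value: &str) -> fmt::Result {
    out.write_str(value)
}
```

A test can then assert that the emitter returns the error instead of panicking, and that earlier output is left intact.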
   
   ## Are there any user-facing changes?
   
   No user-facing behavior change is intended. This is shared internal 
infrastructure that enables downstream code to propagate overflow errors 
through fallible APIs while preserving the existing infallible compatibility 
methods.
   
   ## LLM-generated code disclosure
   
   This PR includes LLM-generated code and comments. All LLM-generated content 
has been manually reviewed.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

