UtkarshSahay123 opened a new pull request, #9015:
URL: https://github.com/apache/arrow-rs/pull/9015

   ## What does this PR do?
   
   This PR fixes UTF-8 boundary validation in substring kernels for sliced
   `Utf8` and `LargeUtf8` arrays.
   
   Previously, UTF-8 boundary checks were performed against the full underlying
   values buffer instead of the individual value, which could produce incorrect
   results when arrays were sliced. This change validates boundaries relative to
   each value.
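   
   As background, the sketch below models the Arrow string layout with plain Rust
   slices. It assumes the standard layout in which value `i` occupies
   `offsets[i]..offsets[i + 1]` of a shared values buffer, and a sliced array keeps
   a window of absolute (not rebased) offsets. `value_range` and `to_absolute` are
   hypothetical helpers for illustration, not arrow-rs APIs.
   
   ```rust
   use std::ops::Range;
   
   /// Hypothetical helper: byte range of value `i` inside the shared values
   /// buffer. Offsets are absolute positions into that buffer, so they are the
   /// same whether or not the array has been sliced.
   fn value_range(offsets: &[i32], i: usize) -> Range<usize> {
       offsets[i] as usize..offsets[i + 1] as usize
   }
   
   /// Hypothetical helper: converts a byte position relative to value `i`
   /// into an absolute position in the shared buffer.
   fn to_absolute(offsets: &[i32], i: usize, relative: usize) -> usize {
       offsets[i] as usize + relative
   }
   ```
   
   With `"abc"`, `"hé"` and `"日本"` stored in one buffer, slicing off the first
   value leaves offsets `[3, 6, 12]`, so value-relative byte 3 of `"日本"`
   corresponds to absolute byte 9.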
   
   ## Why is this change needed?
   
   Substring kernels operate on value-relative offsets. Validating those offsets
   against the shared buffer can incorrectly reject valid boundaries or accept
   invalid ones when arrays are sliced. This fix makes validation use the same
   value-relative offsets that the kernel uses to extract substrings.
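   
   A minimal illustration of both failure modes, using two hypothetical helpers
   and the standard library's `str::is_char_boundary`: testing a value-relative
   position against the whole buffer can disagree with testing it against the
   value itself, in either direction.
   
   ```rust
   /// Hypothetical "global" check: tests a value-relative position against the
   /// whole shared buffer (the pattern described above as incorrect).
   fn is_boundary_global(buffer: &str, relative: usize) -> bool {
       buffer.is_char_boundary(relative)
   }
   
   /// Hypothetical value-relative check: isolates the value's bytes first,
   /// then tests the position inside that value.
   fn is_boundary_in_value(buffer: &str, value_start: usize, value_end: usize, relative: usize) -> bool {
       buffer[value_start..value_end].is_char_boundary(relative)
   }
   ```
   
   In `"éab"` the value `"ab"` starts at byte 2; position 1 is valid inside it but
   falls in the middle of `é` in the buffer, so the global check rejects it. In
   `"abé"` the opposite happens for the value `"é"`: position 1 splits `é`, yet the
   global check accepts it because buffer byte 1 is `b`.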
   
   ## What changes were made?
   
   - Perform UTF-8 boundary validation relative to per-value slices
   - Preserve existing behavior for unsliced arrays
   - No API changes
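   
   The per-value approach can be sketched as a simplified, byte-based substring
   over a sliced array. This is a hypothetical model, not the arrow-rs kernel: it
   takes a `&str` values buffer plus absolute `offsets`, models only non-negative
   `start` values, and returns owned `String`s instead of building an Arrow array.
   
   ```rust
   /// Hypothetical byte-based substring over every value of a (possibly sliced)
   /// `Utf8` array modeled as `values` + absolute `offsets`. `start` and `length`
   /// are byte positions relative to each value; char-boundary checks are made
   /// on the value itself, never on the shared buffer.
   fn substring_all(
       values: &str,
       offsets: &[i32],
       start: usize,
       length: Option<usize>,
   ) -> Result<Vec<String>, String> {
       let mut out = Vec::with_capacity(offsets.len().saturating_sub(1));
       for w in offsets.windows(2) {
           // Isolate this value's bytes using its absolute offsets.
           let value = &values[w[0] as usize..w[1] as usize];
           // Clamp the requested byte range to the value's length.
           let begin = start.min(value.len());
           let end = match length {
               Some(len) => begin.saturating_add(len).min(value.len()),
               None => value.len(),
           };
           // Validate against the value, so slicing cannot change the outcome.
           if !value.is_char_boundary(begin) || !value.is_char_boundary(end) {
               return Err(format!(
                   "byte range {begin}..{end} of {value:?} is not on char boundaries"
               ));
           }
           out.push(value[begin..end].to_string());
       }
       Ok(out)
   }
   ```
   
   Because each value is isolated before `is_char_boundary` is called, the result
   depends only on that value's bytes, so an unsliced array (offsets starting at 0)
   and a sliced one produce the same output for the same values.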
   
   ## Tests
   
   - Existing substring tests cover this behavior
   - No new tests were required
   

