UtkarshSahay123 opened a new pull request, #9015: URL: https://github.com/apache/arrow-rs/pull/9015
## What does this PR do?

This PR fixes UTF-8 boundary validation in substring kernels for sliced `Utf8` and `LargeUtf8` arrays. Previously, UTF-8 boundary checks were performed against the full underlying buffer, which could lead to incorrect validation when arrays were sliced. This change ensures boundaries are validated relative to each value.

## Why is this change needed?

Substring kernels operate on value-relative offsets. Validating offsets against the global buffer can incorrectly reject valid boundaries or accept invalid ones when arrays are sliced. This fix aligns validation with value-level semantics.

## What changes were made?

- Perform UTF-8 boundary validation relative to per-value slices
- Preserve existing behavior for unsliced arrays
- No API changes

## Tests

- Existing substring tests cover this behavior
- No new tests were required
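
To illustrate the distinction between buffer-relative and value-relative validation described above, here is a minimal sketch in plain Rust. It is not the code from this PR and does not use the `arrow` crate: the offsets-plus-data layout is modeled with ordinary slices, and the helper names `value_bytes` and `is_boundary_in_value` are hypothetical, chosen only for this example.

```rust
/// Hypothetical helper: returns the bytes of value `i` in an
/// offsets-plus-data string layout (offsets[i]..offsets[i + 1]).
fn value_bytes<'a>(data: &'a [u8], offsets: &[usize], i: usize) -> &'a [u8] {
    &data[offsets[i]..offsets[i + 1]]
}

/// Hypothetical helper: checks whether `rel`, a byte offset measured from
/// the start of value `i`, falls on a UTF-8 character boundary of that value.
fn is_boundary_in_value(data: &[u8], offsets: &[usize], i: usize, rel: usize) -> bool {
    match std::str::from_utf8(value_bytes(data, offsets, i)) {
        Ok(s) => s.is_char_boundary(rel),
        Err(_) => false,
    }
}

fn main() {
    // Case 1: values "é" (2 bytes) and "ab" (2 bytes) stored back to back.
    let data = "éab".as_bytes();
    let offsets = [0usize, 2, 4];
    let whole = std::str::from_utf8(data).unwrap();
    // Offset 1 inside "ab" is a valid boundary for that value...
    assert!(is_boundary_in_value(data, &offsets, 1, 1));
    // ...but the same number read as a position in the whole buffer
    // lands in the middle of "é", so a global check would reject it.
    assert!(!whole.is_char_boundary(1));

    // Case 2: values "ab" and "é". Now the global check is too lenient.
    let data = "abé".as_bytes();
    let whole = std::str::from_utf8(data).unwrap();
    // Offset 1 inside "é" splits a multi-byte character...
    assert!(!is_boundary_in_value(data, &offsets, 1, 1));
    // ...yet position 1 of the whole buffer sits between 'a' and 'b'.
    assert!(whole.is_char_boundary(1));

    println!("value-relative checks behave as expected");
}
```

The two cases correspond to the two failure modes named in the description: a valid value-relative boundary being rejected, and an invalid one being accepted, when the offset is checked against the wrong base.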
