iabhi4 opened a new pull request, #46590: URL: https://github.com/apache/arrow/pull/46590
### Rationale for this change

Closes #46589

`pyarrow.compute.utf8_is_digit` did not recognize valid Unicode digit characters (e.g., superscripts like `'³'`), diverging from the behavior of Python's built-in `str.isdigit()`. This caused inconsistencies in downstream libraries like pandas when using a PyArrow-backed StringDtype.

### What changes are included in this PR?

Updated the `IsDigitCharacterUnicode` implementation to cover a broader range of Unicode digits by replacing the category check with one that aligns with Python’s `str.isdigit()` semantics. Added tests in `scalar_string_test.cc` to validate correct digit detection across diverse Unicode digit inputs.

### Are these changes tested?

Yes. New unit tests were added and pass, verifying behavior on various Unicode digit characters.

### Are there any user-facing changes?

Yes. Users relying on `pc.utf8_is_digit()` will now get correct results for a wider range of Unicode digit characters, improving correctness and parity with Python semantics.

- GitHub Issue: [#46589](https://github.com/apache/arrow/issues/46589)
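
To illustrate the mismatch described in the rationale, here is a minimal standard-library sketch of the Python-side semantics only. It does not call PyArrow or reproduce the C++ change. The helper `nd_category_only` is hypothetical: it assumes, for illustration, that a category-based check accepts only general category `Nd` (decimal digits), which may not match what Arrow actually did before this PR.

```python
import unicodedata


def nd_category_only(ch: str) -> bool:
    # Hypothetical stand-in for a category-based check.
    # Assumption: only general category Nd (decimal digit) is accepted.
    return unicodedata.category(ch) == "Nd"


def python_isdigit(ch: str) -> bool:
    # Python's own semantics, which the PR says it aligns with.
    return ch.isdigit()


# ASCII digit, Arabic-Indic digit three, superscript three,
# vulgar fraction one half, and a plain letter.
samples = ["5", "\u0663", "\u00b3", "\u00bd", "a"]

results = {ch: (nd_category_only(ch), python_isdigit(ch)) for ch in samples}

for ch, (by_category, by_isdigit) in results.items():
    print(
        f"U+{ord(ch):04X} category={unicodedata.category(ch)} "
        f"Nd-only={by_category} str.isdigit={by_isdigit}"
    )
```

Under that assumption, the two checks disagree only on `'³'` (category `No`): `str.isdigit()` accepts it while the `Nd`-only check rejects it. Both reject `'½'`, which is numeric but not a digit.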