iabhi4 opened a new pull request, #46590:
URL: https://github.com/apache/arrow/pull/46590

   ### Rationale for this change
   Closes #46589
   `pyarrow.compute.utf8_is_digit` did not recognize some valid Unicode digit 
characters (e.g., superscripts such as `'³'`), diverging from the behavior of 
Python's built-in `str.isdigit()`.
   This caused inconsistencies in downstream libraries such as pandas when using 
the PyArrow-backed `StringDtype`.
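
   For reference, a small sketch of the Python behavior this PR targets. It uses only the standard library; the sample characters and the `report` name are illustrative choices, not taken from the issue.

```python
import unicodedata

# Sample characters (illustrative): ASCII digit, superscript three,
# Arabic-Indic digit three, and the vulgar fraction one half.
samples = ["3", "\u00b3", "\u0663", "\u00bd"]

# For each character record: str.isdigit(), str.isdecimal(), general category.
report = {
    ch: (ch.isdigit(), ch.isdecimal(), unicodedata.category(ch))
    for ch in samples
}

for ch, (is_digit, is_decimal, category) in report.items():
    print(f"U+{ord(ch):04X} isdigit={is_digit} isdecimal={is_decimal} category={category}")
```

   The superscript `'³'` is a digit for `str.isdigit()` even though its general category is `No` rather than `Nd`, which is the kind of character the issue describes as being missed.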
   
   ### What changes are included in this PR?
   Updated the `IsDigitCharacterUnicode` implementation to cover a broader range of 
Unicode digits by replacing the category check with one that aligns with Python’s 
`str.isdigit()` semantics.
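
   The difference between the two kinds of check can be sketched in Python. This is not the Arrow C++ code: both helper names are hypothetical, and treating the old check as "general category is `Nd`" is an assumption made for illustration only.

```python
import unicodedata


def is_digit_by_category(ch: str) -> bool:
    # Hypothetical stand-in for a category-only check (assumed: Nd only).
    return unicodedata.category(ch) == "Nd"


def is_digit_like_python(ch: str) -> bool:
    # str.isdigit() accepts characters whose Unicode Numeric_Type is
    # Digit or Decimal; unicodedata.digit() has a value for those characters.
    return unicodedata.digit(ch, None) is not None


# Illustrative inputs: ASCII digit, superscript three, Arabic-Indic three,
# circled digit one, vulgar fraction one half, and a letter.
chars = ["3", "\u00b3", "\u0663", "\u2460", "\u00bd", "a"]

# Characters where the category-only check disagrees with str.isdigit().
missed = [ch for ch in chars if ch.isdigit() and not is_digit_by_category(ch)]
print([f"U+{ord(ch):04X}" for ch in missed])
```

   Under that assumption, the category-only predicate misses characters such as `'³'` and `'①'`, while the property-based predicate agrees with `str.isdigit()` on every sample.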
   
   Added tests in `scalar_string_test.cc` to validate correct digit detection 
across diverse Unicode digit inputs.
   
   ### Are these changes tested?
   Yes. New unit tests were added and pass successfully, verifying behavior on 
various Unicode digit characters.
   
   ### Are there any user-facing changes?
   Yes. Users relying on `pc.utf8_is_digit()` will now get correct results for 
a wider range of Unicode digit characters, improving correctness and parity 
with Python semantics.
   
   - GitHub Issue: [#46589](https://github.com/apache/arrow/issues/46589)
    


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org