randolf-scholz opened a new issue, #34976: URL: https://github.com/apache/arrow/issues/34976
### Describe the enhancement requested

I have a large array of string data in which numerical values are mixed with categorical ones. `pyarrow` seems to offer no straightforward way to separate them.

```python
import pyarrow as pa
import pyarrow.compute as pc  # explicit import; `pa.compute` may not be available after a bare `import pyarrow`

arr = pa.array(["3", "+5", "-4.2", "1,000.00", "foo", "7e-3"], type=pa.string())
print(pc.utf8_is_numeric(arr))  # ynnnnn: only "3" is reported as numeric
pc.cast(arr, pa.float32())
# ArrowInvalid: Failed to parse string: 'foo' as a scalar of type float
```

Basically, it would be great to have either (or both) of the following:

- a function that returns a boolean mask indicating whether each string can be cast to float
- an option for `pyarrow.compute.cast` that replaces errors with null values

My current workaround is to go through pandas: `pd.to_numeric(pd.Series(arr, dtype="string[pyarrow]"), errors="coerce")`.

### Component(s)

Python
