randolf-scholz opened a new issue, #48972: URL: https://github.com/apache/arrow/issues/48972
### Describe the enhancement requested

I'd like to cast a `string` array to `float`, but it can contain bad entries.

```python
import pyarrow as pa
import pyarrow.compute as pc

arr = pa.array(["1.2", "3", "10-20", None, "nan", ""])
out = pc.cast(arr, pa.float64(), safe=False)  # raises ArrowInvalid
print(out)  # expected: [1.2, 3, null, null, nan, null]
```

My current workaround is to export to `pandas` and use [`pandas.to_numeric(errors="coerce")`](https://pandas.pydata.org/docs/reference/api/pandas.to_numeric.html#pandas-to-numeric). However, it would be nice if `pyarrow` had some built-in machinery to deal with this situation:

1. An `errors={"raise", "coerce"}` option, like the one in `pandas.to_numeric`, that catches conversion errors other than overflow and truncation.
2. A function that yields a boolean mask of all values that are castable.

```python
def is_castable(arr, target_type, options=None) -> Array[bool]:
    """Returns boolean mask of values that can be cast to target_type, under the chosen options."""
```

Such a function would also be useful for extracting the set of all values that cannot be cast.

### Component(s)

C++, Python
