ngoldbaum commented on code in PR #48391:
URL: https://github.com/apache/arrow/pull/48391#discussion_r2599882083
##########
python/pyarrow/src/arrow/python/numpy_to_arrow.cc:
##########
@@ -338,6 +383,25 @@ Status NumPyConverter::Convert() {
return Status::OK();
}
+ if (IsStringDType(dtype_)) {
+#if NPY_ABI_VERSION >= 0x02000000
+ RETURN_NOT_OK(ConvertStringDType());
+ return Status::OK();
+#else
+ // Fall back to the generic Python sequence conversion path when the StringDType
+ // C API is unavailable.
+ PyConversionOptions py_options;
+ py_options.type = type_;
+ py_options.from_pandas = from_pandas_;
+ ARROW_ASSIGN_OR_RAISE(
+ auto chunked_array,
+ ConvertPySequence(reinterpret_cast<PyObject*>(arr_),
+ reinterpret_cast<PyObject*>(mask_), py_options, pool_));
+ out_arrays_ = chunked_array->chunks();
+ return Status::OK();
Review Comment:
This fallback is going to be very slow. Also, the only way to get a StringDType
array from NumPy is to be running NumPy 2.0 or newer. So the only way to enter
this branch is to build PyArrow against NumPy 1.x and then use it at runtime
with NumPy 2.x, which seems kind of silly. Why not just build with NumPy 2.0?
You no longer need to build against the oldest supported NumPy, because an
extension built against NumPy 2.x also works with NumPy 1.x at runtime.
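
To make the case analysis concrete, here is a minimal standalone sketch of which
branch can be reached for each build/runtime combination. It is not PyArrow
code: `Path`, `SelectPath`, `CheckCases`, `kNumPy2Abi` and `kOlderAbi` are
hypothetical names, and `kOlderAbi` is a placeholder for any ABI value below
`0x02000000`, not necessarily the value a real NumPy 1.x header defines.

```cpp
#include <cassert>
#include <cstdint>

// Threshold taken from the `#if NPY_ABI_VERSION >= 0x02000000` check in the diff.
constexpr std::uint32_t kNumPy2Abi = 0x02000000;
// Placeholder (assumption): stands in for any build-time ABI value below 2.0.
constexpr std::uint32_t kOlderAbi = 0x01000000;

enum class Path { kNotReached, kNativeStringDType, kPySequenceFallback };

// build_abi: ABI version of the NumPy headers the extension was compiled against.
// runtime_has_string_dtype: true only when the NumPy imported at runtime is
// 2.0 or newer, since older versions cannot create StringDType arrays.
constexpr Path SelectPath(std::uint32_t build_abi, bool runtime_has_string_dtype) {
  if (!runtime_has_string_dtype) {
    // No StringDType array can exist, so the IsStringDType check never passes.
    return Path::kNotReached;
  }
  return build_abi >= kNumPy2Abi ? Path::kNativeStringDType
                                 : Path::kPySequenceFallback;
}

// Walks the four build/runtime combinations.
inline void CheckCases() {
  assert(SelectPath(kOlderAbi, false) == Path::kNotReached);
  assert(SelectPath(kNumPy2Abi, false) == Path::kNotReached);
  assert(SelectPath(kNumPy2Abi, true) == Path::kNativeStringDType);
  // The only combination that reaches the slow fallback:
  assert(SelectPath(kOlderAbi, true) == Path::kPySequenceFallback);
}
```

Only one of the four combinations selects the fallback, and building against
NumPy 2.0 removes it.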