pitrou commented on code in PR #41904:
URL: https://github.com/apache/arrow/pull/41904#discussion_r1629699221
##########
python/pyarrow/src/arrow/python/init.cc:
##########
@@ -21,4 +21,15 @@
#include "arrow/python/init.h"
#include "arrow/python/numpy_interop.h"
-int arrow_init_numpy() { return arrow::py::import_numpy(); }
+bool numpy_imported = false;
+
+int arrow_init_numpy(bool import_numpy) {
Review Comment:
Does it make sense to add this argument? The function does nothing if it is
false, so it could just be skipped from the Python side.
Or perhaps we want to make sure that this function is always called before
PyArrow is used?
##########
.github/workflows/python.yml:
##########
@@ -84,6 +85,11 @@ jobs:
title: AMD64 Conda Python 3.10 Pandas latest
python: "3.10"
pandas: latest
+ - name: conda-python-3.10-no-numpy
+ cache: conda-python-3.10
+ image: conda-python-no-numpy
+ title: AMD64 Conda Python 3.10 Testing without numpy
Review Comment:
```suggestion
title: AMD64 Conda Python 3.10 without NumPy
```
##########
python/pyarrow/tests/conftest.py:
##########
@@ -25,8 +25,14 @@
import pytest
import hypothesis as h
+try:
+ import numpy as np
Review Comment:
Why is it needed to import NumPy here?
##########
python/pyarrow/tests/conftest.py:
##########
@@ -116,6 +122,30 @@ def pytest_runtest_setup(item):
item.config.pyarrow.apply_mark(mark)
[email protected]
+def all_array_types():
Review Comment:
Is this needed here?
##########
python/pyarrow/types.pxi:
##########
@@ -149,14 +154,14 @@ def _is_primitive(Type type):
def _get_pandas_type(arrow_type, coerce_to_ns=False):
cdef Type type_id = arrow_type.id
- if type_id not in _pandas_type_map:
+ if type_id not in _get_pandas_type_map():
Review Comment:
Will this recreate the dict everytime this function is called? We don't want
this.
##########
ci/scripts/python_test.sh:
##########
@@ -69,4 +69,8 @@ export PYARROW_TEST_PARQUET_ENCRYPTION
export PYARROW_TEST_S3
# Testing PyArrow
-pytest -r s ${PYTEST_ARGS} --pyargs pyarrow
+if [ -z "${TEST_COMMAND}" ]; then
Review Comment:
Is this needed? If so, perhaps use a more specific variable name, such as
`PYARROW_TEST_COMMAND`.
##########
python/pyarrow/src/arrow/python/init.h:
##########
@@ -22,5 +22,6 @@
extern "C" {
ARROW_PYTHON_EXPORT
-int arrow_init_numpy();
+int arrow_init_numpy(bool import_numpy);
+bool get_numpy_imported();
}
Review Comment:
I think the `extern "C"` is unwarranted, we can just use regular C++
namespacing:
```c++
namespace arrow::py {
ARROW_PYTHON_EXPORT
int init_numpy(bool has_numpy);
bool has_numpy();
} // namespace arrow::py
```
(you'll need to update the corresponding Cython declarations)
##########
python/pyarrow/tests/strategies.py:
##########
@@ -35,7 +38,10 @@
import tzdata # noqa:F401
except ImportError:
zoneinfo = None
-import numpy as np
+try:
+ import numpy as np
+except ImportError:
+ pass
Review Comment:
Is the import actually used below?
##########
python/pyarrow/lib.pyx:
##########
@@ -32,8 +35,17 @@ from pyarrow.includes.common cimport PyObject_to_object
cimport pyarrow.includes.libarrow_python as libarrow_python
cimport cpython as cp
-# Initialize NumPy C API
-arrow_init_numpy()
+# Initialize NumPy C API only if numpy was able to be imported
+
+
+def initialize_numpy():
+ if "numpy" in sys.modules:
+ arrow_init_numpy(True)
+ else:
+ arrow_init_numpy(False)
+
+
+initialize_numpy()
Review Comment:
Could be as simple as:
```python
if np is not None:
arrow_init_numpy()
```
##########
python/pyarrow/builder.pxi:
##########
@@ -42,7 +42,7 @@ cdef class StringBuilder(_Weakrefable):
value : string/bytes or np.nan/None
The value to append to the string array builder.
"""
- if value is None or value is np.nan:
+ if value is None or ('numpy' in sys.modules and value is np.nan):
Review Comment:
Perhaps this
```suggestion
if value is None or math.isnan(value):
```
##########
python/pyarrow/lib.pyx:
##########
@@ -21,7 +21,10 @@
import datetime
import decimal as _pydecimal
-import numpy as np
+try:
+ import numpy as np
+except ImportError:
+ pass
Review Comment:
If you change this thusly,
```suggestion
np = None
```
Then you should later be able to write `np is not None` to check that NumPy
is present.
##########
python/pyarrow/tests/test_compute.py:
##########
@@ -1111,30 +1093,31 @@ def test_binary_join_element_wise():
'a', 'b', null, options=replace).as_py() is None
[email protected](('ty', 'values'), all_array_types)
Review Comment:
I've no strong opinion on whether we want to keep these tests parameterized
or not. @jorisvandenbossche What do you think?
##########
python/pyarrow/src/arrow/python/init.cc:
##########
@@ -21,4 +21,15 @@
#include "arrow/python/init.h"
Review Comment:
Side note: can we rename these files to `numpy_init.cc` and `numpy_init.h`?
They're really NumPy specific.
##########
python/pyarrow/tests/util.py:
##########
@@ -22,7 +22,10 @@
import contextlib
import decimal
import gc
-import numpy as np
+try:
+ import numpy as np
+except ImportError:
+ pass
Review Comment:
Is this import actually used?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]