Re: [PR] GH-25118: [Python] Make NumPy an optional runtime dependency [arrow]

via GitHub Tue, 02 Jul 2024 10:00:26 -0700


jorisvandenbossche commented on code in PR #41904:
URL: https://github.com/apache/arrow/pull/41904#discussion_r1654768590



##########
docker-compose.yml:
##########
@@ -1221,6 +1222,50 @@ services:
     volumes: *conda-volumes
     command: *python-conda-command
 
+  conda-python-no-numpy:
+    # Usage:
+    #   docker-compose build conda
+    #   docker-compose build conda-cpp
+    #   docker-compose build conda-python
+    #   docker-compose build conda-python-no-numpy
+    #   docker-compose run --rm conda-python-no-numpy
+    image: ${REPO}:${ARCH}-conda-python-${PYTHON}-no-numpy
+    build:
+      context: .
+      dockerfile: ci/docker/conda-python-pandas.dockerfile
+      cache_from:
+        - ${REPO}:${ARCH}-conda-python-${PYTHON}-pandas-${PANDAS}
+      args:
+        repo: ${REPO}
+        arch: ${ARCH}
+        python: ${PYTHON}
+        numpy: ${NUMPY}
+        pandas: ${PANDAS}
+    shm_size: *shm-size
+    environment:
+      <<: [*common, *ccache, *sccache]
+      PARQUET_REQUIRE_ENCRYPTION:  # inherit
+      # The without numpy mark is only used to select tests to be run
+      # when numpy is not installed but those tests will also run
+      # if numpy is present.

Review Comment:
   > Why wouldn't we exercise reading CSV
   
   I know it was only an example, but FWIW it would be very easy to enable the 
CSV tests in this PR. The `test_csv.py` file does import numpy, but it only 
uses it once to generate random integers. That's something we should be able to 
replace with stdlib `random` quite easily I think.



##########
docker-compose.yml:
##########
@@ -1221,6 +1222,50 @@ services:
     volumes: *conda-volumes
     command: *python-conda-command
 
+  conda-python-no-numpy:
+    # Usage:
+    #   docker-compose build conda
+    #   docker-compose build conda-cpp
+    #   docker-compose build conda-python
+    #   docker-compose build conda-python-no-numpy
+    #   docker-compose run --rm conda-python-no-numpy
+    image: ${REPO}:${ARCH}-conda-python-${PYTHON}-no-numpy
+    build:
+      context: .
+      dockerfile: ci/docker/conda-python-pandas.dockerfile
+      cache_from:
+        - ${REPO}:${ARCH}-conda-python-${PYTHON}-pandas-${PANDAS}
+      args:
+        repo: ${REPO}
+        arch: ${ARCH}
+        python: ${PYTHON}
+        numpy: ${NUMPY}
+        pandas: ${PANDAS}
+    shm_size: *shm-size
+    environment:
+      <<: [*common, *ccache, *sccache]
+      PARQUET_REQUIRE_ENCRYPTION:  # inherit
+      # The without numpy mark is only used to select tests to be run
+      # when numpy is not installed but those tests will also run
+      # if numpy is present.

Review Comment:
   > Why wouldn't we exercise reading CSV
   
   I know it was only an example, but FWIW it would be very easy to enable the 
CSV tests in this PR. The `test_csv.py` file does import numpy, but it only 
uses it once to generate random integers. That's something we should be able to 
replace with stdlib `random` quite easily I think.
   
   (and exactly the same for `test_json.py`)



##########
docker-compose.yml:
##########
@@ -1221,6 +1222,50 @@ services:
     volumes: *conda-volumes
     command: *python-conda-command
 
+  conda-python-no-numpy:
+    # Usage:
+    #   docker-compose build conda
+    #   docker-compose build conda-cpp
+    #   docker-compose build conda-python
+    #   docker-compose build conda-python-no-numpy
+    #   docker-compose run --rm conda-python-no-numpy
+    image: ${REPO}:${ARCH}-conda-python-${PYTHON}-no-numpy
+    build:
+      context: .
+      dockerfile: ci/docker/conda-python-pandas.dockerfile
+      cache_from:
+        - ${REPO}:${ARCH}-conda-python-${PYTHON}-pandas-${PANDAS}
+      args:
+        repo: ${REPO}
+        arch: ${ARCH}
+        python: ${PYTHON}
+        numpy: ${NUMPY}
+        pandas: ${PANDAS}
+    shm_size: *shm-size
+    environment:
+      <<: [*common, *ccache, *sccache]
+      PARQUET_REQUIRE_ENCRYPTION:  # inherit
+      # The without numpy mark is only used to select tests to be run
+      # when numpy is not installed but those tests will also run
+      # if numpy is present.

Review Comment:
   > I understand the concerns but I think the alternative solution that's been 
chosen (run a small subset of the tests) is much worse.
   
   See also Raul's comment at 
https://github.com/apache/arrow/pull/41904#issuecomment-2188800901. I think the 
goal should be to eventually be able to run the full test suite without numpy 
installed and have it automatically run the tests that it can run. 
   However, fixing all our tests to be able to do that in one PR might be a bit 
too much. So my understanding is that the current solution (run a specific 
subset of test files) is only meant as a first step.



##########
docker-compose.yml:
##########
@@ -1221,6 +1222,50 @@ services:
     volumes: *conda-volumes
     command: *python-conda-command
 
+  conda-python-no-numpy:
+    # Usage:
+    #   docker-compose build conda
+    #   docker-compose build conda-cpp
+    #   docker-compose build conda-python
+    #   docker-compose build conda-python-no-numpy
+    #   docker-compose run --rm conda-python-no-numpy
+    image: ${REPO}:${ARCH}-conda-python-${PYTHON}-no-numpy
+    build:
+      context: .
+      dockerfile: ci/docker/conda-python-pandas.dockerfile
+      cache_from:
+        - ${REPO}:${ARCH}-conda-python-${PYTHON}-pandas-${PANDAS}
+      args:
+        repo: ${REPO}
+        arch: ${ARCH}
+        python: ${PYTHON}
+        numpy: ${NUMPY}
+        pandas: ${PANDAS}
+    shm_size: *shm-size
+    environment:
+      <<: [*common, *ccache, *sccache]
+      PARQUET_REQUIRE_ENCRYPTION:  # inherit
+      # The without numpy mark is only used to select tests to be run
+      # when numpy is not installed but those tests will also run
+      # if numpy is present.

Review Comment:
   > I understand the concerns but I think the alternative solution that's been 
chosen (run a small subset of the tests) is much worse.
   
   See also Raul's comment at 
https://github.com/apache/arrow/pull/41904#issuecomment-2188800901. I think the 
goal should be to eventually be able to run the full test suite without numpy 
installed and have it automatically run the tests that it can run. 
   However, fixing all our tests to be able to do that in one PR might be a bit 
too much. So my understanding is that the current solution (run a specific 
subset of test files) is only meant as a first step.
   
   (and I am not sure what would be a better alternative, although we could of 
course do precursor PRs to prepare the test suite to reduce the dependency on 
numpy, so this PR gets smaller. But personally I am fine with the current order 
as well)



##########
python/pyarrow/tests/test_without_numpy.py:
##########
@@ -0,0 +1,58 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+import pytest
+
+import pyarrow as pa
+
+# Marks all of the tests in this module
+# Ignore these with pytest ... -m 'not nonumpy'
+pytestmark = pytest.mark.nonumpy

Review Comment:
   Maybe `does_not_require_numpy` instead of `without_numpy` ?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] GH-25118: [Python] Make NumPy an optional runtime dependency [arrow]

Reply via email to