jorisvandenbossche commented on code in PR #41904:
URL: https://github.com/apache/arrow/pull/41904#discussion_r1654768590
##########
docker-compose.yml:
##########
@@ -1221,6 +1222,50 @@ services:
volumes: *conda-volumes
command: *python-conda-command
+ conda-python-no-numpy:
+ # Usage:
+ # docker-compose build conda
+ # docker-compose build conda-cpp
+ # docker-compose build conda-python
+ # docker-compose build conda-python-no-numpy
+ # docker-compose run --rm conda-python-no-numpy
+ image: ${REPO}:${ARCH}-conda-python-${PYTHON}-no-numpy
+ build:
+ context: .
+ dockerfile: ci/docker/conda-python-pandas.dockerfile
+ cache_from:
+ - ${REPO}:${ARCH}-conda-python-${PYTHON}-pandas-${PANDAS}
+ args:
+ repo: ${REPO}
+ arch: ${ARCH}
+ python: ${PYTHON}
+ numpy: ${NUMPY}
+ pandas: ${PANDAS}
+ shm_size: *shm-size
+ environment:
+ <<: [*common, *ccache, *sccache]
+ PARQUET_REQUIRE_ENCRYPTION: # inherit
+ # The without numpy mark is only used to select tests to be run
+ # when numpy is not installed but those tests will also run
+ # if numpy is present.
Review Comment:
> Why wouldn't we exercise reading CSV
I know it was only an example, but FWIW it would be very easy to enable the
CSV tests in this PR. The `test_csv.py` file does import numpy, but it only
uses it once to generate random integers. That's something we should be able to
replace with stdlib `random` quite easily I think.
##########
docker-compose.yml:
##########
@@ -1221,6 +1222,50 @@ services:
volumes: *conda-volumes
command: *python-conda-command
+ conda-python-no-numpy:
+ # Usage:
+ # docker-compose build conda
+ # docker-compose build conda-cpp
+ # docker-compose build conda-python
+ # docker-compose build conda-python-no-numpy
+ # docker-compose run --rm conda-python-no-numpy
+ image: ${REPO}:${ARCH}-conda-python-${PYTHON}-no-numpy
+ build:
+ context: .
+ dockerfile: ci/docker/conda-python-pandas.dockerfile
+ cache_from:
+ - ${REPO}:${ARCH}-conda-python-${PYTHON}-pandas-${PANDAS}
+ args:
+ repo: ${REPO}
+ arch: ${ARCH}
+ python: ${PYTHON}
+ numpy: ${NUMPY}
+ pandas: ${PANDAS}
+ shm_size: *shm-size
+ environment:
+ <<: [*common, *ccache, *sccache]
+ PARQUET_REQUIRE_ENCRYPTION: # inherit
+ # The without numpy mark is only used to select tests to be run
+ # when numpy is not installed but those tests will also run
+ # if numpy is present.
Review Comment:
> Why wouldn't we exercise reading CSV
I know it was only an example, but FWIW it would be very easy to enable the
CSV tests in this PR. The `test_csv.py` file does import numpy, but it only
uses it once to generate random integers. That's something we should be able to
replace with stdlib `random` quite easily I think.
(and exactly the same for `test_json.py`)
##########
docker-compose.yml:
##########
@@ -1221,6 +1222,50 @@ services:
volumes: *conda-volumes
command: *python-conda-command
+ conda-python-no-numpy:
+ # Usage:
+ # docker-compose build conda
+ # docker-compose build conda-cpp
+ # docker-compose build conda-python
+ # docker-compose build conda-python-no-numpy
+ # docker-compose run --rm conda-python-no-numpy
+ image: ${REPO}:${ARCH}-conda-python-${PYTHON}-no-numpy
+ build:
+ context: .
+ dockerfile: ci/docker/conda-python-pandas.dockerfile
+ cache_from:
+ - ${REPO}:${ARCH}-conda-python-${PYTHON}-pandas-${PANDAS}
+ args:
+ repo: ${REPO}
+ arch: ${ARCH}
+ python: ${PYTHON}
+ numpy: ${NUMPY}
+ pandas: ${PANDAS}
+ shm_size: *shm-size
+ environment:
+ <<: [*common, *ccache, *sccache]
+ PARQUET_REQUIRE_ENCRYPTION: # inherit
+ # The without numpy mark is only used to select tests to be run
+ # when numpy is not installed but those tests will also run
+ # if numpy is present.
Review Comment:
> I understand the concerns but I think the alternative solution that's been
chosen (run a small subset of the tests) is much worse.
See also Raul's comment at
https://github.com/apache/arrow/pull/41904#issuecomment-2188800901. I think the
goal should be to eventually be able to run the full test suite without numpy
installed and have it automatically run the tests that it can run.
However, fixing all our tests to be able to do that in one PR might be a bit
too much. So my understanding is that the current solution (run a specific
subset of test files) is only meant as a first step.
##########
docker-compose.yml:
##########
@@ -1221,6 +1222,50 @@ services:
volumes: *conda-volumes
command: *python-conda-command
+ conda-python-no-numpy:
+ # Usage:
+ # docker-compose build conda
+ # docker-compose build conda-cpp
+ # docker-compose build conda-python
+ # docker-compose build conda-python-no-numpy
+ # docker-compose run --rm conda-python-no-numpy
+ image: ${REPO}:${ARCH}-conda-python-${PYTHON}-no-numpy
+ build:
+ context: .
+ dockerfile: ci/docker/conda-python-pandas.dockerfile
+ cache_from:
+ - ${REPO}:${ARCH}-conda-python-${PYTHON}-pandas-${PANDAS}
+ args:
+ repo: ${REPO}
+ arch: ${ARCH}
+ python: ${PYTHON}
+ numpy: ${NUMPY}
+ pandas: ${PANDAS}
+ shm_size: *shm-size
+ environment:
+ <<: [*common, *ccache, *sccache]
+ PARQUET_REQUIRE_ENCRYPTION: # inherit
+ # The without numpy mark is only used to select tests to be run
+ # when numpy is not installed but those tests will also run
+ # if numpy is present.
Review Comment:
> I understand the concerns but I think the alternative solution that's been
chosen (run a small subset of the tests) is much worse.
See also Raul's comment at
https://github.com/apache/arrow/pull/41904#issuecomment-2188800901. I think the
goal should be to eventually be able to run the full test suite without numpy
installed and have it automatically run the tests that it can run.
However, fixing all our tests to be able to do that in one PR might be a bit
too much. So my understanding is that the current solution (run a specific
subset of test files) is only meant as a first step.
(and I am not sure what would be a better alternative, although we could of
course do precursor PRs to prepare the test suite to reduce the dependency on
numpy, so this PR gets smaller. But personally I am fine with the current order
as well)
##########
python/pyarrow/tests/test_without_numpy.py:
##########
@@ -0,0 +1,58 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+import pytest
+
+import pyarrow as pa
+
+# Marks all of the tests in this module
+# Ignore these with pytest ... -m 'not nonumpy'
+pytestmark = pytest.mark.nonumpy
Review Comment:
Maybe `does_not_require_numpy` instead of `without_numpy` ?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]