raulcd commented on issue #36411: URL: https://github.com/apache/arrow/issues/36411#issuecomment-3908055313
Hi,

I spent several days on the `meson-python` PR that @WillAyd has been working on for some time. I tried fixing the wheels and sdist jobs, since they were, and still are, not building fully successfully. We had to make some changes to how the Arrow libraries are bundled, which meant spending some time with the `auditwheel`/`delocate`/`delvewheel` trio. Those jobs still need some work before CI is fully green and compatible, but a few more days would probably bring them to a successful state. The PR is this one:
- https://github.com/apache/arrow/pull/45854

I decided to start a POC using `scikit-build-core` that reuses the existing CMake infrastructure, and I was pleasantly surprised by how easy it was to get to an almost fully green CI for all our Python testing and wheels jobs. Currently the only related failure is the sdist not containing the license files; I am working on fixing it. The POC PR can be seen here:
- https://github.com/apache/arrow/pull/49259

I think we are at a point where we should make a decision on where we want to go from here. I personally find Meson nicer to work with than CMake, but it is fair to say that Arrow C++ will possibly stay with CMake for the moment, and the current flag integration is much simpler than what we have with the current `meson-python` solution. I am unsure what the experience will be for Python developers working on PyArrow, or for users trying to build PyArrow, with the current integration with Arrow C++. Right now we have something like this:

```sh
...
# for all flags:
PYARROW_WITH_SUBSTRAIT=$(case "$ARROW_SUBSTRAIT" in
  ON) echo "enabled" ;;
  OFF) echo "disabled" ;;
  *) echo "auto" ;;
esac)

${PYTHON:-python} -m pip install --no-deps --no-build-isolation -vv . \
  -Csetup-args="-Dbuildtype=${BUILD_TYPE}" \
  -Csetup-args="-Dacero=${PYARROW_WITH_ACERO}" \
  -Csetup-args="-Dazure=${PYARROW_WITH_AZURE}" \
  -Csetup-args="-Dcuda=${PYARROW_WITH_CUDA}" \
  -Csetup-args="-Ddataset=${PYARROW_WITH_DATASET}" \
  -Csetup-args="-Dflight=${PYARROW_WITH_FLIGHT}" \
  -Csetup-args="-Dgandiva=${PYARROW_WITH_GANDIVA}" \
  -Csetup-args="-Dgcs=${PYARROW_WITH_GCS}" \
  -Csetup-args="-Dhdfs=${PYARROW_WITH_HDFS}" \
  -Csetup-args="-Dorc=${PYARROW_WITH_ORC}" \
  -Csetup-args="-Dparquet=${PYARROW_WITH_PARQUET}" \
  -Csetup-args="-Dparquet_require_encryption=${PYARROW_WITH_PARQUET_ENCRYPTION}" \
  -Csetup-args="-Ds3=${PYARROW_WITH_S3}" \
  -Csetup-args="-Dsubstrait=${PYARROW_WITH_SUBSTRAIT}" \
  -Ccompile-args="-v" \
  -Csetup-args="--pkg-config-path=${ARROW_HOME}/lib/pkgconfig"
```

With `scikit-build-core` we use the exact same build scripts we currently have to build wheels (no change).

Given that Arrow C++ will probably stay on CMake, I think reusing the existing CMake infrastructure via `scikit-build-core` is the more pragmatic path forward. It will also allow us to get rid of the existing custom `setup.py`.

What do others think?
