pitrou commented on code in PR #49259:
URL: https://github.com/apache/arrow/pull/49259#discussion_r2846660185


##########
.github/workflows/dev.yml:
##########
@@ -103,7 +103,7 @@ jobs:
         shell: bash
         run: |
           gem install test-unit openssl
-          pip install "cython>=3.1" setuptools pytest requests setuptools-scm
+          pip install build "cython>=3.1" pytest requests scikit-build-core 
setuptools-scm

Review Comment:
   Can we rely on `pyproject.toml` for build dependencies at some point? 



##########
python/_build_backend/__init__.py:
##########
@@ -0,0 +1,79 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+"""
+Build backend wrapper that resolves license symlinks before delegating
+to scikit-build-core.
+
+Arrow's LICENSE.txt and NOTICE.txt live at the repository root, one level
+above python/. They are symlinked into python/ so that license-files in
+pyproject.toml can reference them otherwise project metadata fails validation.
+This is done before any build backend is invoked that's why symlinks are 
necessary.
+But when building sdist tarballs symlinks are not copied and we end up with
+broken LICENSE.txt and NOTICE.txt.
+
+This custom build backend only replace the symlinks with hardlinks
+before scikit_build_core.build.build_sdist so
+that sdist contains the actual file content. The symlinks are restored
+afterwards so the git working tree stays clean.
+"""
+
+from contextlib import contextmanager
+import os
+from pathlib import Path
+import shutil
+import sys
+
+from scikit_build_core.build import *  # noqa: F401,F403
+from scikit_build_core.build import build_sdist as scikit_build_sdist
+
+LICENSE_FILES = ("LICENSE.txt", "NOTICE.txt")
+PYTHON_DIR = Path(__file__).resolve().parent.parent
+
+
+@contextmanager
+def prepare_licenses():
+    # Temporarily replace symlinks with hardlinks so sdist gets real content.
+    # On Windows we just copy the files since hardlinks might not be supported.
+    for name in LICENSE_FILES:
+        parent_license = PYTHON_DIR.parent / name
+        pyarrow_license = PYTHON_DIR / name
+        if sys.platform == "win32":
+            # For Windows copy the files.
+            pyarrow_license.unlink(missing_ok=True)
+            shutil.copy2(parent_license, pyarrow_license)
+        else:
+            # For Unix-like systems we replace the symlink with
+            # a hardlink to avoid copying the file content.

Review Comment:
   Is it important to do this? We might make the code simpler by copying on all 
platform.



##########
ci/scripts/python_build.bat:
##########
@@ -111,10 +111,7 @@ echo "=== CCACHE Stats after build ==="
 ccache -sv
 
 echo "=== Building Python ==="
-set PYARROW_BUILD_TYPE=%CMAKE_BUILD_TYPE%
-set PYARROW_BUILD_VERBOSE=1
 set PYARROW_BUNDLE_ARROW_CPP=ON
-set PYARROW_CMAKE_GENERATOR=%CMAKE_GENERATOR%

Review Comment:
   Does scikit-build-core pick up the `CMAKE_xxx` environment variables 
automatically?



##########
python/_build_backend/__init__.py:
##########
@@ -0,0 +1,79 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+"""
+Build backend wrapper that resolves license symlinks before delegating
+to scikit-build-core.
+
+Arrow's LICENSE.txt and NOTICE.txt live at the repository root, one level
+above python/. They are symlinked into python/ so that license-files in
+pyproject.toml can reference them otherwise project metadata fails validation.
+This is done before any build backend is invoked that's why symlinks are 
necessary.
+But when building sdist tarballs symlinks are not copied and we end up with
+broken LICENSE.txt and NOTICE.txt.
+
+This custom build backend only replace the symlinks with hardlinks
+before scikit_build_core.build.build_sdist so
+that sdist contains the actual file content. The symlinks are restored
+afterwards so the git working tree stays clean.
+"""
+
+from contextlib import contextmanager
+import os
+from pathlib import Path
+import shutil
+import sys
+
+from scikit_build_core.build import *  # noqa: F401,F403
+from scikit_build_core.build import build_sdist as scikit_build_sdist
+
+LICENSE_FILES = ("LICENSE.txt", "NOTICE.txt")
+PYTHON_DIR = Path(__file__).resolve().parent.parent
+
+
+@contextmanager
+def prepare_licenses():
+    # Temporarily replace symlinks with hardlinks so sdist gets real content.
+    # On Windows we just copy the files since hardlinks might not be supported.
+    for name in LICENSE_FILES:
+        parent_license = PYTHON_DIR.parent / name
+        pyarrow_license = PYTHON_DIR / name
+        if sys.platform == "win32":
+            # For Windows copy the files.
+            pyarrow_license.unlink(missing_ok=True)
+            shutil.copy2(parent_license, pyarrow_license)
+        else:
+            # For Unix-like systems we replace the symlink with
+            # a hardlink to avoid copying the file content.
+            if pyarrow_license.is_symlink():
+                target = pyarrow_license.resolve()
+                pyarrow_license.unlink()
+                os.link(target, pyarrow_license)
+    try:
+        yield
+    finally:
+        if sys.platform != "win32":

Review Comment:
   Have you raised the issue on the scikit-build-core issue tracker?



##########
python/pyproject.toml:
##########
@@ -81,16 +82,18 @@ exclude = [
     '\._.*$',
 ]
 
-[tool.setuptools]
-zip-safe=false
-include-package-data=true
+[tool.scikit-build]
+cmake.build-type = "Release"
+metadata.version.provider = "scikit_build_core.metadata.setuptools_scm"
+sdist.include = ["pyarrow/_generated_version.py", "cmake_modules/"]
+wheel.packages = ["pyarrow"]
+wheel.install-dir = "pyarrow"
 
-[tool.setuptools.packages.find]
-include = ["pyarrow"]
-namespaces = false
-
-[tool.setuptools.package-data]
-pyarrow = ["*.pxd", "*.pyi", "*.pyx", "includes/*.pxd", "py.typed"]
+[tool.scikit-build.cmake.define]
+PYARROW_BUNDLE_ARROW_CPP = {env = "PYARROW_BUNDLE_ARROW_CPP", default = "OFF"}
+PYARROW_BUNDLE_CYTHON_CPP = {env = "PYARROW_BUNDLE_CYTHON_CPP", default = 
"OFF"}
+PYARROW_GENERATE_COVERAGE = {env = "PYARROW_GENERATE_COVERAGE", default = 
"OFF"}

Review Comment:
   At some point we should check whether this flag still works. Eons ago it was 
used to generate codecov reports, but we ditched all the corresponding 
scripting that had to merge coverage files generated during the C++ test suite 
and the Python test suite.
   
   Can you open an issue for that?
   



##########
python/CMakeLists.txt:
##########
@@ -352,7 +352,21 @@ set(PYARROW_CPP_ROOT_DIR pyarrow/src)
 set(PYARROW_CPP_SOURCE_DIR ${PYARROW_CPP_ROOT_DIR}/arrow/python)
 
 # Write out compile-time configuration constants
-string(TOUPPER ${CMAKE_BUILD_TYPE} UPPERCASE_PYBUILD_TYPE)
+if(CMAKE_BUILD_TYPE)
+  string(TOUPPER "${CMAKE_BUILD_TYPE}" UPPERCASE_PYBUILD_TYPE)
+else()
+  # For multi-config generators (XCode and Visual Studio),
+  # CMAKE_BUILD_TYPE is not set at configure time.
+  # scikit-build-core does the right thing with cmake.build-type and
+  # adds the corresponding --config but does not populate CMAKE_BUILD_TYPE
+  # for those. On this specific case, we set the default to "RELEASE"
+  # as it's the most common build type for users building from source.
+  # This is mainly relevant for our Windows wheels, which are built with
+  # Visual Studio and thus use a multi-config generator with Release.
+  # As a note this is only to populate config_internal.h.cmake.

Review Comment:
   Hmm, it's not possible to populate `config_internal.h.cmake` separately for 
each config?



##########
python/pyproject.toml:
##########
@@ -81,16 +82,18 @@ exclude = [
     '\._.*$',
 ]
 
-[tool.setuptools]
-zip-safe=false
-include-package-data=true
+[tool.scikit-build]
+cmake.build-type = "Release"
+metadata.version.provider = "scikit_build_core.metadata.setuptools_scm"
+sdist.include = ["pyarrow/_generated_version.py", "cmake_modules/"]
+wheel.packages = ["pyarrow"]
+wheel.install-dir = "pyarrow"
 
-[tool.setuptools.packages.find]
-include = ["pyarrow"]
-namespaces = false
-
-[tool.setuptools.package-data]
-pyarrow = ["*.pxd", "*.pyi", "*.pyx", "includes/*.pxd", "py.typed"]
+[tool.scikit-build.cmake.define]
+PYARROW_BUNDLE_ARROW_CPP = {env = "PYARROW_BUNDLE_ARROW_CPP", default = "OFF"}
+PYARROW_BUNDLE_CYTHON_CPP = {env = "PYARROW_BUNDLE_CYTHON_CPP", default = 
"OFF"}

Review Comment:
   Did you check that those were actually bundled in the wheels?



##########
ci/scripts/python_build.bat:
##########
@@ -111,10 +111,7 @@ echo "=== CCACHE Stats after build ==="
 ccache -sv
 
 echo "=== Building Python ==="
-set PYARROW_BUILD_TYPE=%CMAKE_BUILD_TYPE%
-set PYARROW_BUILD_VERBOSE=1
 set PYARROW_BUNDLE_ARROW_CPP=ON
-set PYARROW_CMAKE_GENERATOR=%CMAKE_GENERATOR%

Review Comment:
   Based on other changes you did, it doesn't seem it does?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to