bascheibler opened a new issue, #38810: URL: https://github.com/apache/arrow/issues/38810
### Describe the bug, including details regarding any error messages, version, and platform.

I'm trying to build a slim version of PyArrow, so that it fits in an AWS Lambda function. The base Docker image is `public.ecr.aws/lambda/python:3.12`, which is an Amazon Linux 2023 OS (based on Fedora). Building from the Dockerfile below, it fails when trying to create a wheel file. The error message I've got is:

```
/var/task/arrow/python/setup.py:34: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
  import pkg_resources
/var/lang/lib/python3.12/site-packages/setuptools/__init__.py:80: _DeprecatedInstaller: setuptools.installer and fetch_build_eggs are deprecated.
!!

        ********************************************************************************
        Requirements should be satisfied by a PEP 517 installer.
        If you are using pip, you can try `pip install --use-pep517`.
        ********************************************************************************

!!
  dist.fetch_build_eggs(dist.setup_requires)
/var/lang/lib/python3.12/site-packages/setuptools_scm/git.py:135: UserWarning: "/var/task/arrow" is shallow and may cause errors
  warnings.warn(f'"{wd.path}" is shallow and may cause errors')
running build_ext
creating /var/task/arrow/python/build
creating /var/task/arrow/python/build/temp.linux-x86_64-cpython-312
-- Running cmake for PyArrow
cmake -DCMAKE_INSTALL_PREFIX=/var/task/arrow/python/build/lib.linux-x86_64-cpython-312/pyarrow -DPYTHON_EXECUTABLE=/var/lang/bin/python3 -DPython3_EXECUTABLE=/var/lang/bin/python3 -DPYARROW_CXXFLAGS= -DPYARROW_BUILD_CUDA=off -DPYARROW_BUILD_SUBSTRAIT=off -DPYARROW_BUILD_FLIGHT=off -DPYARROW_BUILD_GANDIVA=off -DPYARROW_BUILD_ACERO=on -DPYARROW_BUILD_DATASET=on -DPYARROW_BUILD_ORC=off -DPYARROW_BUILD_PARQUET=on -DPYARROW_BUILD_PARQUET_ENCRYPTION=off -DPYARROW_BUILD_GCS=off -DPYARROW_BUILD_S3=off -DPYARROW_BUILD_HDFS=off -DPYARROW_BUNDLE_ARROW_CPP=on -DPYARROW_BUNDLE_CYTHON_CPP=off -DPYARROW_GENERATE_COVERAGE=off -DCMAKE_BUILD_TYPE=release /var/task/arrow/python
-- The C compiler identification is GNU 11.4.1
-- The CXX compiler identification is GNU 11.4.1
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- System processor: x86_64
-- Performing Test CXX_SUPPORTS_SSE4_2
-- Performing Test CXX_SUPPORTS_SSE4_2 - Success
-- Performing Test CXX_SUPPORTS_AVX2
-- Performing Test CXX_SUPPORTS_AVX2 - Success
-- Performing Test CXX_SUPPORTS_AVX512
-- Performing Test CXX_SUPPORTS_AVX512 - Success
-- Arrow build warning level: PRODUCTION
-- Using ld linker
-- Build Type: RELEASE
-- CMAKE_C_FLAGS: -Wall -fno-semantic-interposition -msse4.2 -fdiagnostics-color=always -fno-omit-frame-pointer -Wno-unused-variable -Wno-maybe-uninitialized
-- CMAKE_CXX_FLAGS: -Wno-noexcept-type -Wall -fno-semantic-interposition -msse4.2 -fdiagnostics-color=always -fno-omit-frame-pointer -Wno-unused-variable -Wno-maybe-uninitialized
-- Generator: Unix Makefiles
-- Build output directory: /var/task/arrow/python/build/temp.linux-x86_64-cpython-312/release
-- Found Python3: /var/lang/bin/python3 (found version "3.12.0") found components: Interpreter Development.Module NumPy
-- Found Python3Alt: /var/lang/bin/python3
CMake Error at CMakeLists.txt:268 (find_package):
  By not providing "FindArrow.cmake" in CMAKE_MODULE_PATH this project has
  asked CMake to find a package configuration file provided by "Arrow", but
  CMake did not find one.

  Could not find a package configuration file provided by "Arrow" with any of
  the following names:

    ArrowConfig.cmake
    arrow-config.cmake

  Add the installation prefix of "Arrow" to CMAKE_PREFIX_PATH or set
  "Arrow_DIR" to a directory containing one of the above files. If "Arrow"
  provides a separate development package or SDK, be sure it has been
  installed.

-- Configuring incomplete, errors occurred!
See also "/var/task/arrow/python/build/temp.linux-x86_64-cpython-312/CMakeFiles/CMakeOutput.log".
error: command '/usr/bin/cmake' failed with exit code 1
The command '/bin/sh -c pip3 install -r arrow/python/requirements-wheel-build.txt && pushd arrow/python && python3 setup.py build_ext --build-type=release --bundle-arrow-cpp bdist_wheel --dist-dir /app/output && popd' returned a non-zero code: 1
```

Dockerfile:

```
FROM public.ecr.aws/lambda/python:3.12 AS build

RUN dnf upgrade && \
    dnf install -y \
    gcc-c++ \
    git ca-certificates \
    python-setuptools \
    cmake \
    pkg-config \
    python3-devel \
    python3-pip

RUN git clone --depth 1 -b apache-arrow-14.0.1 https://github.com/apache/arrow.git

# This is the folder where we will install the Arrow libraries during development
RUN mkdir dist
ENV ARROW_HOME=$(pwd)/dist
ENV LD_LIBRARY_PATH=$(pwd)/dist/lib:$LD_LIBRARY_PATH
ENV CMAKE_PREFIX_PATH=$ARROW_HOME:$CMAKE_PREFIX_PATH

RUN mkdir arrow/cpp/build && \
    pushd arrow/cpp/build && \
    cmake -DCMAKE_INSTALL_PREFIX=$ARROW_HOME \
        -DCMAKE_INSTALL_LIBDIR=lib \
        -DCMAKE_BUILD_TYPE=Release \
        -DARROW_BUILD_TESTS=OFF \
        -DARROW_COMPUTE=OFF \
        -DARROW_CSV=OFF \
        -DARROW_DATASET=ON \
        -DARROW_FILESYSTEM=ON \
        -DARROW_HDFS=OFF \
        -DARROW_JSON=OFF \
        -DARROW_PARQUET=ON \
        -DARROW_WITH_BROTLI=OFF \
        -DARROW_WITH_BZ2=OFF \
        -DARROW_WITH_LZ4=OFF \
        -DARROW_WITH_SNAPPY=ON \
        -DARROW_WITH_ZLIB=OFF \
        -DARROW_WITH_ZSTD=OFF \
        -DPARQUET_REQUIRE_ENCRYPTION=OFF \
        .. && \
    make -j4 && \
    make install && \
    popd

ENV PYARROW_WITH_PARQUET=1
ENV PYARROW_WITH_DATASET=1
ENV PYARROW_PARALLEL=4
ENV PYARROW_INSTALL_TESTS=0

# This is where it fails:
RUN pip3 install -r arrow/python/requirements-wheel-build.txt && \
    pushd arrow/python && \
    python3 setup.py build_ext --build-type=release --bundle-arrow-cpp \
        bdist_wheel --dist-dir /app/output && \
    popd

FROM public.ecr.aws/lambda/python:3.12

COPY --from=build /app/output /app/output
COPY . ${LAMBDA_TASK_ROOT}

RUN dnf install -y gcc-c++ && \
    pip install pyarrow --no-index --find-links file:////app/output && \
    pip install --upgrade pip && \
    pip install --no-cache-dir -r requirements.txt

CMD ["main.handler"]
```

Is there another way to deploy a Lambda function containing snowflake-connector-python==3.5.0, pandas and pyarrow without exceeding the size limit?

PS: I've tried building from PR #34234 as suggested on issue #34240, but got the same result.

### Component(s)

Python

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
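
The CMake error in the log above means `find_package(Arrow)` found no `ArrowConfig.cmake` under any prefix it searched. One possible cause, stated here as an assumption rather than a confirmed diagnosis: Dockerfile `ENV` instructions do not run shell command substitution, so `$(pwd)` may end up as literal text in `ARROW_HOME` and `CMAKE_PREFIX_PATH`. A minimal Python sketch for checking this from inside the build stage follows; the helper name `find_arrow_config` is hypothetical and not part of Arrow or CMake.

```python
import os
from pathlib import Path


def find_arrow_config(prefix_path: str) -> list[Path]:
    """Return every ArrowConfig.cmake found under the given prefixes.

    `prefix_path` uses the platform path separator (":" on Linux), the same
    convention as the CMAKE_PREFIX_PATH environment variable.
    """
    hits: list[Path] = []
    for prefix in filter(None, prefix_path.split(os.pathsep)):
        root = Path(prefix)
        if not root.is_dir():
            # A literal "$(pwd)/dist" would show up here.
            print(f"not a directory: {prefix!r}")
            continue
        # Search recursively so no particular install layout is assumed.
        hits.extend(sorted(root.rglob("ArrowConfig.cmake")))
    return hits


if __name__ == "__main__":
    value = os.environ.get("CMAKE_PREFIX_PATH", "")
    print(f"CMAKE_PREFIX_PATH={value!r}")
    for hit in find_arrow_config(value):
        print(f"found: {hit}")
```

Running this in the build stage right after `make install` should show whether each prefix is a real directory and whether the C++ install produced the config file that the PyArrow build is looking for.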