bascheibler opened a new issue, #38810:
URL: https://github.com/apache/arrow/issues/38810

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   I'm trying to build a slim version of PyArrow, so that it fits in an AWS 
Lambda function. The base Docker image is `public.ecr.aws/lambda/python:3.12`, 
which is an Amazon Linux 2023 OS (based on Fedora).
   
   Building from the Dockerfile below fails at the step that creates the wheel 
file. The error message is:
   ```
   /var/task/arrow/python/setup.py:34: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
     import pkg_resources
   /var/lang/lib/python3.12/site-packages/setuptools/__init__.py:80: _DeprecatedInstaller: setuptools.installer and fetch_build_eggs are deprecated.
   !!
   
           ********************************************************************************
           Requirements should be satisfied by a PEP 517 installer.
           If you are using pip, you can try `pip install --use-pep517`.
           ********************************************************************************
   
   !!
     dist.fetch_build_eggs(dist.setup_requires)
   /var/lang/lib/python3.12/site-packages/setuptools_scm/git.py:135: UserWarning: "/var/task/arrow" is shallow and may cause errors
     warnings.warn(f'"{wd.path}" is shallow and may cause errors')
   running build_ext
   creating /var/task/arrow/python/build
   creating /var/task/arrow/python/build/temp.linux-x86_64-cpython-312
   -- Running cmake for PyArrow
   cmake -DCMAKE_INSTALL_PREFIX=/var/task/arrow/python/build/lib.linux-x86_64-cpython-312/pyarrow -DPYTHON_EXECUTABLE=/var/lang/bin/python3 -DPython3_EXECUTABLE=/var/lang/bin/python3 -DPYARROW_CXXFLAGS= -DPYARROW_BUILD_CUDA=off -DPYARROW_BUILD_SUBSTRAIT=off -DPYARROW_BUILD_FLIGHT=off -DPYARROW_BUILD_GANDIVA=off -DPYARROW_BUILD_ACERO=on -DPYARROW_BUILD_DATASET=on -DPYARROW_BUILD_ORC=off -DPYARROW_BUILD_PARQUET=on -DPYARROW_BUILD_PARQUET_ENCRYPTION=off -DPYARROW_BUILD_GCS=off -DPYARROW_BUILD_S3=off -DPYARROW_BUILD_HDFS=off -DPYARROW_BUNDLE_ARROW_CPP=on -DPYARROW_BUNDLE_CYTHON_CPP=off -DPYARROW_GENERATE_COVERAGE=off -DCMAKE_BUILD_TYPE=release /var/task/arrow/python
   -- The C compiler identification is GNU 11.4.1
   -- The CXX compiler identification is GNU 11.4.1
   -- Detecting C compiler ABI info
   -- Detecting C compiler ABI info - done
   -- Check for working C compiler: /usr/bin/cc - skipped
   -- Detecting C compile features
   -- Detecting C compile features - done
   -- Detecting CXX compiler ABI info
   -- Detecting CXX compiler ABI info - done
   -- Check for working CXX compiler: /usr/bin/c++ - skipped
   -- Detecting CXX compile features
   -- Detecting CXX compile features - done
   -- System processor: x86_64
   -- Performing Test CXX_SUPPORTS_SSE4_2
   -- Performing Test CXX_SUPPORTS_SSE4_2 - Success
   -- Performing Test CXX_SUPPORTS_AVX2
   -- Performing Test CXX_SUPPORTS_AVX2 - Success
   -- Performing Test CXX_SUPPORTS_AVX512
   -- Performing Test CXX_SUPPORTS_AVX512 - Success
   -- Arrow build warning level: PRODUCTION
   -- Using ld linker
   -- Build Type: RELEASE
   -- CMAKE_C_FLAGS:  -Wall -fno-semantic-interposition -msse4.2  -fdiagnostics-color=always  -fno-omit-frame-pointer -Wno-unused-variable -Wno-maybe-uninitialized
   -- CMAKE_CXX_FLAGS:  -Wno-noexcept-type  -Wall -fno-semantic-interposition -msse4.2  -fdiagnostics-color=always  -fno-omit-frame-pointer -Wno-unused-variable -Wno-maybe-uninitialized
   -- Generator: Unix Makefiles
   -- Build output directory: /var/task/arrow/python/build/temp.linux-x86_64-cpython-312/release
   -- Found Python3: /var/lang/bin/python3 (found version "3.12.0") found components: Interpreter Development.Module NumPy
   -- Found Python3Alt: /var/lang/bin/python3  
   CMake Error at CMakeLists.txt:268 (find_package):
     By not providing "FindArrow.cmake" in CMAKE_MODULE_PATH this project has
     asked CMake to find a package configuration file provided by "Arrow", but
     CMake did not find one.
   
     Could not find a package configuration file provided by "Arrow" with any of
     the following names:
   
       ArrowConfig.cmake
       arrow-config.cmake
   
     Add the installation prefix of "Arrow" to CMAKE_PREFIX_PATH or set
     "Arrow_DIR" to a directory containing one of the above files.  If "Arrow"
     provides a separate development package or SDK, be sure it has been
     installed.
   
   
   -- Configuring incomplete, errors occurred!
   See also "/var/task/arrow/python/build/temp.linux-x86_64-cpython-312/CMakeFiles/CMakeOutput.log".
   error: command '/usr/bin/cmake' failed with exit code 1
   The command '/bin/sh -c pip3 install -r arrow/python/requirements-wheel-build.txt &&     pushd arrow/python &&     python3 setup.py build_ext --build-type=release --bundle-arrow-cpp         bdist_wheel --dist-dir /app/output &&     popd' returned a non-zero code: 1
   ```
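
The CMake error above says that no `ArrowConfig.cmake` or `arrow-config.cmake` was found on the search path. As a rough diagnostic sketch, the shell function below searches a given prefix for either file. The function name `find_arrow_config` is hypothetical, and the fallback prefix `/var/task/dist` is an assumption based on the `/var/task` paths in the log.

```shell
# Hypothetical helper: list Arrow CMake package config files under a prefix.
find_arrow_config() {
    find "$1" -type f \( -name 'ArrowConfig.cmake' -o -name 'arrow-config.cmake' \) 2>/dev/null
}

# Check the prefix the C++ build was meant to install into.
# /var/task/dist is an assumed default; prints nothing if no config file exists.
find_arrow_config "${ARROW_HOME:-/var/task/dist}" || true
```

If this prints nothing for the intended prefix, the Arrow C++ libraries were probably installed somewhere else, and `CMAKE_PREFIX_PATH` or `Arrow_DIR` would need to point at that location instead.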
   
   
   Dockerfile:
   ```
   FROM public.ecr.aws/lambda/python:3.12 AS build
   
   RUN dnf upgrade && \
       dnf install -y \
         gcc-c++ \
         git ca-certificates \
         python-setuptools \
         cmake \
         pkg-config \
         python3-devel \
         python3-pip
   
   RUN git clone --depth 1 -b apache-arrow-14.0.1 https://github.com/apache/arrow.git
   
   # This is the folder where we will install the Arrow libraries during development
   RUN mkdir dist
   ENV ARROW_HOME=$(pwd)/dist
   ENV LD_LIBRARY_PATH=$(pwd)/dist/lib:$LD_LIBRARY_PATH
   ENV CMAKE_PREFIX_PATH=$ARROW_HOME:$CMAKE_PREFIX_PATH
   
   RUN mkdir arrow/cpp/build && \
       pushd arrow/cpp/build && \
       cmake -DCMAKE_INSTALL_PREFIX=$ARROW_HOME \
           -DCMAKE_INSTALL_LIBDIR=lib \
           -DCMAKE_BUILD_TYPE=Release \
           -DARROW_BUILD_TESTS=OFF \
           -DARROW_COMPUTE=OFF \
           -DARROW_CSV=OFF \
           -DARROW_DATASET=ON \
           -DARROW_FILESYSTEM=ON \
           -DARROW_HDFS=OFF \
           -DARROW_JSON=OFF \
           -DARROW_PARQUET=ON \
           -DARROW_WITH_BROTLI=OFF \
           -DARROW_WITH_BZ2=OFF \
           -DARROW_WITH_LZ4=OFF \
           -DARROW_WITH_SNAPPY=ON \
           -DARROW_WITH_ZLIB=OFF \
           -DARROW_WITH_ZSTD=OFF \
           -DPARQUET_REQUIRE_ENCRYPTION=OFF \
           .. && \
       make -j4 && \
       make install && \
       popd
   
   ENV PYARROW_WITH_PARQUET=1
   ENV PYARROW_WITH_DATASET=1
   ENV PYARROW_PARALLEL=4
   ENV PYARROW_INSTALL_TESTS=0
   
   # This is where it fails:
   RUN pip3 install -r arrow/python/requirements-wheel-build.txt && \
       pushd arrow/python && \
       python3 setup.py build_ext --build-type=release --bundle-arrow-cpp \
           bdist_wheel --dist-dir /app/output && \
       popd
   
   FROM public.ecr.aws/lambda/python:3.12
   
   COPY --from=build /app/output /app/output
   COPY . ${LAMBDA_TASK_ROOT}
   
   RUN dnf install -y gcc-c++ && \
       pip install pyarrow --no-index --find-links file:////app/output && \
       pip install --upgrade pip && \
       pip install --no-cache-dir -r requirements.txt
   
   CMD ["main.handler"]
   ```
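
One detail in the Dockerfile above may be worth checking; this is a possibility, not a confirmed diagnosis. Dockerfile `ENV` instructions substitute `$VAR`-style variables but do not run shell command substitution, so `$(pwd)` in `ARROW_HOME` is likely stored as literal text. The shell sketch below imitates that by assigning the literal text with single quotes. The absolute path `/var/task/dist` is an assumption based on the `/var/task` paths in the log.

```shell
# Imitate `ENV ARROW_HOME=$(pwd)/dist`: the value is stored as literal text.
ARROW_HOME='$(pwd)/dist'

# Expanding the variable later does not re-run the embedded command.
literal_prefix="$ARROW_HOME"
echo "literal: $literal_prefix"

# An absolute path (assumed here from the /var/task paths in the log) is unambiguous.
absolute_prefix="/var/task/dist"
echo "absolute: $absolute_prefix"
```

If this is the cause, setting `ENV ARROW_HOME=/var/task/dist` (and the matching `LD_LIBRARY_PATH`) with an explicit absolute path would be one way to test it.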
   
   Is there another way to deploy a Lambda function containing 
snowflake-connector-python==3.5.0, pandas and pyarrow without exceeding the 
size limit?
   
   PS: I've tried building from PR #34234, as suggested in issue #34240, but 
got the same result.
   
   ### Component(s)
   
   Python

