Oh, one important detail: the directions that I was following are at https://arrow.apache.org/docs/developers/python.html
Steve

On 2023/11/27 18:38:55 Akshara Sadheesh wrote:
> Thank you so much for your reply, Raúl! I did run the build using the
> build_venv.sh file. I think the issue was that I had not copied the
> libarrow.so files out of the `root/dist/lib` directory in my Docker
> container. I have now added them to `arrow/python/pyarrow`.
>
> After the build finished, I copied the libarrow.so files from
> `root/dist/lib` in my container to my host machine and added them to the
> `arrow/python/pyarrow` folder. This got rid of the missing libarrow.so
> files error.
>
> I then added this new pyarrow folder to my lambda layers folder; the
> deploy.sh script takes care of building out the new environment using
> CodeBuild. I am using a managed Ubuntu Standard 6.0 image
> (https://github.com/aws/aws-codebuild-docker-images/blob/master/ubuntu/standard/6.0/Dockerfile).
> This uses glibc version 2.35. As much as possible I would like to avoid
> changing the glibc version, as it is a managed image.
>
> Issue:
>
> When I add the custom pyarrow to my lambda layers and run the step
> function, I get this error:
>
> `GLIBC_2.32 not found (required by
> /opt/python/pyarrow/lib.cpython-310-x86_64-linux-gnu.so)`
>
> I keep bumping into a glibc version error. This error is present even after
> modifying the Dockerfile to use the same base image that the CodeBuild
> managed image uses, with glibc 2.35.
>
> This is the modified `arrow/python/examples/minimal_build/Dockerfile.ubuntu`
> used:
>
> `
>
> FROM public.ecr.aws/ubuntu/ubuntu:22.04
>
> ENV DEBIAN_FRONTEND=noninteractive
>
> RUN apt-get update -y -q && \
>     apt-get install -y -q --no-install-recommends \
>         apt-transport-https \
>         software-properties-common \
>         wget && \
>     apt-get install -y -q --no-install-recommends \
>         build-essential \
>         cmake \
>         git \
>         ninja-build \
>         python3.10 \
>         python3.10-dev \
>         python3.10-venv \
>         && \
>     apt-get clean && rm -rf /var/lib/apt/lists*
>
> # Set Python 3.10 as the default Python version
> RUN update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.10 1
>
> RUN wget https://bootstrap.pypa.io/get-pip.py && \
>     python3 get-pip.py && \
>     rm get-pip.py
>
> `
>
> This is the `arrow/python/examples/minimal_build/build_venv.sh` used:
>
> `
>
> #!/usr/bin/env bash
> # Licensed to the Apache Software Foundation (ASF) under one
> # or more contributor license agreements.  See the NOTICE file
> # distributed with this work for additional information
> # regarding copyright ownership.  The ASF licenses this file
> # to you under the Apache License, Version 2.0 (the
> # "License"); you may not use this file except in compliance
> # with the License.  You may obtain a copy of the License at
> #
> #   http://www.apache.org/licenses/LICENSE-2.0
> #
> # Unless required by applicable law or agreed to in writing,
> # software distributed under the License is distributed on an
> # "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
> # KIND, either express or implied.  See the License for the
> # specific language governing permissions and limitations
> # under the License.
>
> set -e
>
> #----------------------------------------------------------------------
> # Change this to whatever makes sense for your system
>
> WORKDIR=${WORKDIR:-$HOME}
> MINICONDA=$WORKDIR/miniconda-for-arrow
> LIBRARY_INSTALL_DIR=$WORKDIR/local-libs
> CPP_BUILD_DIR=$WORKDIR/arrow-cpp-build
> ARROW_ROOT=/arrow
> export ARROW_HOME=$WORKDIR/dist
> export LD_LIBRARY_PATH=$ARROW_HOME/lib:$LD_LIBRARY_PATH
>
> python3 -m venv $WORKDIR/venv
> source $WORKDIR/venv/bin/activate
>
> git config --global --add safe.directory $ARROW_ROOT
>
> pip install -r $ARROW_ROOT/python/requirements-build.txt
>
> #----------------------------------------------------------------------
> # Build C++ library
>
> mkdir -p $CPP_BUILD_DIR
> pushd $CPP_BUILD_DIR
>
> cmake -GNinja \
>       -DCMAKE_BUILD_TYPE=Release \
>       -DCMAKE_INSTALL_PREFIX=$ARROW_HOME \
>       -DCMAKE_INSTALL_LIBDIR=lib \
>       -DCMAKE_UNITY_BUILD=ON \
>       -DARROW_BUILD_STATIC=OFF \
>       -DARROW_COMPUTE=ON \
>       -DARROW_CSV=ON \
>       -DARROW_FILESYSTEM=ON \
>       -DARROW_JSON=ON \
>       $ARROW_ROOT/cpp
>
> ninja install
>
> popd
>
> #----------------------------------------------------------------------
> # Build and test Python library
> pushd $ARROW_ROOT/python
>
> rm -rf build/  # remove any pesky pre-existing build directory
>
> export CMAKE_PREFIX_PATH=${ARROW_HOME}${CMAKE_PREFIX_PATH:+:${CMAKE_PREFIX_PATH}}
> export PYARROW_BUILD_TYPE=Release
> export PYARROW_CMAKE_GENERATOR=Ninja
>
> # You can run either "develop" or "build_ext --inplace". Your pick
>
> python setup.py build_ext --inplace
> # python setup.py develop
>
> # pip install -r $ARROW_ROOT/python/requirements-test.txt
>
> # py.test pyarrow
>
> `
>
> I would be very thankful for any help and advice that you can offer.
>
> Thank you very much,
>
> Shara
>
>
> On 2023/11/22 14:29:49 Raúl Cumplido wrote:
> > Hi Shara,
> >
> > The example Dockerfile installs the base requirements for Ubuntu, but
> > then we use build_venv.sh (or build_conda.sh) to build the Arrow
> > C++ library and then pyarrow [1].
> >
> > From the error it seems you did not build Arrow C++, as libarrow.so
> > can't be found. Can you try following the recipe in the provided sh
> > file?
> >
> > Kind regards,
> > Raúl
> >
> > [1]
> > https://github.com/apache/arrow/blob/main/python/examples/minimal_build/build_venv.sh
> >
> > On Tue, 21 Nov 2023 at 23:05, Akshara Sadheesh
> > (<sh...@gmail.com>) wrote:
> > >
> > > Hi,
> > >
> > > I have been trying to use the minimal_build for Python with the
> > > provided example Dockerfile.ubuntu for my lambda layers, since they
> > > have a 250 MB limit. I am able to run the build and generate a pyarrow
> > > library. However, the library does not contain any shared .so files.
> > > When in use, it says:
> > >
> > > `"Unable to import module 'lambda_function': libarrow.so.1500: cannot
> > > open shared object file: No such file or directory"`
> > >
> > > I modified the Dockerfile to use Python 3.10 and the Ubuntu 22.04
> > > image, and set `--platform linux/x86_64` when building the image to
> > > ensure it is compatible with the lambda architecture.
> > >
> > > I would be very grateful if you could help me with this,
> > >
> > > Thank you!
> > >
> > > Shara
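
For the `GLIBC_2.32 not found` error quoted above, it can help to see which glibc symbol versions the built extension actually references. The following is a minimal sketch, not an Arrow or pyarrow tool: it scans a shared object's raw bytes for `GLIBC_x.y` tags (roughly what `strings file.so | grep GLIBC_` does) and reports the newest one. The function names `required_glibc_versions` and `newest_required_glibc` are hypothetical, chosen for this illustration, and the byte scan is a heuristic that does not parse the ELF version tables.

```python
from __future__ import annotations

import re
import sys
from pathlib import Path

# Matches version tags such as GLIBC_2.32 or GLIBC_2.2.5 embedded in a binary.
# Assumption: a plain byte scan is good enough for a quick diagnosis.
_GLIBC_TAG = re.compile(rb"GLIBC_(\d+(?:\.\d+)+)")


def required_glibc_versions(data: bytes) -> list[tuple[int, ...]]:
    """Return the distinct GLIBC_x.y tags found in raw bytes, sorted ascending."""
    versions = {
        tuple(int(part) for part in match.group(1).decode().split("."))
        for match in _GLIBC_TAG.finditer(data)
    }
    return sorted(versions)


def newest_required_glibc(path: str) -> tuple[int, ...] | None:
    """Return the highest GLIBC tag referenced by the file, or None if none found."""
    versions = required_glibc_versions(Path(path).read_bytes())
    return versions[-1] if versions else None


if __name__ == "__main__":
    # Each command-line argument is treated as a path to a shared object.
    for so_path in sys.argv[1:]:
        newest = newest_required_glibc(so_path)
        label = ".".join(map(str, newest)) if newest else "no GLIBC tags found"
        print(f"{so_path}: {label}")
```

It could be run against the extension named in the error message, for example `python check_glibc.py /opt/python/pyarrow/lib.cpython-310-x86_64-linux-gnu.so` (the script name is arbitrary). The dynamic loader reports `GLIBC_x.y not found` when the glibc in the environment that loads the library does not provide that version, so the reported number is worth comparing against the glibc of the runtime where the layer is imported, not only the image where it was built.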