casperhart commented on issue #45369:
URL: https://github.com/apache/arrow/issues/45369#issuecomment-2696010277

   The Docker setup I used is fairly complex, so I tried to simplify it down to this:
   
   ```
   ARG TARGET_PLATFORM=linux/arm64
   
   FROM --platform=${TARGET_PLATFORM} ubuntu:22.04
   ENV ARCH=arm64
   
   ENV DEBIAN_FRONTEND=noninteractive
   
   # Set environment variables
   ENV HADOOP_VERSION=3.4.1
   ENV HADOOP_HOME=/opt/hadoop
   ENV HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
   ENV PATH=${PATH}:${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin
   
   # Install dependencies
   RUN apt-get update && apt-get install -y \
       openjdk-11-jdk \
       wget \
       ssh \
       pdsh \
       python3 \
       python3-pip \
       python3-dev \
       build-essential
   
   # Install PyArrow and other Python packages
   RUN pip install --no-cache-dir pyarrow
   
   # Download and set up Hadoop
   RUN wget https://downloads.apache.org/hadoop/common/hadoop-${HADOOP_VERSION}/hadoop-${HADOOP_VERSION}.tar.gz \
       && tar -xzf hadoop-${HADOOP_VERSION}.tar.gz \
       && mv hadoop-${HADOOP_VERSION} ${HADOOP_HOME} \
       && rm hadoop-${HADOOP_VERSION}.tar.gz
   
   # Set JAVA_HOME for Hadoop and pyarrow (points at the OpenJDK 11 install for $ARCH)
   ENV JAVA_HOME=/usr/lib/jvm/java-11-openjdk-$ARCH
   
   # Make sure libhdfs.so is in $HADOOP_HOME/lib/native, where pyarrow looks for it by default
   RUN if [ ! -f /opt/hadoop/lib/native/libhdfs.so ]; then \
         echo "ERROR: /opt/hadoop/lib/native/libhdfs.so not found!" && \
         exit 1; \
       fi
   
   RUN python3 -c "\
   import os;\
   path='/opt/hadoop/lib/native/libhdfs.so';\
   print('path ', path, 'exists: ', os.path.exists(path));\
   import pyarrow.fs as fs;\
   hdfs = fs.HadoopFileSystem('0.0.0.0')"
   ```
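The `RUN` check above only confirms that `libhdfs.so` exists. It does not check that the library was built for the container's architecture. The sketch below shows a stricter check and assumes the library is a standard ELF shared object, as on Linux. It reads the `e_machine` field from the ELF header and compares it with `platform.machine()`. `elf_machine` and `check_arch` are hypothetical helpers and are not part of pyarrow or Hadoop. The name table only covers the two architectures relevant here.

```python
import platform
import struct

# The two ELF e_machine values relevant here (from the ELF specification),
# named the way platform.machine() reports them on 64-bit Linux.
ELF_MACHINES = {
    0x3E: "x86_64",
    0xB7: "aarch64",
}


def elf_machine(path):
    """Return the architecture an ELF file was built for, read from its header."""
    with open(path, "rb") as f:
        header = f.read(20)
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError(f"{path} is not an ELF file")
    # e_ident[5] (EI_DATA): 1 = little-endian, 2 = big-endian.
    byte_order = "<" if header[5] == 1 else ">"
    # e_machine is a 16-bit field at offset 18, right after
    # e_ident (16 bytes) and e_type (2 bytes).
    (machine,) = struct.unpack_from(byte_order + "H", header, 18)
    # Unknown values fall back to their hex code.
    return ELF_MACHINES.get(machine, hex(machine))


def check_arch(path):
    """Return (library_arch, host_arch, matches) for a shared library."""
    lib_arch = elf_machine(path)
    host_arch = platform.machine()
    return lib_arch, host_arch, lib_arch == host_arch
```

Inside the arm64 container, call `check_arch("/opt/hadoop/lib/native/libhdfs.so")`. If the result reports a mismatch such as `('x86_64', 'aarch64', False)`, the bundled native library was not built for aarch64.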
   
   I know the actual issue is not with Arrow but with installing the non-aarch64 Hadoop build; it's just that the error message from Arrow is misleading. I'll try building Hadoop from source on the Mac, but I don't have high hopes.
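The error from `fs.HadoopFileSystem` does not say why the library failed to load. One way to see the real cause is to load `libhdfs.so` directly and read the dynamic loader's own message. The sketch below uses the standard-library `ctypes` module, which raises `OSError` carrying the loader's error text when a shared library cannot be loaded. `try_load` is a hypothetical helper, and its default path assumes `HADOOP_HOME=/opt/hadoop` as in the Dockerfile. A failure here may also come from a missing dependency, such as the JVM library, rather than from the architecture, so the message is worth reading in full.

```python
import ctypes
import os

# Assumed location, matching HADOOP_HOME=/opt/hadoop in the Dockerfile.
DEFAULT_LIBHDFS = "/opt/hadoop/lib/native/libhdfs.so"


def try_load(path=DEFAULT_LIBHDFS):
    """Load a shared library directly.

    Returns None on success, otherwise a string describing why it failed.
    """
    if not os.path.exists(path):
        return f"{path} does not exist"
    try:
        ctypes.CDLL(path)
    except OSError as exc:
        # ctypes surfaces the dynamic loader's error message here.
        return str(exc)
    return None
```

For example, run `print(try_load() or "loaded OK")` inside the container.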
   
   

