Otávio Vasques created ARROW-10737:
--------------------------------------

             Summary: [Python] Pyarrow 2.0.0 seems to not have the filesystem module.
                 Key: ARROW-10737
                 URL: https://issues.apache.org/jira/browse/ARROW-10737
             Project: Apache Arrow
          Issue Type: Bug
         Environment: requirements:

numpy==1.19.0
pandas==1.1.4
scikit-learn==0.23.2
matplotlib==3.3.3
seaborn==0.11.0
fastapi==0.61.2
uvicorn==0.12.2
shap==0.37.0
pyarrow==2.0.0
datalab==0.7.0
PyHive==0.6.3
fsspec
jupyter
requests

Dockerfile:
FROM python:3.8.6-slim-buster

RUN apt-get update -y && \
    apt-get install -y libgomp1 build-essential wget apt-transport-https gnupg

# Java
RUN mkdir -p /usr/share/man/man1 && \
    wget -qO - https://adoptopenjdk.jfrog.io/adoptopenjdk/api/gpg/key/public | apt-key add - && \
    echo "deb https://adoptopenjdk.jfrog.io/adoptopenjdk/deb buster main" | tee /etc/apt/sources.list.d/adoptopenjdk.list && \
    apt-get update && apt-get install -y adoptopenjdk-8-hotspot
ENV JAVA_HOME /usr/lib/jvm/adoptopenjdk-8-hotspot-amd64

# Hadoop Installation
ENV HADOOP_USER_NAME hdfs
ENV HADOOP_PREFIX /usr/local/hadoop
ENV HADOOP_COMMON_HOME /usr/local/hadoop
ENV HADOOP_HDFS_HOME /usr/local/hadoop
ENV CONF_PREFIX /opt/hadoop
ENV HADOOP_CONF_DIR /opt/hadoop/hadoop-conf
ENV YARN_CONF_DIR /opt/hadoop/yarn-conf
ENV ARROW_LIBHDFS_DIR /usr/local/hadoop/lib/native/
ENV PATH="/usr/local/hadoop/bin:${PATH}"

RUN mkdir -p ${CONF_PREFIX}

COPY hadoop/conf/hadoop-conf /opt/hadoop/hadoop-conf 
COPY hadoop/conf/yarn-conf /opt/hadoop/yarn-conf 

RUN wget -qO - https://downloads.apache.org/hadoop/common/hadoop-2.10.1/hadoop-2.10.1.tar.gz | \
    tar -xzf - -C /usr/local \
    && ln -s /usr/local/hadoop-2.10.1 /usr/local/hadoop

COPY requirements.txt .

RUN pip install -U setuptools pip && \
    pip install -r requirements.txt

RUN mkdir /repo
ENV HOME /repo
WORKDIR /repo
            Reporter: Otávio Vasques


{code:python}
Python 3.8.6 (default, Nov 18 2020, 14:00:57)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyarrow as pa
>>> pa.__version__
'2.0.0'
>>> pa.fs
Traceback (most recent call last):
 File "<stdin>", line 1, in <module>
 File "/usr/local/lib/python3.8/site-packages/pyarrow/__init__.py", line 252, in __getattr__
 raise AttributeError(
AttributeError: module 'pyarrow' has no attribute 'fs'
>>>{code}

I was using the older pa.hdfs interface, which is now deprecated. I tried to 
switch to the new HadoopFileSystem class from the fs module, but accessing 
pa.fs raises the error above. What could be causing this?

This is running inside a Docker container. The requirements and the 
Dockerfile are in the Environment section.
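
One possible explanation, offered as an assumption rather than a confirmed diagnosis: pyarrow.fs may be a submodule that a bare "import pyarrow" does not load, in which case pa.fs fails until the submodule is imported explicitly. The sketch below tries that explicit import and uses LocalFileSystem as a smoke test, since it needs no Hadoop setup. The helper connect_hdfs and its host/port defaults are hypothetical and depend on the cluster configuration.

```python
# Assumption: pyarrow.fs is a submodule that `import pyarrow` alone does not load.
try:
    from pyarrow import fs  # explicit submodule import instead of pa.fs
except ImportError:
    fs = None  # pyarrow is missing, or this build lacks the filesystem module


def connect_hdfs(host="default", port=0):
    """Hypothetical helper: open an HDFS connection via the new API.

    "default" is assumed to pick up the namenode from the Hadoop config
    (HADOOP_CONF_DIR); replace host and port with real values if needed.
    """
    if fs is None:
        raise RuntimeError("pyarrow.fs could not be imported")
    return fs.HadoopFileSystem(host, port)


if fs is not None:
    # LocalFileSystem needs no Hadoop libraries, so it makes a quick smoke test.
    local = fs.LocalFileSystem()
    print(local.get_file_info("/").type)
```

If the explicit import succeeds where pa.fs failed, the module is present and only the import style needs to change; if it raises ImportError, the installed build itself would be the next thing to check.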



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
