Felix created ARROW-15421:
-----------------------------

             Summary: Need a pip install option for out-of-the-box HDFS support
                 Key: ARROW-15421
                 URL: https://issues.apache.org/jira/browse/ARROW-15421
             Project: Apache Arrow
          Issue Type: New Feature
          Components: Python
            Reporter: Felix


Hi folks! And thank you for your great work.

I want to use PyArrow to develop a simple client application that needs to 
connect to HDFS clusters and exchange data with them.

But to use HDFS from PyArrow today, I have to manually download a full Hadoop 
distribution, locate {{libhdfs.so}} inside it, and manually provide Hadoop's 
CLASSPATH as an environment variable.
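
To illustrate, the manual setup looks roughly like the sketch below. The helper name {{hdfs_env}} and the example paths are mine for illustration only, not part of PyArrow; the environment variable names are the ones the PyArrow HDFS documentation describes. It assumes an already unpacked Hadoop distribution and a working JVM.

```python
import subprocess
from pathlib import Path


def hdfs_env(hadoop_home, classpath=None):
    """Build the environment variables libhdfs needs (hypothetical helper).

    hadoop_home: path to an unpacked Hadoop distribution.
    classpath:   optional pre-computed CLASSPATH; if omitted, it is obtained
                 by running `hadoop classpath --glob` from the distribution.
    """
    home = Path(hadoop_home)

    # Search the distribution for libhdfs.so (commonly under lib/native).
    matches = sorted(home.rglob("libhdfs.so"))
    if not matches:
        raise FileNotFoundError(f"libhdfs.so not found under {home}")

    if classpath is None:
        # Ask Hadoop itself for the expanded list of JARs.
        classpath = subprocess.check_output(
            [str(home / "bin" / "hadoop"), "classpath", "--glob"],
            text=True,
        ).strip()

    # JAVA_HOME must also point to a JVM; that is not handled here.
    return {
        "HADOOP_HOME": str(home),
        "ARROW_LIBHDFS_DIR": str(matches[0].parent),
        "CLASSPATH": classpath,
    }
```

Only after something like {{os.environ.update(hdfs_env("/opt/hadoop"))}} (path is an example) can I call {{pyarrow.fs.HadoopFileSystem("namenode", 8020)}}. This is the boilerplate I would like to avoid.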

I need something like *{{pip3 install pyarrow[hdfs]}}* that 
will give me PyArrow with a pre-built libhdfs and the minimal set of Hadoop JARs 
needed to run it, so that the pyarrow.hdfs.* classes can be called without 
additional boilerplate code.

Could you please add this in a future release of PyArrow?



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
