[
https://issues.apache.org/jira/browse/ARROW-15421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Felix updated ARROW-15421:
--------------------------
Description:
Hi folks! And thank you for your great work.
I want to use PyArrow to develop a simple client application that needs to
connect to HDFS clusters and exchange data with them.
But to use HDFS in PyArrow today, I have to manually download a full Hadoop
distribution, locate {{libhdfs.so}} inside it, and manually provide Hadoop's
CLASSPATH as an environment variable.
I need something like *{{pip3 install pyarrow[hdfs]}}* that would give me
PyArrow with a pre-built libhdfs and the minimal set of Hadoop JARs needed to
run it, so that the pyarrow.hdfs.* classes could be called without
additional boilerplate code.
Can you please add it in a future release of PyArrow?
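For reference, the boilerplate currently required looks roughly like this. This is only a sketch: the paths below are illustrative assumptions that depend entirely on where the Hadoop distribution was unpacked, not tested values.

```python
import os

# Point Arrow at the directory containing libhdfs.so (hypothetical path,
# taken from a manually downloaded Hadoop distribution):
os.environ["ARROW_LIBHDFS_DIR"] = "/opt/hadoop/lib/native"

# libhdfs also needs the Hadoop JARs on the CLASSPATH, e.g. the value
# printed by `hadoop classpath --glob` (hypothetical example below):
os.environ["CLASSPATH"] = "/opt/hadoop/etc/hadoop:/opt/hadoop/share/hadoop/common/*"

# Only after both variables are set can the filesystem be constructed.
# Requires a reachable cluster, so it is left commented out here:
# from pyarrow import fs
# hdfs = fs.HadoopFileSystem(host="namenode", port=8020)
```

With the requested {{pyarrow[hdfs]}} extra, none of this environment setup would be needed.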
> Need a pip install option for out-of-the-box HDFS support
> ---------------------------------------------------------
>
> Key: ARROW-15421
> URL: https://issues.apache.org/jira/browse/ARROW-15421
> Project: Apache Arrow
> Issue Type: New Feature
> Components: Python
> Reporter: Felix
> Priority: Major
>
--
This message was sent by Atlassian Jira
(v8.20.1#820001)