[ https://issues.apache.org/jira/browse/ARROW-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16111075#comment-16111075 ]

Wes McKinney commented on ARROW-1316:
-------------------------------------

I am not sure this is possible. To use libhdfs to access an HDFS cluster, you 
need:

* A JVM installation
* The Hadoop client libraries on your classpath
* A file-system-like API wrapping the libhdfs library

These are provided by the JDK install, the Hadoop install, and the Arrow 
libraries, respectively. Arrow's HDFS interface exposes the same API as its 
other file classes (https://github.com/apache/arrow/blob/master/cpp/src/arrow/io/hdfs.h). 
This is the same approach used in TensorFlow 
(https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/platform/hadoop/hadoop_file_system.h)
 and other projects. 
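As a rough illustration of what that stack looks like end to end, here is a 
sketch in Python using pyarrow's HDFS connector. The hostname, port, and paths 
are placeholders, and the exact connect() signature may vary across pyarrow 
versions:

    import os
    import subprocess

    import pyarrow as pa

    # Requirements 1 and 2 are environmental: libhdfs starts a JVM
    # in-process, so JAVA_HOME must point at a JDK install and the Hadoop
    # client jars must be on the classpath. The JDK path is a placeholder.
    os.environ.setdefault("JAVA_HOME", "/usr/lib/jvm/default-java")
    os.environ["CLASSPATH"] = subprocess.check_output(
        ["hadoop", "classpath", "--glob"]).decode().strip()

    # Requirement 3 is the Arrow piece: connect through libhdfs and get
    # back an object with the same file-like API as Arrow's other readers.
    fs = pa.hdfs.connect(host="namenode", port=8020)

    print(fs.ls("/"))
    with fs.open("/tmp/example.dat", "rb") as f:
        head = f.read(100)

Note that the first two requirements are satisfied purely through the 
environment (JAVA_HOME and CLASSPATH); only the third comes from Arrow itself.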

> hdfs connector stand-alone
> --------------------------
>
>                 Key: ARROW-1316
>                 URL: https://issues.apache.org/jira/browse/ARROW-1316
>             Project: Apache Arrow
>          Issue Type: Wish
>            Reporter: Martin Durant
>
> Currently, access to HDFS via libhdfs requires the whole of Arrow, a Java 
> installation, and a Hadoop installation. This setup is indeed common, such as 
> on "cluster edge-nodes".
> This issue is posted with the wish that HDFS file-system access could be done 
> without needing the whole set of installations above.


