[
https://issues.apache.org/jira/browse/ARROW-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16111075#comment-16111075
]
Wes McKinney commented on ARROW-1316:
-------------------------------------
I am not sure this is possible. To use libhdfs to access an HDFS cluster, you
need:
* A JVM installation
* The Hadoop client libraries in your classpath
* A file-system-like API on top of the libhdfs library
These are provided respectively by the JDK install, the Hadoop install, and the
Arrow libraries. The Arrow interface to HDFS exposes the same consistent API as
Arrow's other file abstractions
(https://github.com/apache/arrow/blob/master/cpp/src/arrow/io/hdfs.h).
This is the same approach used in TensorFlow
(https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/platform/hadoop/hadoop_file_system.h)
and other projects.
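As a concrete illustration of the first two requirements, the environment that
libhdfs typically needs before Arrow (or TensorFlow) can load it looks roughly
like the sketch below. The specific paths are assumptions for illustration;
only `hadoop classpath --glob` is a standard Hadoop command.

```shell
# Hedged sketch: environment libhdfs generally expects at runtime.
# JAVA_HOME / HADOOP_HOME paths below are illustrative, not prescribed.
export JAVA_HOME=/usr/lib/jvm/default-java     # the JVM installation
export HADOOP_HOME=/opt/hadoop                 # the Hadoop install
# libhdfs needs the Hadoop client jars on the CLASSPATH; the hadoop
# launcher can expand them for you:
export CLASSPATH="$("$HADOOP_HOME"/bin/hadoop classpath --glob)"
```

Removing any one of these pieces is what breaks the stand-alone scenario the
issue asks about.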
> hdfs connector stand-alone
> --------------------------
>
> Key: ARROW-1316
> URL: https://issues.apache.org/jira/browse/ARROW-1316
> Project: Apache Arrow
> Issue Type: Wish
> Reporter: Martin Durant
>
> Currently, access to HDFS via libhdfs requires the whole of Arrow, a Java
> installation, and a Hadoop installation. This setup is indeed common, for
> example on "cluster edge-nodes".
> This issue is posted in the hope that HDFS file-system access could be done
> without needing the full set of installations above.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)