Paulo Roberto Cerioni created ARROW-6469:
--------------------------------------------
Summary: PyArrow HDFS documentation does not mention HDFS short-circuit
reads
Key: ARROW-6469
URL: https://issues.apache.org/jira/browse/ARROW-6469
Project: Apache Arrow
Issue Type: Bug
Components: Python
Reporter: Paulo Roberto Cerioni
Because PyArrow uses libhdfs underneath, one would expect reads of files stored
in HDFS to take advantage of short-circuit reads.
However, the PyArrow documentation does not explain whether this feature is
supported (and in which situations), or whether it works without any additional
configuration.
For instance, I'm interested in the use case where the short-circuit feature is
used to read only some of the columns of a Parquet file located in HDFS into a
dataframe.
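To make the use case concrete, here is a minimal sketch of the kind of read I have in mind. The two property names are the standard HDFS client settings for short-circuit local reads; the socket path, the helper names short_circuit_conf and read_columns, and the idea that passing these properties through extra_conf is enough for libhdfs to use short-circuit reads are all assumptions on my part, which is exactly what I would like the documentation to confirm or correct.

```python
def short_circuit_conf(socket_path="/var/lib/hadoop-hdfs/dn_socket"):
    """Return HDFS client properties that request short-circuit local reads.

    The default socket path is a placeholder (assumption); it has to match
    the dfs.domain.socket.path configured on the DataNodes.
    """
    return {
        "dfs.client.read.shortcircuit": "true",
        "dfs.domain.socket.path": socket_path,
    }


def read_columns(path, columns, host="default", port=0, conf=None):
    """Read only `columns` of a Parquet file on HDFS into a pandas DataFrame.

    Hypothetical helper: whether the extra_conf settings actually result in
    short-circuit reads is not confirmed by the PyArrow documentation.
    """
    # Imported lazily so short_circuit_conf() is usable without pyarrow.
    import pyarrow.parquet as pq
    from pyarrow import fs

    if conf is None:
        conf = short_circuit_conf()
    # host="default" and port=0 defer to fs.defaultFS from core-site.xml.
    hdfs = fs.HadoopFileSystem(host, port, extra_conf=conf)
    table = pq.read_table(path, columns=columns, filesystem=hdfs)
    return table.to_pandas()
```

A call such as read_columns("/data/example.parquet", ["a", "b"]) (the path and column names are made up) would then be expected to fetch only those two columns, ideally bypassing the DataNode when the client runs on the same host as the blocks.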
--
This message was sent by Atlassian Jira
(v8.3.2#803003)