Ivan Dimitrov created ARROW-5318:
------------------------------------
Summary: pyarrow hdfs reader overrequests
Key: ARROW-5318
URL: https://issues.apache.org/jira/browse/ARROW-5318
Project: Apache Arrow
Issue Type: Bug
Components: Python
Affects Versions: 0.10.0
Reporter: Ivan Dimitrov
When reading with HdfsFilesystem's read method, the amount of data actually
requested is not constant: it fluctuates between 0% and 300% more than the
nbytes asked for.
Example code
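{code:python}
from pyarrow import hdfs

# hostname and dataset_path are placeholders for the cluster and file under test
fs = hdfs.connect(hostname, driver='libhdfs')
f = fs.open(dataset_path)
f.read(nbytes=3500000)  # ask for 3,500,000 bytes
{code}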
In this case, the read can send back up to 15 MB, although only 3,500,000
bytes were requested. The same behavior occurs with the 'libhdfs3' driver, and
it is also present in newer versions of pyarrow.
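To make the percentages concrete, here is a minimal sketch of how the overrequest can be computed. The helper name overrequest_percent is hypothetical and not part of pyarrow; the transferred byte count is assumed to be measured outside pyarrow (for example, from network counters).

```python
def overrequest_percent(requested: int, transferred: int) -> float:
    """Extra bytes transferred, as a percentage of the bytes requested."""
    if requested <= 0:
        raise ValueError("requested must be positive")
    return (transferred - requested) / requested * 100.0


# A 3,500,000-byte request that moves 14,000,000 bytes is a 300% overrequest.
print(overrequest_percent(3_500_000, 14_000_000))
```

A result of 0.0 means the read transferred exactly what was asked for.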
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)