kmiku7 created ARROW-2002:
-----------------------------
Summary: Using pyarrow to download a file sometimes raises queue.Full
Key: ARROW-2002
URL: https://issues.apache.org/jira/browse/ARROW-2002
Project: Apache Arrow
Issue Type: Bug
Components: Python
Affects Versions: 0.8.0
Environment: operating system: all
platform: all
Reporter: kmiku7
When downloading a file from HDFS, if the writer thread writes data more
slowly than it is read, download() raises queue.Full because write_queue
is full.
I think that when downloading a file, download() could wait until write_queue
has space to enqueue the new item, as long as writer_thread is alive. This is
what upload() does.
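To make the suggestion concrete, here is a minimal standalone sketch of the
waiting behaviour, using only the standard library. It is not the pyarrow
implementation: put_when_space, copy_with_backpressure, sink and poll_seconds
are hypothetical names chosen for illustration, and only write_queue and
writer_thread mirror the names used above.

```python
import queue
import threading
import time


def put_when_space(write_queue, item, writer_thread, poll_seconds=0.1):
    """Block until item is enqueued; return False if the writer died first."""
    while writer_thread.is_alive():
        try:
            # Wait for a free slot instead of failing at once like put_nowait().
            write_queue.put(item, timeout=poll_seconds)
            return True
        except queue.Full:
            continue  # Queue still full: re-check the writer and retry.
    return False


def copy_with_backpressure(chunks, sink, queue_size=4, poll_seconds=0.1):
    """Feed chunks to sink through a bounded queue drained by a writer thread."""
    write_queue = queue.Queue(maxsize=queue_size)
    done = threading.Event()
    errors = []

    def writer():
        try:
            # Keep draining until the reader is finished and the queue is empty.
            while not (done.is_set() and write_queue.empty()):
                try:
                    buf = write_queue.get(timeout=poll_seconds)
                except queue.Empty:
                    continue
                sink(buf)
        except Exception as exc:  # Remember the failure for the reader side.
            errors.append(exc)

    writer_thread = threading.Thread(target=writer)
    writer_thread.start()
    try:
        for buf in chunks:
            if not put_when_space(write_queue, buf, writer_thread, poll_seconds):
                break  # Writer exited early; stop reading.
    finally:
        done.set()
        writer_thread.join()
    if errors:
        raise errors[0]
```

The timeout on put() is there so that the reader cannot hang forever if the
writer thread dies while the queue is full.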
{code}
>>> import pyarrow as pa
>>> cli = pa.hdfs.connect(user='USERNAME')
>>> cli.download('/REMOTE/HDFS/PATH', '/LOCAL/FILE/PATH')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pyarrow/io-hdfs.pxi", line 428, in pyarrow.lib.HadoopFileSystem.download (/arrow/python/build/temp.linux-x86_64-3.4/lib.cxx:66399)
  File "pyarrow/io-hdfs.pxi", line 429, in pyarrow.lib.HadoopFileSystem.download (/arrow/python/build/temp.linux-x86_64-3.4/lib.cxx:66351)
  File "pyarrow/io.pxi", line 315, in pyarrow.lib.NativeFile.download (/arrow/python/build/temp.linux-x86_64-3.4/lib.cxx:52249)
  File "/usr/lib/python3.4/queue.py", line 187, in put_nowait
    return self.put(item, block=False)
  File "/usr/lib/python3.4/queue.py", line 133, in put
    raise Full
queue.Full
{code}
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)