kmiku7 created ARROW-2002:
-----------------------------

             Summary: use pyarrow download file will raise queue.FULL exceptions sometimes
                 Key: ARROW-2002
                 URL: https://issues.apache.org/jira/browse/ARROW-2002
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
    Affects Versions: 0.8.0
         Environment: operating system: all
                      platform: all
            Reporter: kmiku7

When downloading a file from HDFS, if the writer thread writes data more slowly than the data is read, download() raises a queue.Full exception because write_queue is full. I think download() should instead wait until write_queue has room for the new item, as long as writer_thread is still alive, like upload() does.

{code}
>>> import pyarrow as pa
>>> cli = pa.hdfs.connect(user='USERNAME')
>>> cli.download('/REMOTE/HDFS/PATH', '/LOCAL/FILE/PATH')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pyarrow/io-hdfs.pxi", line 428, in pyarrow.lib.HadoopFileSystem.download (/arrow/python/build/temp.linux-x86_64-3.4/lib.cxx:66399)
  File "pyarrow/io-hdfs.pxi", line 429, in pyarrow.lib.HadoopFileSystem.download (/arrow/python/build/temp.linux-x86_64-3.4/lib.cxx:66351)
  File "pyarrow/io.pxi", line 315, in pyarrow.lib.NativeFile.download (/arrow/python/build/temp.linux-x86_64-3.4/lib.cxx:52249)
  File "/usr/lib/python3.4/queue.py", line 187, in put_nowait
    return self.put(item, block=False)
  File "/usr/lib/python3.4/queue.py", line 133, in put
    raise Full
queue.Full
{code}
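
To make the proposal concrete, here is a minimal standalone sketch of the "wait while the writer is alive" idea, using only the standard library. It is not pyarrow's actual code: copy_with_backpressure, put_blocking, and the chunk_size, queue_size and write_delay parameters are hypothetical names chosen for illustration, and only write_queue and writer_thread mirror the names mentioned above.

```python
import io
import queue
import threading
import time


def copy_with_backpressure(src, dst, chunk_size=4, queue_size=2, write_delay=0.0):
    """Copy src to dst through a bounded queue, waiting when the queue is full.

    Hypothetical illustration only; this is not the pyarrow implementation.
    """
    write_queue = queue.Queue(maxsize=queue_size)
    done = object()  # sentinel telling the writer to stop
    errors = []      # exceptions raised inside the writer thread

    def writer():
        try:
            while True:
                item = write_queue.get()
                if item is done:
                    return
                time.sleep(write_delay)  # simulate a slow local disk
                dst.write(item)
        except Exception as exc:
            errors.append(exc)

    writer_thread = threading.Thread(target=writer, daemon=True)
    writer_thread.start()

    def put_blocking(item):
        # Instead of put_nowait(), retry with a timeout so a full queue
        # only becomes an error if the writer thread has died.
        while True:
            try:
                write_queue.put(item, timeout=0.1)
                return
            except queue.Full:
                if not writer_thread.is_alive():
                    cause = errors[0] if errors else None
                    raise RuntimeError("writer thread exited early") from cause

    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        put_blocking(chunk)
    put_blocking(done)
    writer_thread.join()
    if errors:
        raise errors[0]


# Usage: the reader is much faster than the writer, yet no queue.Full escapes.
source = io.BytesIO(b"0123456789" * 10)
target = io.BytesIO()
copy_with_backpressure(source, target, write_delay=0.001)
```

The timeout on put() is what keeps the reader from hanging forever: if the writer thread has crashed, the next timeout notices that it is no longer alive and raises instead of waiting on a queue nobody will drain.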