[ https://issues.apache.org/jira/browse/ARROW-2025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17659054#comment-17659054 ]

Rok Mihevc commented on ARROW-2025:
-----------------------------------

This issue has been migrated to [issue #18006|https://github.com/apache/arrow/issues/18006] 
on GitHub. Please see the [migration documentation|https://github.com/apache/arrow/issues/14542] 
for further details.

> [Python/C++] HDFS Client disconnect closes all open clients
> -----------------------------------------------------------
>
>                 Key: ARROW-2025
>                 URL: https://issues.apache.org/jira/browse/ARROW-2025
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++, Python
>            Reporter: Jim Crist
>            Assignee: Jim Crist
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.9.0
>
>
> In the Python library, if an instance of `HadoopFileSystem` is garbage 
> collected, all other existing instances become invalid. I haven't checked 
> with a C++-only example, but from reading the Cython code I can't see how 
> Cython could be responsible, so I think this is a bug in the C++ library.
>  
> {code:python}
> >>> import pyarrow as pa
> >>> h = pa.hdfs.connect()
> 18/01/24 16:54:25 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> 18/01/24 16:54:26 WARN shortcircuit.DomainSocketFactory: The short-circuit 
> local reads feature cannot be used because libhadoop cannot be loaded.
> >>> h.ls("/")
> ['/benchmarks', '/hbase', '/tmp', '/user', '/var']
> >>> h2 = pa.hdfs.connect()
> >>> del h  # close one client
> >>> h2.ls("/")  # all filesystem operations now fail
> hdfsListDirectory(/): FileSystem#listStatus error:
> IOException: Filesystem closedjava.io.IOException: Filesystem closed
>         at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:865)
>         at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:2106)
>         at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:2092)
>         at org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:743)
>         at org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:113)
>         at org.apache.hadoop.hdfs.DistributedFileSystem$16.doCall(DistributedFileSystem.java:808)
>         at org.apache.hadoop.hdfs.DistributedFileSystem$16.doCall(DistributedFileSystem.java:804)
>         at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>         at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:804)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/opt/conda/lib/python3.6/site-packages/pyarrow/hdfs.py", line 88, in ls
>     return super(HadoopFileSystem, self).ls(path, detail)
>   File "io-hdfs.pxi", line 248, in pyarrow.lib.HadoopFileSystem.ls
>   File "error.pxi", line 79, in pyarrow.lib.check_status
> pyarrow.lib.ArrowIOError: HDFS: list directory failed
> >>> h2.is_open  # The python object still thinks it's open
> True
> {code}
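>
> My best guess at the mechanism (I haven't verified this against the Arrow
> sources): handles returned by libhdfs's `hdfsConnect` can wrap a single
> cached Java `FileSystem` instance, so `hdfsDisconnect` on one handle closes
> the object every other handle is using. Below is a minimal sketch against
> the stock libhdfs C API (hdfs.h); the "default"/0 arguments ask libhdfs for
> the configured default filesystem.
> {code:cpp}
> // Sketch of the suspected shared-instance behavior; not Arrow's code.
> #include <cstdio>
> #include <hdfs.h>
>
> int main() {
>   // Plain hdfsConnect goes through Hadoop's FileSystem cache, so fs1 and
>   // fs2 may end up wrapping the *same* Java FileSystem object.
>   hdfsFS fs1 = hdfsConnect("default", 0);
>   hdfsFS fs2 = hdfsConnect("default", 0);
>
>   hdfsDisconnect(fs1);  // closes the shared Java object...
>
>   // ...so listing through fs2 now fails with "Filesystem closed".
>   int num_entries = 0;
>   hdfsFileInfo* entries = hdfsListDirectory(fs2, "/", &num_entries);
>   if (entries == nullptr) {
>     std::fprintf(stderr, "fs2 is dead even though it was never closed\n");
>   } else {
>     hdfsFreeFileInfo(entries, num_entries);
>   }
>   hdfsDisconnect(fs2);
>   return 0;
> }
> {code}
> If that's what is happening, connecting through the builder API and calling
> `hdfsBuilderSetForceNewInstance` (which opts out of the cache) should give
> each client its own `FileSystem` and make disconnects independent.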



