[ https://issues.apache.org/jira/browse/ARROW-6044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16893872#comment-16893872 ]
Wes McKinney commented on ARROW-6044:
-------------------------------------
We're passing through calls to libhdfs. It's possible that there is some
resource leak, but I'm not sure where it would be. Maybe you can ask the Apache
Hadoop community?
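
To narrow down where the hang occurs, one option is to run the connect call in
a worker thread and stop waiting after a deadline. The following is a minimal
sketch, not part of pyarrow: `call_with_timeout` is a hypothetical helper, and
it assumes the blocking call releases the GIL while it waits. It does not
cancel the hung call; it only turns a silent hang into an exception that can be
logged.

```python
import threading


def call_with_timeout(fn, timeout, *args, **kwargs):
    """Run fn(*args, **kwargs) in a daemon thread; raise TimeoutError if it
    has not finished within `timeout` seconds.

    Hypothetical helper, not a pyarrow API. The worker thread is not
    cancelled on timeout; it is simply left running in the background.
    """
    outcome = {}

    def worker():
        try:
            outcome["value"] = fn(*args, **kwargs)
        except BaseException as exc:  # hand any failure back to the caller
            outcome["error"] = exc

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout)
    if thread.is_alive():
        raise TimeoutError("call did not return within %s seconds" % timeout)
    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]


# Example use with the call from the report (host, port, user are placeholders):
#   fs = call_with_timeout(pyarrow.hdfs.connect, 30, host, port, user=user)
```

Logging the timestamp of each TimeoutError alongside the idle period that
preceded it may help when raising the question with the Hadoop community.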
> Pyarrow HDFS client gets hung after a while
> -------------------------------------------
>
> Key: ARROW-6044
> URL: https://issues.apache.org/jira/browse/ARROW-6044
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 0.13.0
> Environment: hadoop-3.0.3
> driver='libhdfs'
> python 3.6
> Centos7
> Reporter: Fred Tzeng
> Priority: Major
>
> I'm using the pyarrow HDFS client in a long-running (forever) app that opens
> a connection to HDFS as each external request comes in and closes it as soon
> as the request is handled. This happens many times on separate threads, and
> everything works great.
> The problem is that after the app idles for a while (perhaps hours) with no
> HDFS connections made during that time, the next call to hdfs.connect(...)
> just hangs. No exceptions are thrown.
> Code snippet showing how I instantiate each connection:
> ...
> hdfs = pyarrow.hdfs.connect(self.hdfs_authority, self.hdfs_port,
>                             user=self.hdfs_user)
> try:
>     pass  # Do something
> finally:
>     # Call close(); a bare `hdfs.close` only references the method.
>     hdfs.close()
>
> Any help on what might be causing these hangs is appreciated.
>
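
The connect/use/close pattern in the report can also be wrapped in a context
manager so that close() is always called, even when request handling raises.
This is a sketch only: `hdfs_connection` is a hypothetical helper, not a
pyarrow API, and it assumes the object returned by the connect function has a
close() method, as the snippet in the report suggests.

```python
import contextlib


@contextlib.contextmanager
def hdfs_connection(connect, *args, **kwargs):
    """Open a connection with `connect` and always close it on exit.

    Hypothetical helper: `connect` is any callable returning an object
    with a close() method, for example pyarrow.hdfs.connect.
    """
    fs = connect(*args, **kwargs)
    try:
        yield fs
    finally:
        fs.close()  # runs on normal exit and when the body raises


# Example use (host, port, user are placeholders):
#   with hdfs_connection(pyarrow.hdfs.connect, host, port, user=user) as fs:
#       ...  # handle the request
```

Passing the connect function in as an argument keeps the helper independent of
any particular client library and makes it easy to exercise with a stub.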
--
This message was sent by Atlassian JIRA
(v7.6.14#76016)