[ https://issues.apache.org/jira/browse/ARROW-2081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16364539#comment-16364539 ]
Wes McKinney commented on ARROW-2081:
-------------------------------------

Is there a way we can detect the fork in the child process(es) and at least avoid a hang or segfault?

> Hdfs client isn't fork-safe
> ---------------------------
>
>                 Key: ARROW-2081
>                 URL: https://issues.apache.org/jira/browse/ARROW-2081
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++, Python
>            Reporter: Jim Crist
>            Priority: Major
>
> Given the following script:
>
> {code:java}
> import multiprocessing as mp
> import pyarrow as pa
>
> def ls(h):
>     print("calling ls")
>     return h.ls("/tmp")
>
> if __name__ == '__main__':
>     h = pa.hdfs.connect()
>
>     print("Using 'spawn'")
>     pool = mp.get_context('spawn').Pool(2)
>     results = pool.map(ls, [h, h])
>     sol = h.ls("/tmp")
>     for r in results:
>         assert r == sol
>     print("'spawn' succeeded\n")
>
>     print("Using 'fork'")
>     pool = mp.get_context('fork').Pool(2)
>     results = pool.map(ls, [h, h])
>     sol = h.ls("/tmp")
>     for r in results:
>         assert r == sol
>     print("'fork' succeeded")
> {code}
>
> Results in the following output:
>
> {code:java}
> $ python test.py
> Using 'spawn'
> calling ls
> calling ls
> 'spawn' succeeded
> Using 'fork'
> {code}
>
> The process then hangs, and I have to `kill -9` the forked worker processes.
>
> I'm unable to get the libhdfs3 driver to work, so I'm unsure if this is a
> problem with libhdfs or just arrow's use of it (a quick google search didn't
> turn up anything useful).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
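One possible shape for the fork detection Wes asks about is a PID check: record the PID when the client connects, and compare it before each call so a forked child raises immediately instead of deadlocking on inherited JVM/libhdfs state. A minimal sketch, with hypothetical names (`HdfsClientGuard` is not Arrow's actual API, and the real HDFS call is replaced by a placeholder):

```python
import os

# Hypothetical sketch: detect use of the client after fork() by comparing
# the current PID against the PID recorded at connect time.
class HdfsClientGuard:
    def __init__(self):
        # PID of the process that created (i.e. "owns") the connection.
        self._owner_pid = os.getpid()

    def _check_fork(self):
        if os.getpid() != self._owner_pid:
            raise RuntimeError(
                "HDFS client is not fork-safe; reconnect in the child process")

    def ls(self, path):
        self._check_fork()
        # Placeholder standing in for the real libhdfs call.
        return ["<listing of %s>" % path]

client = HdfsClientGuard()
print(client.ls("/tmp"))  # → ['<listing of /tmp>'] in the owning process
```

In C++, the analogous hook would be registering a `pthread_atfork` child handler that marks the connection unusable; either way the child gets a clear error rather than a hang.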