[ 
https://issues.apache.org/jira/browse/KUDU-2686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wong updated KUDU-2686:
------------------------------
    Priority: Major  (was: Blocker)

Lowering the priority of this based on some findings:

This seems to be a test-only issue, caused by the interaction between KuduClient 
and the Python `multiprocessing` library. The library appears to use `fork` to 
create its worker processes, which copies the parent's memory, including the 
KuduClient and all of its internal members, lock states included. If a lock is 
held at the moment of the fork, the child inherits it in the locked state with 
no thread left to release it, so the child hangs the next time it tries to 
acquire that lock. Evidence[1] for this is that multiple threads are waiting on 
the same futex while no thread appears to hold the lock. The hang can be 
reproduced more easily by mangling[2] some locking behavior (only tested on 
Ubuntu 14.04).

[1] https://gist.github.com/andrwng/d2d21c551362ddd564926c2a4ec406ae

[2] https://gist.github.com/andrwng/cc6c211c62b1235cc58944d513ba6655
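
To illustrate the suspected mechanism, here is a minimal sketch that does not 
use Kudu at all. It assumes Python 3 on a POSIX system (the failing build ran 
Python 2.7, where `Lock.acquire` has no timeout), and it uses a plain 
`threading.Lock` as a stand-in for a lock inside KuduClient. The helper name 
`child_can_acquire_after_fork` is hypothetical. A background thread holds the 
lock while the main thread forks; the child then tries to take its copy of the 
lock with a timeout instead of blocking forever.

```python
import os
import threading


def child_can_acquire_after_fork(timeout=1.0):
    """Fork while another thread holds a lock; report whether the child got it."""
    lock = threading.Lock()
    held = threading.Event()
    release = threading.Event()

    def holder():
        # Stand-in for a client thread that owns an internal lock.
        with lock:
            held.set()
            release.wait()

    t = threading.Thread(target=holder)
    t.start()
    held.wait()  # make sure the lock is held before forking

    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: only the forking thread was copied, but the lock's
        # state was copied as "locked", and its holder does not exist here.
        os.close(read_fd)
        got = lock.acquire(timeout=timeout)
        os.write(write_fd, b"1" if got else b"0")
        os._exit(0)

    # Parent: collect the child's answer, then let the holder thread finish.
    os.close(write_fd)
    answer = os.read(read_fd, 1)
    os.close(read_fd)
    os.waitpid(pid, 0)
    release.set()
    t.join()
    return answer == b"1"


if __name__ == "__main__":
    print("child acquired lock:", child_can_acquire_after_fork())
```

On a typical Linux CPython build the child should fail to acquire the lock. 
Without the timeout it would block indefinitely, which would match the 
futex waits seen in [1].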

 

> test_scantoken.py hangs in the python2 build
> --------------------------------------------
>
>                 Key: KUDU-2686
>                 URL: https://issues.apache.org/jira/browse/KUDU-2686
>             Project: Kudu
>          Issue Type: Bug
>          Components: python
>    Affects Versions: 1.9.0
>            Reporter: Andrew Wong
>            Assignee: Andrew Wong
>            Priority: Major
>         Attachments: consoleText.txt, consoleTextFailed.txt
>
>
> *16:54:20* ============================= test session starts ==============================
> *16:54:20* platform linux2 -- Python 2.7.6, pytest-3.2.5, py-1.7.0, pluggy-0.4.0
> *16:54:20* rootdir: /home/jenkins-slave/workspace/kudu-master/1/python, inifile: pytest.ini
> *16:54:20* plugins: timeout-1.2.0
> *16:54:20* timeout: 100.0s method: signal
> *16:54:20* collected 92 items
> *16:54:20*
> *16:54:20* kudu/tests/test_client.py .............................
> *16:54:22* kudu/tests/test_scanner.py ........................
> *16:54:23* kudu/tests/test_scantoken.py .Build timed out (after 50 minutes). Marking the build as failed.
> *17:44:23* Build was aborted
>  
> It failed in this 
> [pre-commit|http://jenkins.kudu.apache.org/job/kudu-gerrit/16176/BUILD_TYPE=DEBUG/],
>  but unfortunately there don't appear to be logs. I've attached the Jenkins 
> output for the build-and-test.sh run that led to the hang, and for one that 
> didn't hang. Comparing the Python-related build logs of the run that hung 
> with those of the run that didn't, I didn't notice any differences in 
> Python package versions.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
