Jens Geyer created THRIFT-2918:
----------------------------------
Summary: Race condition in Python TProcessPoolServer test
Key: THRIFT-2918
URL: https://issues.apache.org/jira/browse/THRIFT-2918
Project: Thrift
Issue Type: Bug
Components: Python - Library
Environment: openSUSE 13.2
Python 2.7.8 (default, Sep 30 2014, 15:34:38) [GCC] on linux2
Reporter: Jens Geyer
{{make check}} gets stuck very reproducible in my VM at Test run #218:
{code}
Test run #217: (includes gen-py-default) Server=TProcessPoolServer,
Proto=accel, zlib=False, SSL=False
Testing server TProcessPoolServer: /usr/bin/python ./TestServer.py
--genpydir=gen-py-default --protocol=accel --port=9090 TProcessPoolServer
Testing client: /usr/bin/python ./TestClient.py --genpydir=gen-py-default
--protocol=accel --port=9090 --transport=buffered
...testException(Safe)
testException(Xception)
testException(throw_undeclared)
...............
----------------------------------------------------------------------
Ran 18 tests in 0.563s
OK
Giving TProcessPoolServer (proto=accel,zlib=False,ssl=False) an extra 3 seconds
for childprocesses to terminate via alarm
Terminating worker: <Process(Process-1, started daemon)>
Terminating worker: <Process(Process-2, started daemon)>
Terminating worker: <Process(Process-3, started daemon)>
Terminating worker: <Process(Process-4, started daemon)>
Terminating worker: <Process(Process-5, started daemon)>
Requesting server to stop()
OK: Finished (includes gen-py-default) TProcessPoolServer / accel proto /
zlib=False / SSL=False. 217 combinations tested.
Test run #218: (includes gen-py-default) Server=TProcessPoolServer,
Proto=accel, zlib=False, SSL=True
Testing server TProcessPoolServer: /usr/bin/python ./TestServer.py
--genpydir=gen-py-default --protocol=accel --port=9090 --ssl TProcessPoolServer
Testing client: /usr/bin/python ./TestClient.py --genpydir=gen-py-default
--protocol=accel --port=9090 --ssl --transport=buffered
...testException(Safe)
testException(Xception)
testException(throw_undeclared)
..........Terminating worker: <Process(Process-1, started daemon)>
Terminating worker: <Process(Process-2, started daemon)>
Terminating worker: <Process(Process-3, started daemon)>
Terminating worker: <Process(Process-4, started daemon)>
Terminating worker: <Process(Process-5, started daemon)>
Requesting server to stop()
{code}
After fiddling a bit around with it I got it to work by increasing the alarm()
timeout from 2 seconds to 4 seconds.
I'm not a Python expert, but the code looks somewhat interesting to me:
- The server code starts the workers, but some piece of code outside of the
server is responsible for terminating them. Is that really idiomatic in Python
or just bad design?
- The Condition() object used in TProcessPoolsServer.py should probably be
replaced by an Event() object. Especially, as Condition.wait() [seems to have
it's own
perils|http://stackoverflow.com/questions/24137480/threading-condition-waittimeout-ignores-threading-condition-notify]
and Event is much easier to use.
- Calling Condition.aquire() without a matching release() within a {{while
True:}} loop looks also not very convincing to me. AFAIK the second call to
aquire() will block, if that ever happens (it did not in my tests).
The bad news is, that neither of the proposed changes above had any effect on
the race conditions, except increasing the timeout - but that is merely a
workaround, not a solution.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)