[
https://issues.apache.org/jira/browse/THRIFT-2918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14259763#comment-14259763
]
Jens Geyer commented on THRIFT-2918:
------------------------------------
Re the timeout: I came up with this because the log output gave me the
impression that the workers were terminated too early. But that's only a
guess; I'm not really sure about it.
> Race condition in Python TProcessPoolServer test
> ------------------------------------------------
>
> Key: THRIFT-2918
> URL: https://issues.apache.org/jira/browse/THRIFT-2918
> Project: Thrift
> Issue Type: Bug
> Components: Python - Library
> Environment: openSUSE 13.2
> Python 2.7.8 (default, Sep 30 2014, 15:34:38) [GCC] on linux2
> Reporter: Jens Geyer
>
> {{make check}} gets stuck very reproducibly in my VM at Test run #218:
> {code}
> Test run #217: (includes gen-py-default) Server=TProcessPoolServer,
> Proto=accel, zlib=False, SSL=False
> Testing server TProcessPoolServer: /usr/bin/python ./TestServer.py
> --genpydir=gen-py-default --protocol=accel --port=9090 TProcessPoolServer
> Testing client: /usr/bin/python ./TestClient.py --genpydir=gen-py-default
> --protocol=accel --port=9090 --transport=buffered
> ...testException(Safe)
> testException(Xception)
> testException(throw_undeclared)
> ...............
> ----------------------------------------------------------------------
> Ran 18 tests in 0.563s
> OK
> Giving TProcessPoolServer (proto=accel,zlib=False,ssl=False) an extra 3
> seconds for childprocesses to terminate via alarm
> Terminating worker: <Process(Process-1, started daemon)>
> Terminating worker: <Process(Process-2, started daemon)>
> Terminating worker: <Process(Process-3, started daemon)>
> Terminating worker: <Process(Process-4, started daemon)>
> Terminating worker: <Process(Process-5, started daemon)>
> Requesting server to stop()
> OK: Finished (includes gen-py-default) TProcessPoolServer / accel proto /
> zlib=False / SSL=False. 217 combinations tested.
> Test run #218: (includes gen-py-default) Server=TProcessPoolServer,
> Proto=accel, zlib=False, SSL=True
> Testing server TProcessPoolServer: /usr/bin/python ./TestServer.py
> --genpydir=gen-py-default --protocol=accel --port=9090 --ssl
> TProcessPoolServer
> Testing client: /usr/bin/python ./TestClient.py --genpydir=gen-py-default
> --protocol=accel --port=9090 --ssl --transport=buffered
> ...testException(Safe)
> testException(Xception)
> testException(throw_undeclared)
> ..........Terminating worker: <Process(Process-1, started daemon)>
> Terminating worker: <Process(Process-2, started daemon)>
> Terminating worker: <Process(Process-3, started daemon)>
> Terminating worker: <Process(Process-4, started daemon)>
> Terminating worker: <Process(Process-5, started daemon)>
> Requesting server to stop()
> {code}
> After fiddling around with it a bit, I got it to work by increasing the
> alarm() timeout from 2 seconds to 4 seconds.
> I'm not a Python expert, but the code looks somewhat interesting to me:
> - The server code starts the workers, but some piece of code outside of the
> server is responsible for terminating them. Is that really idiomatic in
> Python or just bad design?
> - The Condition() object used in TProcessPoolServer.py should probably be
> replaced by an Event() object, especially as Condition.wait() [seems to have
> its own
> perils|http://stackoverflow.com/questions/24137480/threading-condition-waittimeout-ignores-threading-condition-notify]
> and Event is much easier to use.
> - Calling Condition.acquire() without a matching release() within a {{while
> True:}} loop also does not look very convincing to me. AFAIK the second call to
> acquire() will block, if that ever happens (it did not in my tests).
> The bad news is that none of the changes proposed above had any effect on
> the race condition; only increasing the timeout did, and that is merely a
> workaround, not a solution.
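
For illustration, the Event() suggestion from the list above could look roughly like the sketch below. This is not the TProcessPoolServer code: the class name StoppableServer and its attributes are hypothetical, and it uses threading.Event for brevity (multiprocessing.Event offers the same set()/wait() interface). The point it tries to show is that an Event keeps its flag once set, so a stop() that arrives before serve() starts waiting is not lost, whereas a Condition.notify() issued before wait() would be.

```python
import threading


class StoppableServer(object):
    """Hypothetical stand-in for a server whose serve() blocks until stop()."""

    def __init__(self):
        # The Event replaces the Condition; no acquire()/release() pairing needed.
        self._stop_event = threading.Event()
        self.served = False

    def serve(self):
        # wait() returns immediately if stop() was already called,
        # because the Event's flag stays set until clear() is called.
        self._stop_event.wait()
        self.served = True

    def stop(self):
        # Safe to call before, during, or after serve() starts waiting.
        self._stop_event.set()


server = StoppableServer()
worker = threading.Thread(target=server.serve)
worker.start()
server.stop()
worker.join(5)  # bounded join so a bug cannot hang the caller forever
```

As noted above, this kind of change alone did not remove the race in the test, so it should be read as a cleanup sketch rather than a fix.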