Mark, On 9/30/13 9:34 AM, Mark Thomas wrote: > On 30/09/2013 08:32, Mark Thomas wrote: >> On 28/09/2013 11:57, Mladen Turk wrote: >>> On 09/28/2013 12:25 PM, Rainer Jung wrote: >>>> >>>>> I can't seem to reproduce the problem using an APR connector on Linux: >>>>> reports seems to indicate that trivially accessing Tomcat via an HTTP >>>>> APR connector will cause a crash, and I was able to run the following >>>>> command without bringing down my instance: >>>>> >>>>> $ ab -n 10000 -c 50 http://localhost:28215/my-webapp/index.html >>>> >>>> >>>> Still the whole poll stuff is platform dependent, so it might be you >>>> will not be able to reproduce the Windows crash even when the test >>>> scenario is exactly the same as there. >>>> >>> >>> Almost every APR crash error is related to reusing closed >>> object/descriptor/pointer. >>> Eg, crashing inside poll can be caused by closing the socket that is >>> inside poller. >>> The fact that it doesn't crash on linux might be just because of "close" >>> order. >>> Even on the same OS it doesn't have to crash all the time. >>> >>> Anyhow, I'll try to check the 8-RC3 with the latest native on windows. >>> Seems there are some conceptual problem used inside TC8 probably with >>> async sockets and double close. >> >> I suspect multiple factors: >> - the refactoring to a single poller thread may have introduced some >> additional issues we haven't tracked down yet >> - allowing two threads (one read, one write) to work with the same >> socket introduces lots of opportunities to close a socket in one thread >> while it is still in use in another >> >> I've been mulling over refactoring the connector code so all actions on >> the socket go via the socket wrapper so it is easier to track true >> socket state in one place. Now might be the time to implement that change. > > I believe I have found the root cause of the crash. It is the code I > added to remove a socket from the poller when it was being closed. > > Since removing a socket from the poller is not thread safe, I modified > the code so that the poller thread did this. The problem this introduced > is that the socket may well be closed before it is removed from the > poller and there is a strong possibility that poll() will be called on > the closed socket. Either of these happening will trigger a crash.
I may need a little more guidance, since adding NULL-checks for everything is a little silly. Here is the function declaration in tcnative: TCN_IMPLEMENT_CALL(jint, Poll, poll)(TCN_STDARGS, jlong pollset, jlong timeout, jlongArray set, jboolean remove) Presumably, the pollset itself is non-null, but could all of the items in the pollset be null? The pollset itself is directly dereferenced, of course, but then it looks like elements "*ep" are dereferenced in the setup loop of the function. Then, there is a bunch of ring-shifting which does not appear to be threadsafe in and of itself, but may be handled by another mechanism. I'll work with Konstantin Preißer and and Ognjen to get tcnative patched up a bit to at least prevent crashes. Thanks, -chris
signature.asc
Description: OpenPGP digital signature