> On Dec 13, 2016, at 4:59 AM, Oleg Kalnichevski <[email protected]> wrote:
>
> On Mon, 2016-12-12 at 21:15 +0000, Idzerda, Edan wrote:
>> Hello! Our reverse proxy uses the Async Client pool to handle connections
>> to backend servers. We've been tracking a problem for a while where we
>> observe the initial TCP connection is made, but no thread is available to
>> handle the SSL setup before a 10 second timeout expires. We get into
>> trouble because some of our backend servers are very slow, and some of our
>> clients download very slowly.
>>
>> I'm experimenting with a patch to AbstractMultiworkerIOReactor.addChannel()
>> to determine whether the next dispatcher thread is "busy." My first try was
>> to look at bufferedSessions from the BaseIOReactor, and go through the list
>> of dispatchers one time to see if I can find a free one.
>>
>> int i = Math.abs(this.currentWorker++ % this.workerCount);
>>
>> for (int j = 0; j < this.workerCount; j++) {
>>     if (this.dispatchers[i].getSessionCount() == 0) {
>>         break;
>>     }
>>     i = Math.abs(this.currentWorker++ % this.workerCount);
>> }
>> this.dispatchers[i].addChannel(entry);
>>
>> This seems to help us in MOST of the cases we see this issue in production,
>> but there still seem to be a small number of threads which collide. I'm
>> testing a different version which looks at AbstractIOReactor "sessions" to
>> determine thread busy state, but it never seems to show more than "1"
>> session if I look at the size after piling up slow connections on top of
>> each other.
>>
>> I have two questions:
>> Is there a better way to determine whether a thread is busy?
>> Would you be willing to accept a patch to make the dispatchers array in
>> AbstractMultiworkerIOReactor "protected" so I can implement my own
>> ConnectingIOReactor that overrides addChannel() with my own thread selection
>> model?
>>
>> Thanks a lot for your help and for providing such a great library to the
>> community!
>>
>> - edan
>>
>
> Hi Edan
>
> What I do not quite understand is why i/o dispatch threads get blocked
> for 10 seconds or longer. This sounds awfully suspicious.
>
> I could imagine exposing the list of i/o dispatchers to subclasses of
> AbstractMultiworkerIOReactor in 4.4.x branch but would rather prefer to
> keep it as a last resort.
>
> Oleg

Thanks. I would prefer not to have to patch httpcore-nio like this if I could work out the root cause. Since I am still seeing connections fail to complete SSL within 10 seconds with my first patch (above), I am now trying a new one that uses an AtomicInteger for currentWorker. We are seeing far fewer connection problems with the patch, but there are still enough apparent thread selection collisions that some requests fail.

The only way I have been able to reproduce this problem is by using an artificially rate-limited connection (e.g., curl --limit-rate 1m) and downloading a relatively large file. If I use a small file, say 50K, the dispatcher threads do not get stuck: I can download more files than I have worker threads, and the size of AbstractIOReactor’s “sessions” set stays at 0. With a larger file, like 500K, the sessions size goes to 1, and I can only download as many files at once as I have worker threads.

Does this make any sense to you? Is it possible the higher-level proxy library is hanging on to the HttpResponse’s Entity too long? I see they call HttpEntity.getContent() and create an InputStream out of it. But why would that make a worker thread become non-responsive until it finishes?

I see a note on IOEventDispatch suggesting that “all methods of this interface are executed on the dispatch thread of the I/O reactor … it is important that processing that takes place in the event methods will not block the dispatch thread for too long, as the I/O reactor will be unable to react to other events”. Is that worth pursuing?

Any suggestions on how to debug this would be appreciated!

- edan
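
P.S. To make the AtomicInteger idea concrete, here is a minimal standalone sketch of that kind of worker selection. It is an illustration only, not the actual patch and not httpcore-nio code: the WorkerSelector class and the sessionCounts array are hypothetical stand-ins for the reactor's dispatcher list and each dispatcher's session count. It probes each worker at most once, starting from the next round-robin slot, and falls back to that slot if every worker looks busy.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper for illustration; not part of httpcore-nio.
final class WorkerSelector {
    private final AtomicInteger currentWorker = new AtomicInteger(0);
    private final int workerCount;

    WorkerSelector(int workerCount) {
        if (workerCount <= 0) {
            throw new IllegalArgumentException("workerCount must be positive");
        }
        this.workerCount = workerCount;
    }

    // Next round-robin index. getAndIncrement() is atomic, and floorMod
    // keeps the result in [0, workerCount) even after the counter overflows.
    int next() {
        return Math.floorMod(currentWorker.getAndIncrement(), workerCount);
    }

    // sessionCounts[i] stands in for the session count of dispatcher i
    // (an assumption of this sketch). Returns the first idle worker found
    // when scanning from the round-robin start, or the start slot if none.
    int select(int[] sessionCounts) {
        int start = next();
        for (int j = 0; j < workerCount; j++) {
            int i = (start + j) % workerCount;
            if (sessionCounts[i] == 0) {
                return i;
            }
        }
        return start;
    }
}
```

Note that the counter advances exactly once per call, so two threads calling select() concurrently always start from different slots. The busy check itself is still only a snapshot, which may be part of why some collisions remain.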
