On Wed, 2016-12-14 at 03:38 +0000, Idzerda, Edan wrote:
>
> > On Dec 13, 2016, at 4:59 AM, Oleg Kalnichevski <[email protected]> wrote:
> >
> > On Mon, 2016-12-12 at 21:15 +0000, Idzerda, Edan wrote:
> >> Hello! Our reverse proxy uses the Async Client pool to handle connections
> >> to backend servers. We've been tracking a problem for a while where we
> >> observe the initial TCP connection is made, but no thread is available to
> >> handle the SSL setup before a 10 second timeout expires. We get into
> >> trouble because some of our backend servers are very slow, and some of our
> >> clients download very slowly.
> >>
> >>
> >> I'm experimenting with a patch to
> >> AbstractMultiworkerIOReactor.addChannel() to determine whether the next
> >> dispatcher thread is "busy." My first try was to look at bufferedSessions
> >> from the BaseIOReactor, and go through the list of dispatchers one time to
> >> see if I can find a free one.
> >>
> >>
> >>         int i = Math.abs(this.currentWorker++ % this.workerCount);
> >>
> >>         for (int j = 0; j < this.workerCount; j++) {
> >>             if (this.dispatchers[i].getSessionCount() == 0) {
> >>                 break;
> >>             }
> >>             i = Math.abs(this.currentWorker++ % this.workerCount);
> >>         }
> >>         this.dispatchers[i].addChannel(entry);
> >>
> >> This seems to help us in MOST of the cases we see this issue in
> >> production, but there still seem to be a small number of threads which
> >> collide. I'm testing a different version which looks at AbstractIOReactor
> >> "sessions" to determine thread busy state, but it never seems to show more
> >> than "1" session if I look at the size after piling up slow connections on
> >> top of each other.
> >>
> >> I have two questions:
> >> Is there a better way to determine whether a thread is busy?
> >> Would you be willing to accept a patch to make the dispatchers array in
> >> AbstractMultiworkerIOReactor "protected" so I can implement my own
> >> ConnectingIOReactor that overrides addChannel() with my own thread
> >> selection model?
> >>
> >> Thanks a lot for your help and for providing such a great library to the
> >> community!
> >>
> >> - edan
> >>
> >
> > Hi Edan
> >
> > What I do not quite understand is why i/o dispatch threads get blocked
> > for 10 seconds or longer. This sounds awfully suspicious.
> >
> > I could imagine exposing the list of i/o dispatchers to subclasses of
> > AbstractMultiworkerIOReactor in 4.4.x branch but would rather prefer to
> > keep it as a last resort.
> >
> > Oleg
>
> Thanks. I would prefer not to have to patch httpcore-nio like this if I
> could work out the root cause. Since I am still seeing connections failing
> to complete SSL within 10 seconds with my first patch (above), I am trying a
> new one now that uses an AtomicInteger for currentWorker. We are seeing far
> fewer connection problems with the patch, but there are still enough apparent
> thread selection collisions that some requests fail.
>
> The only way I have been able to reproduce this problem is by using an
> artificially rate limited connection (e.g., curl --limit-rate 1m) and
> downloading a relatively large file. If I use a small file, say 50K, I
> notice that the dispatcher threads do not get stuck. I can download more
> files than I have worker threads, and AbstractIOReactor’s “sessions” set
> count stays at 0. With a larger file, like 500k, the sessions size goes to
> 1, and I can only download the same number of files as I have worker threads.
>
> Does this make any sense to you? Is it possible the higher level proxy
> library is hanging on to the HttpResponse’s Entity too long? I see they call
> HttpEntity.getContent() and create an InputStream out of it…

This is likely to be the cause of your grief. InputStream / OutputStream
interfaces are inherently blocking and they do not mix well with event
driven i/o without quite a bit of effort and complex code. By using blocking
i/o to produce requests or consume responses, the higher level proxy library
likely blocks i/o dispatch threads and starves other connections managed
by the same dispatcher.

I would recommend rewriting your code based on native
HttpAsyncRequestProducer / HttpAsyncResponseConsumer for more optimal
results.

Oleg

> But why would that make a worker thread become non-responsive until it
> finishes? I see a note on IOEventDispatch suggesting that “all methods of
> this interface are executed on the dispatch thread of the I/O reactor … it is
> important that processing that takes place in the event methods will not
> block the dispatch thread for too long, as the I/O reactor will be unable to
> react to other events”
>
> Is that worth pursuing? Any suggestions on how to debug this would be
> appreciated!
>
> - edan
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
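
As an illustration of the starvation effect described in this thread, here is a minimal standalone sketch using only the JDK. It is not httpcore-nio code: a single-thread executor stands in for one i/o dispatcher, a latch stands in for a blocking InputStream.read() on a slow download, and the class and method names (DispatcherStarvationSketch, handshakeStarvedWhileBlocked) are hypothetical. It only shows that a second event queued on the same thread, such as an SSL handshake for a new connection, cannot be processed until the blocking handler returns.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class DispatcherStarvationSketch {

    /**
     * Returns true if the simulated handshake could not run while the
     * single "dispatcher" thread was stuck in a simulated blocking read.
     */
    public static boolean handshakeStarvedWhileBlocked() {
        // Assumption: one executor thread models one i/o dispatch thread.
        ExecutorService dispatcher = Executors.newSingleThreadExecutor();
        // Released when the simulated slow peer has delivered all its data.
        CountDownLatch slowPeerDone = new CountDownLatch(1);
        try {
            // Connection A: the handler blocks, like a stream read on a slow download.
            dispatcher.submit(() -> {
                try {
                    slowPeerDone.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            // Connection B: a new connection whose handshake is queued on the same thread.
            Future<String> handshake = dispatcher.submit(() -> "handshake done");

            boolean starved;
            try {
                // B cannot start while A's handler occupies the only thread.
                handshake.get(200, TimeUnit.MILLISECONDS);
                starved = false;
            } catch (TimeoutException expected) {
                starved = true;
            }

            // Once the blocking handler returns, B is processed promptly.
            slowPeerDone.countDown();
            handshake.get(5, TimeUnit.SECONDS);
            return starved;
        } catch (InterruptedException | ExecutionException | TimeoutException e) {
            throw new IllegalStateException(e);
        } finally {
            slowPeerDone.countDown();
            dispatcher.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("handshake starved: " + handshakeStarvedWhileBlocked());
    }
}
```

In the real reactor the equivalent of the blocked task would be an event callback that reads from the stream returned by HttpEntity.getContent(). With a consumer that only handles the bytes already available and then returns, the dispatcher would be free to service the second connection between chunks.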
