Hello, I would like to share some thoughts and adventures about the tcp
and nio transports for your consideration, hoping for some feedback.
We are using a 4.1 ActiveMQ compiled from the 4.1 SVN branch. For some
time we didn't run into any significant problem, but lately we have
been suffering an issue with the TCP transport.
The problem arises when the TCP send buffer fills up during a
TcpBufferedOutputStream.flush(). When this happens, and probably when
all the consumers/producers are sharing the same connection, we run
into a deadlock: the socket OutputStream write blocks, and meanwhile no
reader that could drain some data from the socket to ease the situation
is allowed to do its work, since it shares the same connection, which
is locked in the write attempt. Do you agree with this analysis, and
that there is a real chance of it happening?
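To illustrate the pattern I mean, here is a minimal standalone sketch
(not ActiveMQ code; the shared lock and buffer size are invented for
the demo). The writer blocks on a full socket buffer while holding the
connection lock, so the reader that could drain the socket never runs:

    import java.io.IOException;
    import java.io.OutputStream;
    import java.net.ServerSocket;
    import java.net.Socket;

    public class BlockedWriteDemo {
        static final Object connectionLock = new Object();

        public static void main(String[] args) throws Exception {
            ServerSocket server = new ServerSocket(0);
            final Socket client = new Socket("localhost", server.getLocalPort());
            Socket accepted = server.accept();

            new Thread(new Runnable() {
                public void run() {
                    byte[] chunk = new byte[64 * 1024];
                    try {
                        OutputStream out = client.getOutputStream();
                        synchronized (connectionLock) {
                            // Once the send and receive buffers are full,
                            // write() blocks -- with the lock still held.
                            while (true) {
                                out.write(chunk);
                            }
                        }
                    } catch (IOException e) {
                        // demo only
                    }
                }
            }).start();

            Thread.sleep(2000); // let the writer fill the TCP buffers

            // The reader that could drain the socket never gets the lock,
            // so the writer stays blocked forever: deadlock.
            synchronized (connectionLock) {
                accepted.getInputStream().read();
            }
        }
    }

Run it and the program hangs, which is exactly the situation we observe
on the broker connection.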
As a solution, NIO with its non-blocking socket management, selectors
and friends seemed the way to go. Unfortunately, the nio transport is
not available in the 4.1 branch, but it was easily backported from the
trunk. Trying to use it, some issues arose:
- Connection attempts were delayed, and the whole system behaved
erratically and unresponsively. There were no deadlocks, but one
symptom was that transport.nio.SelectorSelection spent a lot of time
waiting for the socketChannel.register call to complete in the
SelectorSelection constructor.
I don't know the exact reason, but it seems that SelectorWorker.run()
monopolizes access to the selector by doing:
    while (isRunning()) {
        int count = selector.select(10);
        if (count == 0) {
            continue;
        }
I didn't have the chance to check whether this thread has a higher
priority than the one running the SelectorSelection constructor.
Anyway, as a workaround I changed the previous code to:
        int count = selector.select(10);
        if (count == 0) {
+           Thread.yield();
            continue;
        }
and mostly everything started to work as expected. I was able to
connect consistently to the broker using the nio:// transport.
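For what it's worth, a pattern used in other NIO code (an assumption on
my part, not something the backported transport does) avoids this
contention altogether: registrations are queued and executed on the
selector thread, and selector.wakeup() makes any select() in progress
return at once, so register() never has to wait for it. A
self-contained sketch (class and method names are mine):

    import java.nio.channels.ClosedChannelException;
    import java.nio.channels.Selector;
    import java.nio.channels.SocketChannel;
    import java.util.Queue;
    import java.util.concurrent.ConcurrentLinkedQueue;

    // Hypothetical helper, not the ActiveMQ code.
    class RegistrationQueue {
        private final Selector selector;
        private final Queue<Runnable> pending = new ConcurrentLinkedQueue<Runnable>();

        RegistrationQueue(Selector selector) {
            this.selector = selector;
        }

        // Called from any thread instead of channel.register(selector, ops).
        void requestRegistration(final SocketChannel channel, final int ops) {
            pending.add(new Runnable() {
                public void run() {
                    try {
                        channel.register(selector, ops);
                    } catch (ClosedChannelException e) {
                        // channel closed before we could register it
                    }
                }
            });
            selector.wakeup(); // a blocked select(10) returns immediately
        }

        // Called by the selector thread right after select() returns.
        void processPending() {
            Runnable r;
            while ((r = pending.poll()) != null) {
                r.run();
            }
        }
    }

With something like this in place the Thread.yield() hack should not be
necessary, since the constructor no longer races against select() for
the selector's internal locks.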
- The remaining problem I found is that a Java test client (connect,
send a message, close the connection) didn't shut itself down
correctly, although it did when using the tcp:// transport. I found two
possible sources for this problem:
a). NIOTransport doesn't close the selection on doStop. I think this is
needed to allow the SelectorWorker thread to finish, as sketched below.
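What I mean is something along these lines (assuming SelectorSelection
exposes a close() method; the selection field name comes from my
backport, so treat both as assumptions):

    protected void doStop(ServiceStopper stopper) throws Exception {
        if (selection != null) {
            // cancel the registration so SelectorWorker can drop the channel
            selection.close();
            selection = null;
        }
        super.doStop(stopper);
    }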
b). Even after doing that, and since SelectorManager.selectorExecutor
is the result of calling Executors.newCachedThreadPool, the idle
threads are not destroyed immediately, but only after 60 seconds. Since
these threads are created as non-daemon threads, the VM waits for them
to finish. As a workaround, I changed the instantiation of
SelectorManager.selectorExecutor to:
    private Executor selectorExecutor =
        Executors.newCachedThreadPool(new ThreadFactory() {
            public Thread newThread(Runnable r) {
                Thread rc = new Thread(r);
                rc.setName("NIO Transport Thread");
+               rc.setDaemon(true);
                return rc;
            }
        });
thus preventing them from being created as non-daemon threads. However,
I suppose this could be dangerous, and something could be left in an
inconsistent state if the VM exits while a transport thread is still
working. Another solution could be not to use a cached thread pool, but
that could hurt performance. What would be the best way to avoid the
client shutdown delay? One idea is sketched below.
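One middle ground that occurs to me (just a sketch, untested against
the transport) is to keep the cached-pool behavior but shorten the idle
keep-alive, since the javadoc documents Executors.newCachedThreadPool
as a ThreadPoolExecutor with exactly these parameters and a 60-second
timeout:

    // Same unbounded, thread-reusing behavior as newCachedThreadPool,
    // but idle threads die after 1 second instead of 60, so a
    // short-lived client exits quickly even with non-daemon threads.
    // The naming ThreadFactory above can still be passed as a fifth
    // argument.
    private Executor selectorExecutor = new ThreadPoolExecutor(
            0, Integer.MAX_VALUE,
            1, TimeUnit.SECONDS,
            new SynchronousQueue<Runnable>());

(this needs the java.util.concurrent ThreadPoolExecutor, TimeUnit and
SynchronousQueue imports).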
Currently, switching to 5.1 or 5.2 is not an option for us, since we
ran into problems in our previous attempts to switch. We need to stay
(at least until we have enough time to run a complete validation of 5.1
or the upcoming 5.2) on 4.1, plus the patches needed to make it work
properly.
Also, if you want 4.1 to feature NIO support, I could open a JIRA issue
attaching the patch. In any case, any idea, comment or proposal about
the problems we ran into and the solutions outlined above will be very
welcome.
Best regards.
Manuel.