Rob Davies wrote:
Thanks for the feedback - please add a JIRA - but we don't generally
do releases from branches.
Your analysis looks correct to me - can you go through the issues you
had with 5.1? It might be better to get you on to the 5.1/5.2 release
ASAP.
Yes, we are planning to switch in the near future. However, we haven't had time to test a newer release. The last time we tried 5.1, we ran into lots of problems (among them, huge memory leaks).

Once we have time to go down the 5.x road again, I will try to keep you informed about the bugs we encounter.


Any further comments about the need to detach the doConsume part of TcpTransport.run() into a different thread? I'm going to give it a try, creating a per-instance single-thread executor (to avoid any potential out-of-order annoyances), local to the run() method (to avoid overhead in TcpTransport instances not being used as listeners).
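Something along these lines is what I have in mind (just a sketch;
readCommand() and isStopped() are placeholders for whatever the real
transport loop does):

    // Sketch only: dispatch doConsume() through a per-transport
    // single-thread executor created locally inside run(), so
    // transports that never act as listeners pay no extra cost and
    // ordering is preserved by the single thread.
    public void run() {
        ExecutorService dispatcher = Executors.newSingleThreadExecutor();
        try {
            while (!isStopped()) {                    // placeholder stop check
                final Object command = readCommand(); // placeholder blocking read
                dispatcher.execute(new Runnable() {   // hand off, so a blocked
                    public void run() {               // consumer cannot stall the
                        doConsume(command);           // socket reader thread
                    }
                });
            }
        } finally {
            dispatcher.shutdown();                    // let queued commands drain
        }
    }

That way, a write that blocks downstream of doConsume would no longer
prevent the reader from draining the socket.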


Best regards.

Manuel.

cheers,

Rob

On 16 Sep 2008, at 16:13, Manuel Teira Paz wrote:

Hello, I would like to share some thoughts and adventures about the tcp
and nio transports for your consideration, hoping for some
feedback.

We are using a 4.1 ActiveMQ compiled from the 4.1 svn branch. For
some time we didn't run into any important problems, but lately we
have been suffering an issue with the tcp transport.

The problem arises when the tcp buffer gets full during a
TcpBufferedOutputStream.flush(). When this happens, and probably
when all the consumers/producers are sharing the same connection, we
run into a deadlock, since the write to the socket OutputStream
blocks. Meanwhile, no reader that could drain some data from the
socket to ease the situation is allowed to do its work, since it
shares the same connection, which is stuck in the write attempt. Do
you agree with this analysis and with the chance that it could happen?
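
To illustrate where the blocking happens, this is roughly what a
buffered flush over the plain socket stream boils down to (a
simplified sketch, not the actual TcpBufferedOutputStream code):

    // Simplified sketch: once the kernel send buffer is full, this write
    // blocks until the peer reads, and the peer's reader may itself be
    // stuck waiting on the very connection this writer is holding.
    public void flush() throws IOException {
        if (count > 0) {
            socketOutputStream.write(buffer, 0, count); // blocking write on a full tcp buffer
            count = 0;
        }
    }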

As a solution, nio with its non-blocking socket management, selectors
and friends seemed the way to go. Unfortunately, the nio transport
is not available in the 4.1 branch, but it was easily backported
from the trunk. Trying to use it, some issues arose:

- Connection attempts timed out, and the whole system behaved
erratically and unresponsively. There were no deadlocks, but one
symptom was that transport.nio.SelectorSelection spent a lot of time
waiting for the socketChannel.register call to complete, in the
SelectorSelection constructor.
I don't know the exact reason, but it seems that
SelectorWorker.run() monopolizes access to the selector by doing:

    while (isRunning()) {
        int count = selector.select(10);
        if (count == 0) {
            continue;
        }

I didn't have the chance to check whether this thread has a higher
priority than the one running the SelectorSelection constructor.
Anyway, as a workaround I changed the previous code to:

    int count = selector.select(10);
    if (count == 0) {
+       Thread.yield();
        continue;
    }

and almost everything started to work as expected. I was able to
connect consistently to the broker using the nio:// transport.
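
For the record, another idiom I have seen for this situation in plain
NIO code (just a sketch, not something I tried inside ActiveMQ, and it
assumes the usual java.nio.channels and java.util.concurrent imports)
is to queue the registration and wake the selector up, so the select
loop itself performs the register call instead of a foreign thread
waiting on the selector's internal locks:

    // Sketch of the usual "wakeup before register" idiom, not ActiveMQ code:
    // foreign threads queue the channel and call wakeup(); the selector
    // thread then performs the registration between select() calls.
    private final Queue<SocketChannel> pending = new ConcurrentLinkedQueue<SocketChannel>();

    public void requestRegistration(SocketChannel channel) {
        pending.add(channel);
        selector.wakeup();   // unblocks select(), releasing the selector's locks
    }

    public void run() {
        try {
            while (isRunning()) {
                int count = selector.select(10);
                SocketChannel channel;
                while ((channel = pending.poll()) != null) {
                    channel.register(selector, SelectionKey.OP_READ); // on the selector thread
                }
                if (count == 0) {
                    continue;
                }
                // ... dispatch the selected keys as before ...
            }
        } catch (IOException e) {
            // error handling elided in this sketch
        }
    }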

- The remaining problem I found is that a java test client (connect,
send a message, close the connection) didn't shut itself down
correctly, although it did when using the tcp:// transport. I found
two possible sources for this problem:

 a) NIOTransport doesn't close the selection on doStop. I think
this is needed to allow the SelectorWorker thread to finish.
 b) Even after doing that, and since
SelectorManager.selectorExecutor is the result of calling
Executors.newCachedThreadPool, the idle threads are not destroyed
immediately, but after 60 seconds. Since these threads are created as
non-daemon threads, the VM waits for them to finish. As a
workaround, I changed the instantiation of
SelectorManager.selectorExecutor to:

    private Executor selectorExecutor =
        Executors.newCachedThreadPool(new ThreadFactory() {
            public Thread newThread(Runnable r) {
                Thread rc = new Thread(r);
                rc.setName("NIO Transport Thread");
+               rc.setDaemon(true);
                return rc;
            }
        });

This prevents them from being created as non-daemon threads. However,
I suppose this could be dangerous, and something could be left in an
inconsistent state. Another solution could be not to use a cached
thread pool, but this could hurt performance. What would be the best
way to avoid the client shutdown delay?
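
One middle ground I can think of (just a sketch, not tested against
ActiveMQ) would be to keep the cached-pool semantics but shorten the
keep-alive, so idle non-daemon threads are reclaimed quickly and a
short-lived client exits promptly:

    // Sketch: same behaviour as Executors.newCachedThreadPool(factory),
    // but idle threads are reclaimed after 5 seconds instead of 60.
    private Executor selectorExecutor = new ThreadPoolExecutor(
            0, Integer.MAX_VALUE,
            5, TimeUnit.SECONDS,               // shorter keep-alive than the default 60s
            new SynchronousQueue<Runnable>(),
            new ThreadFactory() {
                public Thread newThread(Runnable r) {
                    Thread rc = new Thread(r);
                    rc.setName("NIO Transport Thread");
                    return rc;
                }
            });

Still, explicitly shutting the executor down when the last NIO
transport stops would probably be the cleanest fix.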

Currently, changing to 5.1 or 5.2 is not an option for us, since we
ran into problems in our previous attempts to switch. We need to
stay on 4.1 (at least until we have enough time to run a complete
validation of 5.1 or the upcoming 5.2), with the patches needed to
make it work properly.

Also, if you want 4.1 to feature NIO support, I could open a JIRA
issue attaching the patch. Anyway, any ideas, comments or proposals
about the problems we ran into and the solutions outlined above will
be very welcome.

Best regards.


Manuel.


