[
https://issues.apache.org/activemq/browse/AMQ-1993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Filip Hanik reopened AMQ-1993:
------------------------------
Regression: [Regression]
Index:
activemq-core/src/main/java/org/apache/activemq/transport/tcp/TcpBufferedOutputStream.java
It looks like I have a copy-paste error: the finally block sets writeTimestamp
to the current time again instead of resetting it.
+ try {
+ writeTimestamp = System.currentTimeMillis();
+ out.write(b, off, len);
+ } finally {
+ writeTimestamp = System.currentTimeMillis();
+ }
It should be:
+ try {
+ writeTimestamp = System.currentTimeMillis();
+ out.write(b, off, len);
+ } finally {
+ writeTimestamp = -1;
+ }
> Systems hang due to inability to timeout socket write operation
> ---------------------------------------------------------------
>
> Key: AMQ-1993
> URL: https://issues.apache.org/activemq/browse/AMQ-1993
> Project: ActiveMQ
> Issue Type: Bug
> Components: Broker
> Affects Versions: 5.1.0, 5.2.0
> Environment: Unix (Solaris and Linux tested)
> Reporter: Filip Hanik
> Assignee: Gary Tully
> Priority: Critical
> Fix For: 5.3.0
>
> Attachments: patch-1-threadname-filter.patch,
> patch-3-tcp-writetimeout.patch
>
>
> The blocking Java Socket API doesn't have a timeout on socketWrite
> invocations.
> This means that if a TCP session is dropped or terminated without RST or FIN
> packets, the operating system is left to eventually time out the session. On
> the Linux kernel this timeout usually takes 15 to 30 minutes.
> For this entire period, the AMQ server hangs, and producers and consumers are
> unable to use a topic.
> I have created two patches for this at the page:
> http://www.hanik.com/covalent/amq/index.html
> Let me show a bit more detail:
> ---------------------------------
> "ActiveMQ Transport: tcp:///X.YYY.XXX.ZZZZ:2011" daemon prio=10
> tid=0x0000000055d39000 nid=0xc78 runnable
> [0x00000000447c9000..0x00000000447cac10]
> java.lang.Thread.State: RUNNABLE
> at java.net.SocketOutputStream.socketWrite0(Native Method)
> at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
> This is a thread stuck in blocking IO, and it can be stuck for 30 minutes
> during the kernel TCP retransmission attempts.
> Unfortunately the thread dump is very misleading, since the name of the
> thread is not the destination or even remotely related to the socket it is
> operating on.
> To mend this, a very simple (and configurable) ThreadNameFilter has been
> proposed in the patch; it appends the destination and helps the system
> administrator correctly identify the client that is about to receive data.
> -----------------------------------
> at org.apache.activemq.broker.region.Topic.dispatch(Topic.java:581)
> at org.apache.activemq.broker.region.Topic.doMessageSend(Topic.java:421)
> - locked <0x00002aaaec155818> (a
> org.apache.activemq.broker.region.Topic)
> at org.apache.activemq.broker.region.Topic.send(Topic.java:363)
> The lock held at this point unfortunately makes the entire Topic single
> threaded.
> While this lock is held, no other clients (producers and consumers) can
> publish to/receive from this topic.
> And this lock can be held for up to 30 minutes.
> I consider solving this single threaded behavior a 'feature enhancement' that
> should be handled separately from this bug, because even if it is solved,
> threads still risk being stuck in socketWrite0 for dropped connections that
> still appear to be established.
> For this, I have implemented a socket timeout filter based on a
> TransportFilter. This filter only times out connections that are actually
> writing data.
> The two patches are at:
> http://www.hanik.com/covalent/amq/patch-1-threadname-filter.patch
> http://www.hanik.com/covalent/amq/patch-3-tcp-writetimeout.patch
> The binary 0000.jar applies to both 5.1 and trunk and can be used today in
> existing environments.
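
The thread-naming idea in the quoted description can be sketched roughly as follows. This is not the ThreadNameFilter from the patch; the class ThreadNameTagger, the method callWithSuffix, and the " - SendTo:" separator are all hypothetical names used for illustration. The sketch assumes the destination address is available as a string at the point where the write is issued.

```java
import java.util.concurrent.Callable;

// Hypothetical helper for illustration; not the patch's ThreadNameFilter.
final class ThreadNameTagger {
    private ThreadNameTagger() {
    }

    // Runs the action with the destination appended to the current thread's
    // name, so a thread dump taken while the action is blocked shows which
    // client the thread is writing to. The original name is always restored.
    static <T> T callWithSuffix(String destination, Callable<T> action) throws Exception {
        Thread current = Thread.currentThread();
        String original = current.getName();
        current.setName(original + " - SendTo:" + destination);
        try {
            return action.call();
        } finally {
            current.setName(original);
        }
    }
}
```

Wrapping only the blocking send keeps the renaming cost off other code paths, and restoring the name in a finally block avoids stale destinations showing up in later thread dumps.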