[
https://issues.apache.org/activemq/browse/AMQ-1993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=47008#action_47008
]
Filip Hanik commented on AMQ-1993:
----------------------------------
Here is another example of a thread locking up the entire system, based on the
same scenario.
{code}
"BrokerService" daemon prio=10 tid=0x0000000060103800 nid=0x74e7 runnable [0x00000000467c7000..0x00000000467c7c10]
   java.lang.Thread.State: RUNNABLE
	at java.net.SocketOutputStream.socketWrite0(Native Method)
	at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
	at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
	at org.apache.activemq.transport.tcp.TcpBufferedOutputStream.flush(TcpBufferedOutputStream.java:106)
	at java.io.DataOutputStream.flush(DataOutputStream.java:106)
	at org.apache.activemq.transport.tcp.TcpTransport.oneway(TcpTransport.java:165)
	at org.apache.activemq.transport.InactivityMonitor.oneway(InactivityMonitor.java:233)
	- locked <0x00002aaabe89c2b0> (a java.util.concurrent.atomic.AtomicBoolean)
	at org.apache.activemq.transport.TransportFilter.oneway(TransportFilter.java:83)
	at org.apache.activemq.transport.WireFormatNegotiator.oneway(WireFormatNegotiator.java:100)
	at org.apache.activemq.transport.MutexTransport.oneway(MutexTransport.java:40)
	- locked <0x00002aaabe89cc10> (a java.lang.Object)
	at org.apache.activemq.broker.TransportConnection.dispatch(TransportConnection.java:1188)
	at org.apache.activemq.broker.TransportConnection.processDispatch(TransportConnection.java:776)
	at org.apache.activemq.broker.TransportConnection.iterate(TransportConnection.java:813)
	at org.apache.activemq.thread.PooledTaskRunner.runTask(PooledTaskRunner.java:122)
	at org.apache.activemq.thread.PooledTaskRunner$1.run(PooledTaskRunner.java:43)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:619)
{code}
As a result, other threads are blocked waiting for the same MutexTransport lock:
{code}
"ActiveMQ Transport Stopper: /74.33.248.245:61489" daemon prio=10 tid=0x00000000607ad400 nid=0x7687 waiting for monitor entry [0x00000000450b0000..0x00000000450b0c90]
   java.lang.Thread.State: BLOCKED (on object monitor)
	at org.apache.activemq.transport.MutexTransport.oneway(MutexTransport.java:40)
	- waiting to lock <0x00002aaabe89cc10> (a java.lang.Object)
	at org.apache.activemq.broker.TransportConnection.dispatch(TransportConnection.java:1188)
	at org.apache.activemq.broker.TransportConnection.processDispatch(TransportConnection.java:776)
	at org.apache.activemq.broker.TransportConnection.dispatchSync(TransportConnection.java:735)
{code}
> Systems hang due to inability to timeout socket write operation
> ---------------------------------------------------------------
>
> Key: AMQ-1993
> URL: https://issues.apache.org/activemq/browse/AMQ-1993
> Project: ActiveMQ
> Issue Type: Bug
> Components: Broker
> Affects Versions: 5.1.0, 5.2.0
> Environment: Unix (Solaris and Linux tested)
> Reporter: Filip Hanik
> Priority: Critical
> Attachments: patch-1-threadname-filter.patch,
> patch-3-tcp-writetimeout.patch
>
>
> The blocking Java Socket API has no timeout for socketWrite
> invocations.
> This means that if a TCP session is dropped or terminated without RST or FIN
> packets, the operating system is left to eventually time out the session. On
> Linux, this timeout usually takes 15 to 30 minutes.
> For this entire period, the AMQ server hangs, and producers and consumers are
> unable to use the affected topic.
> I have created two patches for this, available at:
> http://www.hanik.com/covalent/amq/index.html
> Let me show a bit more detail:
> ---------------------------------
> "ActiveMQ Transport: tcp:///X.YYY.XXX.ZZZZ:2011" daemon prio=10
> tid=0x0000000055d39000 nid=0xc78 runnable
> [0x00000000447c9000..0x00000000447cac10]
> java.lang.Thread.State: RUNNABLE
> at java.net.SocketOutputStream.socketWrite0(Native Method)
> at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
> This thread is stuck in blocking I/O and can stay stuck for up to 30 minutes
> while the kernel retries TCP retransmissions.
> Unfortunately, the thread dump is very misleading, since the thread's name
> is not the destination, nor even remotely related to the socket it is
> writing to.
> To address this, the patch adds a very simple (and configurable)
> ThreadNameFilter that appends the destination to the thread name, helping
> the system administrator identify the client that is about to receive data.
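The renaming idea can be sketched in a few lines of Java. This is a minimal, hypothetical sketch and not the ThreadNameFilter from patch-1-threadname-filter.patch: the class name `ThreadNameSketch`, the `Write` interface, and the `" -> "` suffix format are assumptions made for illustration. It appends the destination to the current thread's name for the duration of one write and restores the original name afterwards, so a thread dump taken while the write is stuck names the peer being written to.

```java
import java.io.IOException;

// Hypothetical sketch, not the ThreadNameFilter from the patch: append the
// destination to the current thread's name for the duration of a write, so a
// thread dump taken while the write is stuck names the peer being written to.
public class ThreadNameSketch {

    /** A blocking write that may fail with an IOException (illustrative type). */
    @FunctionalInterface
    public interface Write {
        void run() throws IOException;
    }

    public static void withDestinationInName(String destination, Write write) throws IOException {
        Thread current = Thread.currentThread();
        String originalName = current.getName();
        current.setName(originalName + " -> " + destination);
        try {
            write.run(); // a thread dump taken now shows the destination
        } finally {
            // Restore the original name even if the write throws.
            current.setName(originalName);
        }
    }

    public static void main(String[] args) throws IOException {
        Thread.currentThread().setName("ActiveMQ Transport");
        withDestinationInName("tcp://192.0.2.10:61616", () ->
                System.out.println("during write: " + Thread.currentThread().getName()));
        System.out.println("after write:  " + Thread.currentThread().getName());
    }
}
```

Restoring the name in a `finally` block matters: if the write fails with an exception, the thread (often a pooled one) would otherwise keep a stale destination in its name.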
> -----------------------------------
> at org.apache.activemq.broker.region.Topic.dispatch(Topic.java:581)
> at org.apache.activemq.broker.region.Topic.doMessageSend(Topic.java:421)
> - locked <0x00002aaaec155818> (a org.apache.activemq.broker.region.Topic)
> at org.apache.activemq.broker.region.Topic.send(Topic.java:363)
> The lock held here unfortunately makes the entire Topic single-threaded.
> While this lock is held, no other clients (producers or consumers) can
> publish to or receive from this topic, and the lock can be held for up to
> 30 minutes.
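The effect of holding the Topic monitor across a blocking write can be reproduced without any networking. The demo below uses hypothetical names (`SketchTopic`, `secondProducerState`) and simulates the stalled `socketWrite0` call by waiting on a `CountDownLatch`; in the real broker the stuck thread is RUNNABLE inside native code rather than WAITING, but the other producers see the same thing: a monitor they cannot enter.

```java
import java.util.concurrent.CountDownLatch;

// Illustrative demo with hypothetical names: a "topic" whose send() holds the
// object's monitor while dispatching, as the Topic.doMessageSend frame above
// holds the Topic lock. A stalled write (simulated by waiting on a latch
// instead of a real socketWrite0) blocks every other sender on the topic.
public class SingleThreadedTopicDemo {

    static class SketchTopic {
        private final CountDownLatch network;

        SketchTopic(CountDownLatch network) {
            this.network = network;
        }

        synchronized void send(String message) throws InterruptedException {
            // Stand-in for a write to a consumer whose connection silently died.
            // The monitor stays held while this thread waits.
            network.await();
        }
    }

    /** Returns the state observed for a second producer while the first is stuck. */
    public static Thread.State secondProducerState() throws InterruptedException {
        CountDownLatch network = new CountDownLatch(1);
        SketchTopic topic = new SketchTopic(network);

        Thread first = new Thread(() -> {
            try { topic.send("m1"); } catch (InterruptedException ignored) { }
        }, "producer-1");
        first.start();
        // Wait until producer-1 is parked inside send(), holding the monitor.
        while (first.getState() != Thread.State.WAITING) {
            Thread.sleep(1);
        }

        Thread second = new Thread(() -> {
            try { topic.send("m2"); } catch (InterruptedException ignored) { }
        }, "producer-2");
        second.start();
        // Poll (for at most 5 seconds) until producer-2 is seen blocked on the monitor.
        long deadline = System.nanoTime() + 5_000_000_000L;
        Thread.State observed = second.getState();
        while (observed != Thread.State.BLOCKED && System.nanoTime() < deadline) {
            Thread.sleep(1);
            observed = second.getState();
        }

        network.countDown(); // "connection recovers": both sends can now complete
        first.join();
        second.join();
        return observed;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("producer-2 while producer-1 is stuck: " + secondProducerState());
    }
}
```

Running `main` should report producer-2 as `BLOCKED`, the same state as the "ActiveMQ Transport Stopper" thread in the dump at the top of this comment.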
> I consider fixing this single-threaded behavior a 'feature enhancement' that
> should be handled separately from this bug, because even if it is solved,
> threads still risk being stuck in socketWrite0 on dropped connections that
> still appear to be established.
> For this, I have implemented a socket write-timeout filter based on
> TransportFilter. The filter only times out connections that are actually
> writing data.
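A write timeout of this kind can be sketched as a watchdog that only looks at writes in progress. The Java below is a hypothetical illustration, not the code in patch-3-tcp-writetimeout.patch: the `WriteTimeoutGuard` name, its constructor parameters, and the polling checker thread are assumptions. Each write records its start time; a background task closes the connection if a write has been running longer than the timeout. Closing a `java.net.Socket` is documented to make any thread blocked in I/O on it throw a `SocketException`, which unwinds the stuck `socketWrite0` and releases the locks held further up the stack. The sketch assumes writes on one connection are serialized, as `MutexTransport` does in the trace above, so a single timestamp per connection suffices.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of a write-timeout guard; names and structure are
// illustrative, not the code in patch-3-tcp-writetimeout.patch.
// Assumes writes are serialized (as MutexTransport does), so one start
// timestamp per connection is enough.
public class WriteTimeoutGuard implements AutoCloseable {

    @FunctionalInterface
    public interface Write {
        void run() throws IOException;
    }

    private static final long NOT_WRITING = Long.MIN_VALUE;

    private final Closeable connection;
    private final long timeoutNanos;
    private final AtomicLong writeStartedAt = new AtomicLong(NOT_WRITING);
    private final ScheduledExecutorService checker;

    public WriteTimeoutGuard(Closeable connection, long timeoutMillis, long checkPeriodMillis) {
        this.connection = connection;
        this.timeoutNanos = TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        this.checker = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "write-timeout-checker");
            t.setDaemon(true);
            return t;
        });
        checker.scheduleAtFixedRate(this::check, checkPeriodMillis, checkPeriodMillis,
                TimeUnit.MILLISECONDS);
    }

    /** Runs a write, recording its start time so the checker can time it out. */
    public void write(Write write) throws IOException {
        writeStartedAt.set(System.nanoTime());
        try {
            write.run();
        } finally {
            writeStartedAt.set(NOT_WRITING); // idle connections are never timed out
        }
    }

    private void check() {
        long started = writeStartedAt.get();
        if (started != NOT_WRITING && System.nanoTime() - started > timeoutNanos) {
            try {
                // Closing a java.net.Socket makes a thread blocked in a write on
                // it fail with a SocketException, releasing the locks it holds.
                connection.close();
            } catch (IOException ignored) {
                // Best effort: the connection is being abandoned anyway.
            }
        }
    }

    @Override
    public void close() {
        checker.shutdownNow();
    }
}
```

For a TCP connection, `connection` would be the `Socket` itself and each flush would be wrapped, for example `guard.write(() -> dataOut.flush())`, where `dataOut` stands for the connection's output stream. Idle connections are never closed by the checker, which matches the goal of timing out only connections that are actually writing data.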
> The two patches are at:
> http://www.hanik.com/covalent/amq/patch-1-threadname-filter.patch
> http://www.hanik.com/covalent/amq/patch-3-tcp-writetimeout.patch
> The binary 0000.jar applies to both 5.1 and trunk and can be used today in
> existing environments.
--
This message is automatically generated by JIRA.