[ 
https://issues.apache.org/activemq/browse/AMQ-1993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=47008#action_47008
 ] 

fhanik edited comment on AMQ-1993 at 11/3/08 6:12 AM:
-----------------------------------------------------------

Here is another example of a thread locking up the entire system, based on the 
same scenario.

{code}
"BrokerService" daemon prio=10 tid=0x0000000060103800 nid=0x74e7 runnable [0x00000000467c7000..0x00000000467c7c10]
   java.lang.Thread.State: RUNNABLE
        at java.net.SocketOutputStream.socketWrite0(Native Method)
        at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
        at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
        at org.apache.activemq.transport.tcp.TcpBufferedOutputStream.flush(TcpBufferedOutputStream.java:106)
        at java.io.DataOutputStream.flush(DataOutputStream.java:106)
        at org.apache.activemq.transport.tcp.TcpTransport.oneway(TcpTransport.java:165)
        at org.apache.activemq.transport.InactivityMonitor.oneway(InactivityMonitor.java:233)
        - locked <0x00002aaabe89c2b0> (a java.util.concurrent.atomic.AtomicBoolean)
        at org.apache.activemq.transport.TransportFilter.oneway(TransportFilter.java:83)
        at org.apache.activemq.transport.WireFormatNegotiator.oneway(WireFormatNegotiator.java:100)
        at org.apache.activemq.transport.MutexTransport.oneway(MutexTransport.java:40)
        - locked <0x00002aaabe89cc10> (a java.lang.Object)
        at org.apache.activemq.broker.TransportConnection.dispatch(TransportConnection.java:1188)
        at org.apache.activemq.broker.TransportConnection.processDispatch(TransportConnection.java:776)
        at org.apache.activemq.broker.TransportConnection.iterate(TransportConnection.java:813)
        at org.apache.activemq.thread.PooledTaskRunner.runTask(PooledTaskRunner.java:122)
        at org.apache.activemq.thread.PooledTaskRunner$1.run(PooledTaskRunner.java:43)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)
{code}

As a result, other threads are blocked waiting for the same monitor (<0x00002aaabe89cc10>):
{code}
"ActiveMQ Transport Stopper: /xx.xx.xxx.xxx:61489" daemon prio=10 tid=0x00000000607ad400 nid=0x7687 waiting for monitor entry [0x00000000450b0000..0x00000000450b0c90]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at org.apache.activemq.transport.MutexTransport.oneway(MutexTransport.java:40)
        - waiting to lock <0x00002aaabe89cc10> (a java.lang.Object)
        at org.apache.activemq.broker.TransportConnection.dispatch(TransportConnection.java:1188)
        at org.apache.activemq.broker.TransportConnection.processDispatch(TransportConnection.java:776)
        at org.apache.activemq.broker.TransportConnection.dispatchSync(TransportConnection.java:735)
{code}
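
The relationship visible in the two dumps above (one thread holding a monitor while stuck in a socket write, another thread BLOCKED on that same monitor) can also be checked from inside the JVM. The following is only a minimal sketch using the standard java.lang.management API; the class name BlockedThreadReport and the method blockedThreads() are hypothetical and are not part of ActiveMQ or of the attached patches.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only.
public class BlockedThreadReport {

    /**
     * Returns one line per thread that is BLOCKED on an object monitor,
     * in the form "<blocked thread> waits on <lock> held by <owner thread>".
     */
    public static List<String> blockedThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> report = new ArrayList<>();
        // Both flags false: we only need the lock each thread is waiting for,
        // not the full list of monitors and synchronizers each thread owns.
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info != null && info.getThreadState() == Thread.State.BLOCKED) {
                report.add(info.getThreadName()
                        + " waits on " + info.getLockName()
                        + " held by " + info.getLockOwnerName());
            }
        }
        return report;
    }
}
```

Applied to the situation above, such a report would presumably name the "ActiveMQ Transport Stopper" thread as the waiter and "BrokerService" as the owner, which is the same conclusion the thread dump gives by matching the monitor addresses by hand.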

> Systems hang due to inability to timeout socket write operation
> ---------------------------------------------------------------
>
>                 Key: AMQ-1993
>                 URL: https://issues.apache.org/activemq/browse/AMQ-1993
>             Project: ActiveMQ
>          Issue Type: Bug
>          Components: Broker
>    Affects Versions: 5.1.0, 5.2.0
>         Environment: Unix (Solaris and Linux tested)
>            Reporter: Filip Hanik
>            Priority: Critical
>         Attachments: patch-1-threadname-filter.patch, 
> patch-3-tcp-writetimeout.patch
>
>
> The blocking Java Socket API has no timeout on socketWrite invocations.
> This means that if a TCP session is dropped or terminated without RST or FIN 
> packets, the operating system is left to eventually time out the session. On 
> the Linux kernel this timeout usually takes 15 to 30 minutes. 
> For this entire period, the AMQ server hangs, and producers and consumers are 
> unable to use a topic.
> I have created two patches for this at the page:
> http://www.hanik.com/covalent/amq/index.html
> Let me show a bit more
> ---------------------------------
> "ActiveMQ Transport: tcp:///X.YYY.XXX.ZZZZ:2011" daemon prio=10 tid=0x0000000055d39000 nid=0xc78 runnable [0x00000000447c9000..0x00000000447cac10]
>    java.lang.Thread.State: RUNNABLE
>       at java.net.SocketOutputStream.socketWrite0(Native Method)
>       at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
> This is a thread stuck in blocking IO; it can remain stuck for 30 minutes 
> while the kernel attempts TCP retransmissions.
> Unfortunately, the thread dump is very misleading, since the name of the 
> thread is neither the destination nor even remotely related to the socket it 
> is operating on.
> To mend this, the patch proposes a very simple (and configurable) 
> ThreadNameFilter that appends the destination and helps the system 
> administrator correctly identify the client that is about to receive data. 
> -----------------------------------
>       at org.apache.activemq.broker.region.Topic.dispatch(Topic.java:581)
>       at org.apache.activemq.broker.region.Topic.doMessageSend(Topic.java:421)
>       - locked <0x00002aaaec155818> (a org.apache.activemq.broker.region.Topic)
>       at org.apache.activemq.broker.region.Topic.send(Topic.java:363)
> The lock held at this point unfortunately makes the entire Topic single 
> threaded. 
> While this lock is held, no other clients (producers and consumers) can 
> publish to or receive from this topic.
> And this lock can be held for up to 30 minutes.
> I consider solving this single-threaded behavior a 'feature enhancement' that 
> should be handled separately from this bug, because even if it is solved, 
> threads still risk being stuck in socketWrite0 for dropped connections that 
> still appear to be established.
> For this, I have implemented a socket timeout filter based on a 
> TransportFilter. This filter only times out connections that are actually 
> writing data.
> The two patches are at:
> http://www.hanik.com/covalent/amq/patch-1-threadname-filter.patch
> http://www.hanik.com/covalent/amq/patch-3-tcp-writetimeout.patch
> The binary 0000.jar applies to both 5.1 and trunk and can be used today in 
> existing environments. 
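
The write-timeout idea in the description above (only connections that are actually in the middle of a write can time out) can be sketched roughly as follows. This is not the attached patch-3-tcp-writetimeout.patch; the class WriteTimeoutStream and its methods checkTimeout() and watch() are hypothetical names chosen for illustration. The sketch relies on the documented behaviour of java.net.Socket.close(), which makes a thread blocked in I/O on that socket fail with a SocketException.

```java
import java.io.Closeable;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: records when a write starts so a watchdog can abort it.
public class WriteTimeoutStream extends FilterOutputStream {
    private static final long IDLE = -1L;

    private final Closeable resource;   // assumed to be the java.net.Socket behind 'out'
    private final long timeoutMillis;
    private final AtomicLong writeStart = new AtomicLong(IDLE); // IDLE while no write is in progress

    public WriteTimeoutStream(OutputStream out, Closeable resource, long timeoutMillis) {
        super(out);
        this.resource = resource;
        this.timeoutMillis = timeoutMillis;
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        writeStart.set(System.currentTimeMillis());
        try {
            out.write(b, off, len);
        } finally {
            writeStart.set(IDLE);
        }
    }

    @Override
    public void write(int b) throws IOException {
        write(new byte[] {(byte) b}, 0, 1);
    }

    @Override
    public void flush() throws IOException {
        writeStart.set(System.currentTimeMillis());
        try {
            out.flush();
        } finally {
            writeStart.set(IDLE);
        }
    }

    /**
     * Closes the underlying resource if a write or flush has been in progress
     * for longer than the timeout. Idle streams are never closed.
     * Returns true if the resource was closed by this call.
     */
    public boolean checkTimeout(long nowMillis) throws IOException {
        long start = writeStart.get();
        if (start != IDLE && nowMillis - start > timeoutMillis) {
            resource.close();
            return true;
        }
        return false;
    }

    /** Starts a daemon watchdog that polls checkTimeout() every periodMillis. */
    public static ScheduledExecutorService watch(WriteTimeoutStream stream, long periodMillis) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "write-timeout-watchdog");
            t.setDaemon(true);
            return t;
        });
        timer.scheduleAtFixedRate(() -> {
            try {
                stream.checkTimeout(System.currentTimeMillis());
            } catch (IOException ignored) {
                // Closing failed; nothing more the watchdog can do here.
            }
        }, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
        return timer;
    }
}
```

A caller would wrap socket.getOutputStream() and pass the socket itself as the resource. Tracking the start time only while a write is in progress keeps idle but healthy connections from being closed, which matches the behaviour described for the filter; a real broker would more likely share one watchdog across all connections instead of one timer per stream.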

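The thread-naming idea from the description (append the destination to the thread name so a thread dump shows which client a stuck write is addressed to) can be sketched in the same spirit. Again this is not the attached patch-1-threadname-filter.patch; the class ThreadNameScope, the method callWithSuffix() and the " -> " separator are assumptions made purely for illustration.

```java
import java.util.concurrent.Callable;

// Hypothetical sketch: temporarily tags the current thread with a destination.
public class ThreadNameScope {

    /**
     * Runs the task with " -> <destination>" appended to the current thread's
     * name and restores the original name afterwards, even if the task throws.
     */
    public static <T> T callWithSuffix(String destination, Callable<T> task) throws Exception {
        Thread current = Thread.currentThread();
        String original = current.getName();
        current.setName(original + " -> " + destination);
        try {
            return task.call();
        } finally {
            current.setName(original);
        }
    }
}
```

If the blocking write is wrapped this way, a thread dump taken while the write hangs would show the remote address in the thread name itself, so the administrator would not have to guess which socket the thread is operating on.
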
-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
