[ 
https://issues.apache.org/activemq/browse/AMQ-1993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=47021#action_47021
 ] 

Gary Tully commented on AMQ-1993:
---------------------------------

I am not sure it is safer, because the filter changes the behaviour of the 
normal exception case, i.e. onException is now always called.
In addition, if a close is done asynchronously from an onException, there is 
still an opportunity for a normal IOException to be interleaved with a Forced 
exception.
I think this is the same as with a pass-through onException: close can get 
called twice, but close handles this OK.
Mostly, though, I am wary of the change in behaviour introduced by the exception 
handler.
As this is a filter that is added by choice, it is not such a big deal, but we 
may as well iron out the details. This is a handy feature. 
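
To illustrate the double-close point, here is a minimal sketch of a close that 
tolerates being called twice. It is not taken from the ActiveMQ source or from 
the patch; the class and method names are hypothetical, and it only shows the 
general compare-and-set guard that makes a second close a no-op.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration, not ActiveMQ code: a resource whose close()
// may be invoked from both the normal path and an onException path.
class IdempotentCloseSketch {
    private final AtomicBoolean closed = new AtomicBoolean(false);
    // Counts how many times the underlying resource was actually released.
    private final AtomicInteger releases = new AtomicInteger();

    void close() {
        // Only the first caller wins the compare-and-set; later calls do nothing.
        if (closed.compareAndSet(false, true)) {
            releases.incrementAndGet();
        }
    }

    int releaseCount() {
        return releases.get();
    }
}
```

With a guard like this, two threads racing to close end up releasing the 
resource once, whichever of them arrives first.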

> Systems hang due to inability to timeout socket write operation
> ---------------------------------------------------------------
>
>                 Key: AMQ-1993
>                 URL: https://issues.apache.org/activemq/browse/AMQ-1993
>             Project: ActiveMQ
>          Issue Type: Bug
>          Components: Broker
>    Affects Versions: 5.1.0, 5.2.0
>         Environment: Unix (Solaris and Linux tested)
>            Reporter: Filip Hanik
>            Assignee: Gary Tully
>            Priority: Critical
>         Attachments: patch-1-threadname-filter.patch, 
> patch-3-tcp-writetimeout.patch
>
>
> The blocking Java Socket API doesn't have a timeout on socketWrite 
> invocations.
> This means that if a TCP session is dropped or terminated without RST or FIN 
> packets, the operating system is left to eventually time out the session. On 
> the Linux kernel this timeout usually takes 15 to 30 minutes. 
> For this entire period, the AMQ server hangs, and producers and consumers are 
> unable to use a topic.
> I have created two patches for this at the page:
> http://www.hanik.com/covalent/amq/index.html
> Let me show a bit more detail:
> ---------------------------------
> "ActiveMQ Transport: tcp:///X.YYY.XXX.ZZZZ:2011" daemon prio=10 
> tid=0x0000000055d39000 nid=0xc78 runnable 
> [0x00000000447c9000..0x00000000447cac10]
>    java.lang.Thread.State: RUNNABLE
>       at java.net.SocketOutputStream.socketWrite0(Native Method)
>       at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
> This is a thread stuck in blocking IO, and can be stuck for 30 minutes during 
> the kernel TCP retransmission attempts.
> Unfortunately, the thread dump is very misleading, since the name of the 
> thread is not the destination or even remotely related to the socket it is 
> operating on.
> To mend this, a very simple (and configurable) ThreadNameFilter has been 
> suggested in the patch. It appends the destination and helps the system 
> administrator correctly identify the client that is about to receive data. 
> -----------------------------------
>       at org.apache.activemq.broker.region.Topic.dispatch(Topic.java:581)
>       at org.apache.activemq.broker.region.Topic.doMessageSend(Topic.java:421)
>       - locked <0x00002aaaec155818> (a 
> org.apache.activemq.broker.region.Topic)
>       at org.apache.activemq.broker.region.Topic.send(Topic.java:363)
> The lock held at this point unfortunately makes the entire Topic single 
> threaded. 
> While this lock is held, no other clients (producers or consumers) can 
> publish to or receive from this topic.
> This lock can be held for up to 30 minutes.
> I consider solving this single-threaded behavior a 'feature enhancement' that 
> should be handled separately from this bug, because even if it is solved, 
> threads still risk being stuck in socketWrite0 on dropped connections that 
> still appear to be established.
> For this, I have implemented a socket timeout filter based on a 
> TransportFilter. This filter only times out connections that are actually 
> writing data.
> The two patches are at:
> http://www.hanik.com/covalent/amq/patch-1-threadname-filter.patch
> http://www.hanik.com/covalent/amq/patch-3-tcp-writetimeout.patch
> The binary 0000.jar applies to both 5.1 and trunk and can be used today in 
> existing environments. 
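
The write-timeout idea described above can be sketched roughly as follows. 
This is not the code in patch-3-tcp-writetimeout.patch; the class name, method 
names and the explicit clock parameter are assumptions made for illustration. 
The writer marks the start and end of each write, and a separate monitor 
periodically checks whether a write has been in progress for longer than the 
timeout and, if so, closes the connection. For a java.net.Socket, closing it 
from another thread makes a thread blocked in I/O on that socket throw a 
SocketException, which is what releases the stuck writer.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of a write-timeout watchdog; not the patch itself.
class WriteTimeoutSketch {
    private static final long IDLE = -1L;

    private final Closeable connection;   // e.g. the Socket being written to
    private final long timeoutMillis;
    // Start time of the write currently in progress, or IDLE if none.
    private final AtomicLong writeStartedAt = new AtomicLong(IDLE);

    WriteTimeoutSketch(Closeable connection, long timeoutMillis) {
        this.connection = connection;
        this.timeoutMillis = timeoutMillis;
    }

    // Called by the writing thread just before it enters the blocking write.
    void beginWrite(long nowMillis) {
        writeStartedAt.set(nowMillis);
    }

    // Called by the writing thread once the write has returned.
    void endWrite() {
        writeStartedAt.set(IDLE);
    }

    // Called periodically by a monitor thread. Returns true if a write was
    // found to be overdue and a close of the connection was attempted.
    boolean check(long nowMillis) {
        long started = writeStartedAt.get();
        if (started == IDLE || nowMillis - started <= timeoutMillis) {
            return false; // not writing, or still within the allowed time
        }
        try {
            connection.close();
        } catch (IOException ignored) {
            // Best effort: the connection is being abandoned anyway.
        }
        return true;
    }
}
```

Because only connections with a write in progress carry a start time, idle 
connections are never timed out, which matches the behaviour described for the 
filter. In a real deployment beginWrite and check would be fed from 
System.currentTimeMillis(); the explicit parameter here just keeps the sketch 
easy to exercise.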

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
