[ https://issues.apache.org/activemq/browse/AMQ-1925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mario Siegenthaler updated AMQ-1925:
------------------------------------

    Attachment: AMQ1925Test.java

Test case showing the problem with lost messages. The test testAMQ1925_TXInProgress should
fail; the other two run fine. Note: the test does not always fail, only when it gets
interrupted during session.commit. This is the slowest call, so it happens most of the
time, but there's no guarantee.

> JDBC-Master/Slave Failover - Consumer stop after 1000 Messages
> --------------------------------------------------------------
>
>                 Key: AMQ-1925
>                 URL: https://issues.apache.org/activemq/browse/AMQ-1925
>             Project: ActiveMQ
>          Issue Type: Bug
>          Components: Broker
>    Affects Versions: 5.1.0
>            Reporter: Mario Siegenthaler
>             Fix For: 5.3.0
>
>         Attachments: AMQ1925Test.java, heapdump-1220373534484.hprof,
>                      threaddump-1220371256910.tdump
>
>
> In a JDBC master/slave environment with ActiveMQ 5.1.0 (plus the patches for 1710 and
> 1838), failover works for consumers: they resume receiving messages after the failover,
> but then they suddenly stop after approximately 1000 messages (mostly 1000; one got to
> 1080). The consumers are using transacted sessions.
> The thread dump looks unsuspicious; everybody is waiting on the socket:
>
>     java.lang.Thread.State: RUNNABLE
>         at java.net.SocketInputStream.socketRead0(Native Method)
>         at java.net.SocketInputStream.read(SocketInputStream.java:129)
>         at org.apache.activemq.transport.tcp.TcpBufferedInputStream.fill(TcpBufferedInputStream.java:50)
>         at org.apache.activemq.transport.tcp.TcpBufferedInputStream.read(TcpBufferedInputStream.java:58)
>         at java.io.DataInputStream.readInt(DataInputStream.java:370)
>         at org.apache.activemq.openwire.OpenWireFormat.unmarshal(OpenWireFormat.java:269)
>         at org.apache.activemq.transport.tcp.TcpTransport.readCommand(TcpTransport.java:203)
>         at org.apache.activemq.transport.tcp.TcpTransport.doRun(TcpTransport.java:195)
>         at org.apache.activemq.transport.tcp.TcpTransport.run(TcpTransport.java:183)
>         at java.lang.Thread.run(Thread.java:619)
>
> A memory dump from the consumers shows that they really have run out of messages and are
> waiting for the broker to deliver new ones. I've attached both the thread dump and the
> heap dump to this issue (or better: I'll do so :)
> The broker doesn't do anything either (it also waits on the transport socket); the queue
> has a full page-in buffer (100 messages) but obviously fails to do anything with it. If I
> manually trigger a doDispatch of all paged-in messages (via the debugger, just an attempt
> to revive the thing) it returns without doing anything at all, since all subscriptions
> are full (s.isFull). I investigated further and was confused to see that the
> prefetchExtension field of the PrefetchSubscription had a value of -1000 (negative!).
> This explains why the subscription was considered full:
>
>     dispatched.size() - prefetchExtension >= info.getPrefetchSize()
>     0 - (-1000) >= 1000
>
> Quite nasty: even though the dispatched size was zero, the client didn't receive any new
> messages.
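>
> For illustration, a minimal, self-contained sketch of how that fullness check behaves
> with a negative extension (the names mirror the fields mentioned above; this is not the
> actual PrefetchSubscription source):
>
>     // Hypothetical sketch, not ActiveMQ code: the check from this report with the
>     // observed values plugged in.
>     public class PrefetchCheckSketch {
>         static boolean isFull(int dispatchedSize, int prefetchExtension, int prefetchSize) {
>             return dispatchedSize - prefetchExtension >= prefetchSize;
>         }
>
>         public static void main(String[] args) {
>             // 0 - (-1000) >= 1000  ->  true: nothing is dispatched, yet the
>             // subscription reports full, so no paged-in message is ever handed out.
>             System.out.println(isFull(0, -1000, 1000));
>         }
>     }
>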
> The only place this value can become negative is inside acknowledge, where it is simply
> decremented (prefetchExtension--); all other places use Math.max(0, X).
> So here's my guess at what happened: the client had a full prefetch buffer (1000
> messages) when I killed my master. As soon as the slave had finished starting, the
> consumers reconnected and started processing the messages in the prefetch and
> acknowledging them. This gradually decremented the counter into a negative value, because
> the slave never got a chance to increment the prefetchExtension, since it never actually
> delivered those messages.
> Possible solutions:
> - clear the prefetch buffer on a failover
> - just don't allow this value to become smaller than zero (sketched below; not sure if
>   that covers all bases)
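>
> A minimal sketch of what the second option could look like in acknowledge (hypothetical,
> not a tested patch): guard the decrement the same way the other call sites already do.
>
>     // Hypothetical sketch, not ActiveMQ code: never let the extension drop below zero
>     // when a prefetched message is acknowledged.
>     public class PrefetchExtensionClampSketch {
>         private int prefetchExtension;
>
>         void onAcknowledge() {
>             // instead of the unguarded decrement: prefetchExtension--;
>             prefetchExtension = Math.max(0, prefetchExtension - 1);
>         }
>     }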