[ https://issues.apache.org/activemq/browse/AMQ-1925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mario Siegenthaler updated AMQ-1925:
------------------------------------

    Attachment: AMQ1925Test.java

Test case showing the problem with lost messages. The test testAMQ1925_TXInProgress should
fail; the other two run fine. Note: the test does not always fail, only when it gets
interrupted during session.commit. This is the slowest call, so it happens most of the
time, but there's no guarantee.

> JDBC-Master/Slave Failover - Consumer stop after 1000 Messages
> --------------------------------------------------------------
>
>                 Key: AMQ-1925
>                 URL: https://issues.apache.org/activemq/browse/AMQ-1925
>             Project: ActiveMQ
>          Issue Type: Bug
>          Components: Broker
>    Affects Versions: 5.1.0
>            Reporter: Mario Siegenthaler
>             Fix For: 5.3.0
>
>         Attachments: AMQ1925Test.java, heapdump-1220373534484.hprof,
>                      threaddump-1220371256910.tdump
>
>
> In a JDBC master/slave environment with ActiveMQ 5.1.0 (plus the patches for 1710 and
> 1838), failover works for consumers: they resume receiving messages after the failover,
> but then they suddenly stop after approximately 1000 messages (mostly 1000; one got to
> 1080). The consumers are using transacted sessions.
> The thread dump looks unsuspicious; everybody is waiting on the socket:
>
>     java.lang.Thread.State: RUNNABLE
>         at java.net.SocketInputStream.socketRead0(Native Method)
>         at java.net.SocketInputStream.read(SocketInputStream.java:129)
>         at org.apache.activemq.transport.tcp.TcpBufferedInputStream.fill(TcpBufferedInputStream.java:50)
>         at org.apache.activemq.transport.tcp.TcpBufferedInputStream.read(TcpBufferedInputStream.java:58)
>         at java.io.DataInputStream.readInt(DataInputStream.java:370)
>         at org.apache.activemq.openwire.OpenWireFormat.unmarshal(OpenWireFormat.java:269)
>         at org.apache.activemq.transport.tcp.TcpTransport.readCommand(TcpTransport.java:203)
>         at org.apache.activemq.transport.tcp.TcpTransport.doRun(TcpTransport.java:195)
>         at org.apache.activemq.transport.tcp.TcpTransport.run(TcpTransport.java:183)
>         at java.lang.Thread.run(Thread.java:619)
>
> A memory dump from the consumers shows that they really have run out of messages and are
> waiting for the broker to deliver new ones. I've attached both the thread dump and the
> heap dump to this issue (or better: I'll do so :)
> The broker doesn't do anything either (it also waits on the transport socket); the queue
> has a full page-in buffer (100 messages) but obviously fails to do anything with it. If I
> manually trigger a doDispatch of all paged-in messages (via the debugger, just an attempt
> to revive the thing) it returns without doing anything at all, since all subscriptions
> are full (s.isFull). I investigated further and was confused to see that the
> prefetchExtension field of the PrefetchSubscription had a value of -1000 (negative!).
> This explains why the subscription was considered full:
>
>     dispatched.size() - prefetchExtension >= info.getPrefetchSize()
>     0 - (-1000) >= 1000
>
> Quite nasty: even though the dispatched size was zero, the client didn't receive any new
> messages.
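>
> For illustration, a minimal, self-contained sketch of how that fullness check behaves
> with a negative extension (the names mirror the fields mentioned above; this is not the
> actual PrefetchSubscription source):
>
>     // Hypothetical sketch, not ActiveMQ code: the check from this report with the
>     // observed values plugged in.
>     public class PrefetchCheckSketch {
>         static boolean isFull(int dispatchedSize, int prefetchExtension, int prefetchSize) {
>             return dispatchedSize - prefetchExtension >= prefetchSize;
>         }
>
>         public static void main(String[] args) {
>             // 0 - (-1000) >= 1000  ->  true: nothing is dispatched, yet the
>             // subscription reports full, so no paged-in message is ever handed out.
>             System.out.println(isFull(0, -1000, 1000));
>         }
>     }
>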
> The only place this value can become negative is inside acknowledge, where it is simply
> decremented (prefetchExtension--); all other places use Math.max(0, X).
> So here's my guess at what happened: the client had a full prefetch buffer (1000
> messages) when I killed my master. As soon as the slave had finished starting, the
> consumers reconnected and started processing the messages in the prefetch and
> acknowledging them. This gradually decremented the counter into a negative value, because
> the slave never got a chance to increment the prefetchExtension, since it never actually
> delivered those messages.
> Possible solutions:
> - clear the prefetch buffer on a failover
> - just don't allow this value to become smaller than zero (sketched below; not sure if
>   that covers all bases)
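>
> A minimal sketch of what the second option could look like in acknowledge (hypothetical,
> not a tested patch): guard the decrement the same way the other call sites already do.
>
>     // Hypothetical sketch, not ActiveMQ code: never let the extension drop below zero
>     // when a prefetched message is acknowledged.
>     public class PrefetchExtensionClampSketch {
>         private int prefetchExtension;
>
>         void onAcknowledge() {
>             // instead of the unguarded decrement: prefetchExtension--;
>             prefetchExtension = Math.max(0, prefetchExtension - 1);
>         }
>     }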