JDBC-Master/Slave Failover - Consumers stop after 1000 Messages
----------------------------------------------------------------

                 Key: AMQ-1925
                 URL: https://issues.apache.org/activemq/browse/AMQ-1925
             Project: ActiveMQ
          Issue Type: Bug
          Components: Broker
    Affects Versions: 5.1.0
            Reporter: Mario Siegenthaler
         Attachments: heapdump-1220373534484.hprof, threaddump-1220371256910.tdump

In a JDBC-Master/Slave environment with ActiveMQ 5.1.0 (plus the patches for 1710 
and 1838) the failover for consumers works: the consumers resume receiving messages 
after the failover, but then they suddenly stop after approx. 1000 messages 
(mostly exactly 1000, one got to 1080). The consumers are using transacted sessions.
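
For reference, the consumers are set up roughly like this (broker hostnames and 
queue name are placeholders, not the actual configuration):

    import javax.jms.Connection;
    import javax.jms.ConnectionFactory;
    import javax.jms.Message;
    import javax.jms.MessageConsumer;
    import javax.jms.Session;
    import org.apache.activemq.ActiveMQConnectionFactory;

    public class TransactedConsumer {
        public static void main(String[] args) throws Exception {
            // The failover transport reconnects to whichever broker currently
            // holds the JDBC lock; the hostnames here are placeholders.
            ConnectionFactory factory = new ActiveMQConnectionFactory(
                    "failover:(tcp://brokerA:61616,tcp://brokerB:61616)");
            Connection connection = factory.createConnection();
            connection.start();

            // Transacted session, as used by the consumers in this report.
            Session session = connection.createSession(true, Session.SESSION_TRANSACTED);
            MessageConsumer consumer =
                    session.createConsumer(session.createQueue("TEST.QUEUE"));

            while (true) {
                Message message = consumer.receive(30000);
                if (message == null) {
                    break;  // after a failover this is roughly where the consumer starves
                }
                // ... process the message ...
                session.commit();  // acknowledges the message on the broker
            }
            connection.close();
        }
    }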

The thread dump looks unsuspicious, everybody is waiting on the socket:
   java.lang.Thread.State: RUNNABLE
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.read(SocketInputStream.java:129)
        at org.apache.activemq.transport.tcp.TcpBufferedInputStream.fill(TcpBufferedInputStream.java:50)
        at org.apache.activemq.transport.tcp.TcpBufferedInputStream.read(TcpBufferedInputStream.java:58)
        at java.io.DataInputStream.readInt(DataInputStream.java:370)
        at org.apache.activemq.openwire.OpenWireFormat.unmarshal(OpenWireFormat.java:269)
        at org.apache.activemq.transport.tcp.TcpTransport.readCommand(TcpTransport.java:203)
        at org.apache.activemq.transport.tcp.TcpTransport.doRun(TcpTransport.java:195)
        at org.apache.activemq.transport.tcp.TcpTransport.run(TcpTransport.java:183)
        at java.lang.Thread.run(Thread.java:619)

A memory dump from the consumers shows that they've really run out of messages 
and are waiting for the broker to deliver new ones. I've attached both the 
thread dump and the heap dump to this issue (or better: I'll do so :)

The broker doesn't do anything either (it also waits on the transport socket); the 
queue has a full page-in buffer (100 messages) but obviously fails to do anything 
with it. If I manually trigger a doDispatch of all paged-in messages (via the 
debugger, just as an attempt to revive the thing) it returns without doing anything 
at all, since all subscriptions are full (s.isFull). I investigated the issue 
further and was confused to see the prefetchExtension field of the 
PrefetchSubscription holding a value of -1000 (negative!). This explains why the 
subscription was considered full:
  dispatched.size() - prefetchExtension >= info.getPrefetchSize()
  0 - (-1000) >= 1000
Quite nasty: even though the dispatched size was zero, the client didn't receive 
any new messages.
The only place this value can become negative is inside acknowledge, where it's 
decremented (prefetchExtension--); all other places use Math.max(0, X).
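
To illustrate the arithmetic, here is a simplified model of that accounting. The 
names mirror PrefetchSubscription, but this is only a sketch of the behaviour 
described above, not the actual broker source:

    // Simplified model of the prefetch accounting; not the broker source.
    class PrefetchAccounting {
        private final int prefetchSize = 1000;  // consumer's configured prefetch
        private int dispatchedSize = 0;         // dispatched but not yet acked
        private int prefetchExtension = 0;      // extra credit granted by the broker

        boolean isFull() {
            // The check that decides whether more messages get dispatched.
            return dispatchedSize - prefetchExtension >= prefetchSize;
        }

        void acknowledge() {
            // The one path without a lower bound: after the failover the new
            // master never dispatched these messages, so dispatchedSize stays 0
            // while every ack pushes prefetchExtension further below zero.
            prefetchExtension--;
        }

        public static void main(String[] args) {
            PrefetchAccounting sub = new PrefetchAccounting();
            for (int i = 0; i < 1000; i++) {
                sub.acknowledge();  // acks for messages prefetched from the old master
            }
            // 0 - (-1000) >= 1000, so the subscription counts as full even
            // though nothing is dispatched -> no further messages are sent.
            System.out.println("isFull() = " + sub.isFull());  // prints true
        }
    }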

So here's my guess at what happened: the client had a full prefetch buffer (1000 
messages) when I killed my master. As soon as the slave was done starting, the 
consumers reconnected and started processing the messages in the prefetch buffer 
and acknowledging them. This gradually decremented the counter into a negative 
value, because the slave never got a chance to increment prefetchExtension, since 
it never actually delivered those messages.

Possible solutions:
- clear the prefetch buffer on a failover
- simply don't allow this value to become smaller than zero (not sure if that 
covers all bases); see the sketch below
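
A minimal sketch of the second option, applied to the model above (again, not the 
actual broker code):

    void acknowledge() {
        // Clamp at zero so acks for messages this broker never dispatched
        // cannot drive the extension negative.
        prefetchExtension = Math.max(0, prefetchExtension - 1);
    }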
