[
https://issues.apache.org/activemq/browse/AMQ-2627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mark Gellings updated AMQ-2627:
-------------------------------
Attachment: activemq.xml
NativeNMSConsumerAndProducer.zip
Attached is a console app that uses NMS to replicate the problem, along with
our activemq.xml.
The zip file is password protected; the password is "fridaytest".
We're using ActiveMQ v5.2 with JDBC master/slave on MSSQL 2008. The attached
console app is an NMS v1.2 RC4 consumer with a transacted session.
To replicate:
1) Work through the console prompts and produce 50 messages.
2) Restart the console app and start consuming those 50 messages.
3) While the consumer is processing, restart the broker.
4) The last message the consumer was processing is resent and is not marked
as redelivered. (This is the idempotent message problem: for example, instead
of $500 being deposited into your account, $1000 is.)
5) NMS then blows up, which seems like a different problem?
From what I understand, this shouldn't happen if you use a transacted
session; however, the attached console app shows that it does.
Bottom line: I thought this was why the Camel idempotent consumer pattern [1]
existed, which Java clients can leverage.
[1] http://fusesource.com/docs/router/1.6/eip/MsgEnd-Idempotent.html
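For clients that cannot use Camel, the same idea can be approximated by hand. The following is only a minimal sketch, assuming each message carries an ID that stays the same when the broker resends it; IdempotentFilter and applyDeposit are hypothetical names and are not part of NMS, CMS, or Camel.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <unordered_set>

// Hypothetical helper: remembers the most recent message IDs so that a resent
// message can be recognised even when it is not flagged as redelivered.
class IdempotentFilter {
public:
    explicit IdempotentFilter(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    // Returns true the first time an ID is seen, false for a duplicate.
    bool firstDelivery(const std::string& messageId) {
        if (seen_.count(messageId) != 0) {
            return false;
        }
        // Forget the oldest ID once the window is full.
        if (order_.size() == capacity_) {
            seen_.erase(order_.front());
            order_.pop_front();
        }
        order_.push_back(messageId);
        seen_.insert(messageId);
        return true;
    }

private:
    std::size_t capacity_;
    std::deque<std::string> order_;         // insertion order, oldest first
    std::unordered_set<std::string> seen_;  // fast membership check
};

// Applies a deposit only if this message ID has not been processed before.
long applyDeposit(IdempotentFilter& filter, const std::string& messageId,
                  long balance, long amount) {
    return filter.firstDelivery(messageId) ? balance + amount : balance;
}
```

With this in place, a second delivery of the same $500 message would leave the balance unchanged. The window is held in memory only, so it would not survive a restart of the consumer itself; that case would need a durable store.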
Regards,
Mark
> Failover causes duplicate messages
> ----------------------------------
>
> Key: AMQ-2627
> URL: https://issues.apache.org/activemq/browse/AMQ-2627
> Project: ActiveMQ
> Issue Type: Bug
> Components: Broker
> Affects Versions: 5.3.0
> Environment: Server: 2 RHEL 5.3 x86-64 machines. Kernel version
> 2.6.18-128.0.0.0.2.el5.
> Client: Same as above. Also tested with same results on Fedora Core 11
> Reporter: Josh Carlson
> Priority: Blocker
> Attachments: activemq.xml, broken_failover.tar.bz2,
> NativeNMSConsumerAndProducer.zip
>
>
> When using a shared file system master/slave ActiveMQ configuration and
> client acknowledgements, we run into a problem when our clients fail over to
> a new server. The new server does not appear to have any knowledge of the
> pending messages that the old server had dispatched to clients. Consequently,
> all of these pending messages are dispatched a second time even though the
> clients had acknowledged them.
> Please confirm my suspicion that this is a server-side bug, and let me know
> if there are any suggestions for working around it. I have put this at
> Priority 'Blocker' because it blocks our progress towards deploying an
> ActiveMQ solution to our infrastructure.
> If you look at the log file from the new broker you can see that the acks for
> those messages do not get matched:
> 2010-02-24 12:46:49,759 | WARN | Async error occurred:
> javax.jms.JMSException: Unmatched acknowledege:
> I do not know whether this gets bubbled up to the client or not. If it does,
> it must be under the hood in activemq-cpp, because from the application layer
> I do not see any errors. In our in-house Perl Stomp client we wind up getting
> an ERROR frame, which the client did not know what to do with. This is where
> I initially ran into this problem. Today is my first day using CMS, in an
> attempt to verify whether the bug is independent of the client and to provide
> a reproducer using a client everyone should have ready access to.
> The attached tar file contains the following details for reproducing this
> problem.
> Contents:
> README.txt - This file
> activemq_1.xml - ActiveMQ config for the server that was master at the time
>   I started the consumer
> activemq_2.xml - ActiveMQ config for the broker which became the master
>   after the original master failed
> activemq_1.log - Log file from the first server
> activemq_2.log - Log file from the second server
> producers/SimpleProducer.cpp - Modified version of the program shipped in
>   activemq-cpp-library-3.1.0, changed to send only 2 messages and to take two
>   broker hosts on the command line.
> consumers/SimpleConsumer.cpp - New file, but really just a modified version
>   of SimpleAsyncConsumer shipped with activemq-cpp-library-3.1.0. Modified as
>   follows:
>   - Retrieves messages synchronously and in one thread (so we can see what
>     is going on)
>   - Takes two command line options to name the broker hosts to use in the
>     broker URI
>   - Uses client acknowledgements.
>   - After retrieving a message it blocks waiting for standard input (so one
>     has time to go kill the server)
> Makefile.am - Modified version of the makefile to build the new
>   SimpleConsumer program.
>
>
> Note that these files must be built from inside an activemq-cpp build tree.
> So the first step to reproduce this problem is to copy
> producers/SimpleProducer.cpp, consumers/SimpleConsumer.cpp and Makefile.am to
> your src/examples directory. Then run a top-level configure and make. I ran
> this using activemq-cpp-library version 3.1.0.
>
> This reproducer expects that you only have 2 activemq brokers and that they
> be configured using a shared file system master/slave configuration. It also
> expects an openwire transport connector listening on port 61616 on those two
> machines. (Note: you'll see my activemq configs using the transport uri:
> uri="tcp://q1masterhost:61616", q1masterhost goes to the ethernet 0 interface
> on each of the hosts.)
> Once you have those two brokers set up and running, go ahead and run the
> simple_producer code, passing the hostnames of your two brokers on the command
> line:
> [jcarl...@rocky examples]$ ./simple_producer mmq1 mmq2
> =====================================================
> Starting the example:
> -----------------------------------------------------
> Sent message #1 from thread 139817389041504
> Sent message #2 from thread 139817389041504
> -----------------------------------------------------
> Finished with the example.
> =====================================================
> Now do the same for the simple_consumer:
> [jcarl...@rocky examples]$ ./simple_consumer mmq1 mmq2
> =====================================================
> Starting the example:
> -----------------------------------------------------
> Message #1 Received: Hello world! from thread 139817389041504
> Waiting for stdin to acknoledge
> The app has retrieved one message but has not ack'ed it yet. Now go identify
> which host has the master broker and kill the process. The master broker will
> be the one which is *not* printing 'Database [lockfile] is locked' messages.
> In my case the broker was on mmq1 so I did this in another terminal:
> ssh -t mmq1 sudo pkill java
> Immediately I see this in the console where I started the consumer:
> The Connection's Transport has been Interrupted.
> and then a few seconds later I see:
> The Connection's Transport has been Restored.
> At this point I hit enter in the terminal so that the message I received on
> the other broker gets acknowledged and the consumer tries to get another
> message:
> Message #2 Received: Hello world! from thread 139817389041504
> Waiting for stdin to acknoledge
> OK, at this point, since I have only put two messages on the queue, I don't
> expect any more. So when I hit enter and go back to get another message, I
> expect it to just sit and wait for another message to come in. This is not
> what happens. A third message is retrieved:
> Message #3 Received: Hello world! from thread 139817389041504
> Waiting for stdin to acknoledge
> At this point, when I hit enter again, the app blocks and I kill it with
> Ctrl-C.
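
The reproducer in the quoted description takes two broker hostnames on the command line and expects an openwire connector on port 61616 on each. The exact URI built inside the attached SimpleConsumer.cpp is not shown in this message, so the following is only a sketch of how such a URI might be assembled, assuming the usual failover:(tcp://host:port,...) form; buildFailoverUri is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: joins broker hosts into a failover URI such as
// failover:(tcp://hostA:61616,tcp://hostB:61616).
// The default port of 61616 matches the connector port named in the report.
std::string buildFailoverUri(const std::vector<std::string>& hosts,
                             int port = 61616) {
    assert(!hosts.empty());
    std::string uri = "failover:(";
    for (std::size_t i = 0; i < hosts.size(); ++i) {
        if (i != 0) {
            uri += ",";
        }
        uri += "tcp://" + hosts[i] + ":" + std::to_string(port);
    }
    uri += ")";
    return uri;
}
```

For the two hosts used in the transcript, buildFailoverUri({"mmq1", "mmq2"}) would yield a URI naming both brokers, which is what lets the client reconnect to the new master after the old one is killed.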