[
https://issues.apache.org/jira/browse/AMQ-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14298725#comment-14298725
]
Stefan Burkard commented on AMQ-5549:
-------------------------------------
Hi
The two cases are different, but I'm not sure this makes a difference.
When, for whatever reason, the shared storage lock is released by the shared
storage provider (after the grace period), the slave can grab the lock, become
master, and work as expected. This has worked fine in all cases I have seen.
The key question is how the former master behaves. In every description of two
active master brokers, the problem arises because the former master simply
keeps acting as master. So the real problem is that the master does not
recognize that it has lost the lock. This can happen in any shared storage
configuration.
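For what it's worth, here is a minimal Java sketch of why the former master can
stay blind (the path is made up, and this is not ActiveMQ's actual locker code,
just the underlying java.nio behaviour): FileLock validity is tracked locally
in the JVM, so nothing flips on the master's side when the NFS server hands the
lock to another client.
{code:java}
import java.io.File;
import java.io.RandomAccessFile;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;

public class SharedLockHolder {
    public static void main(String[] args) throws Exception {
        // Hypothetical path standing in for the broker's shared lock file.
        File lockFile = new File("/mnt/nfs/kahadb/lock");
        FileChannel channel = new RandomAccessFile(lockFile, "rw").getChannel();

        // tryLock() returns null when another process already holds the lock;
        // this is what keeps the slave waiting in its retry loop.
        FileLock lock = channel.tryLock();
        if (lock == null) {
            System.out.println("lock held by another broker, staying slave");
            return;
        }

        // The trap: isValid() reflects only this JVM's local bookkeeping.
        // If the NFS server expires the client's lease and grants the lock
        // to another machine, this loop happily keeps printing forever.
        while (lock.isValid()) {
            System.out.println("still master (as far as this JVM knows)");
            Thread.sleep(1000);
        }
    }
}
{code}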
Jean-Baptiste Onofré describes the same problem with a shared JDBC datastore
here:
http://blog.nanthrax.net/2013/10/apache-activemq-5-7-5-9-and-master-slave.
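For the JDBC case, the lease-based locker is the usual answer to exactly this,
because the master must actively renew its lease and steps down when it cannot.
A minimal sketch (the dataSource bean name is made up):
{code:xml}
<persistenceAdapter>
  <jdbcPersistenceAdapter dataDirectory="${activemq.data}" dataSource="#jdbc-ds">
    <locker>
      <lease-database-locker lockAcquireSleepInterval="5000"/>
    </locker>
  </jdbcPersistenceAdapter>
</persistenceAdapter>
{code}
Nothing comparable exists for the plain shared-file locker, which is why the
NFS behaviour matters so much there.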
In my tests with a shared LevelDB store over NFSv4 and hard mounts, the former
master broker NEVER realized that another broker had taken the lock. With a
soft mount it recognized the new situation only after a massive delay (about 5
minutes). With the NFS options Torsten mentioned, the master broker realizes
the new situation after 15 to 30 seconds. I am not saying that these settings
are safe for production; I just observed that changes to the NFS settings lead
to completely different results.
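To illustrate the kind of knobs involved (I am not claiming these exact values
are what Torsten used, nor that they are production-safe), a soft mount with
short timeouts looks roughly like this in /etc/fstab:
{code}
nfsserver:/export/kahadb  /mnt/nfs/kahadb  nfs4  soft,timeo=50,retrans=2,proto=tcp  0 0
{code}
With soft and a short timeo, a blocked file operation eventually fails back to
the broker instead of hanging forever, which is presumably why the former
master noticed the lock loss much faster in that test.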
So my guess is that the whole problem is just a matter of NFS settings. But
this part is never mentioned in example setups etc. Is there anybody out there
who has a working and tested shared-filesystem architecture with NFS? It would
be very helpful to have a "reference configuration" that is known to work.
> Shared Filesystem Master/Slave using NFSv4 allows both brokers to become
> active at the same time
> ---------------------------------------------------------------------------------------------
>
> Key: AMQ-5549
> URL: https://issues.apache.org/jira/browse/AMQ-5549
> Project: ActiveMQ
> Issue Type: Bug
> Components: Broker, Message Store
> Affects Versions: 5.10.1
> Environment: - CentOS Linux 6
> - OpenJDK 1.7
> - ActiveMQ 5.10.1
> Reporter: Heikki Manninen
> Priority: Critical
>
> Identical ActiveMQ master and slave brokers are installed on CentOS Linux 6
> virtual machines. A third virtual machine (also CentOS 6) provides an NFSv4
> share for the brokers' KahaDB.
> Both brokers are started; the master broker acquires the file lock on the
> lock file and the slave broker sits in a loop and waits for the lock, as
> expected. Switching brokers also works as expected.
> Once the network connection of the NFS server is disconnected, both the
> master and slave NFS mounts block and the slave broker stops logging file
> lock retries.
> A short while after the network connection is brought back, the mounts come
> back and the slave broker is able to acquire the lock while the master still
> holds it. Both brokers then accept client connections.
> In this situation it is also possible to stop and start both individual
> brokers many times, and they are always able to acquire the lock even if the
> other one is already running. Only after stopping both brokers and starting
> them again is the situation back to normal.
> * NFS server:
> ** CentOS Linux 6
> ** NFS v4 export options: rw,sync
> ** NFS v4 grace time 45 seconds
> ** NFS v4 lease time 10 seconds
> * NFS client:
> ** CentOS Linux 6
> ** NFS mount options: nfsvers=4,proto=tcp,hard,wsize=65536,rsize=65536
> * ActiveMQ configuration (otherwise default):
> {code:xml}
> <persistenceAdapter>
>   <kahaDB directory="${activemq.data}/kahadb">
>     <locker>
>       <shared-file-locker lockAcquireSleepInterval="1000"/>
>     </locker>
>   </kahaDB>
> </persistenceAdapter>
> {code}