[ 
https://issues.apache.org/jira/browse/AMQ-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14297151#comment-14297151
 ] 

Heikki Manninen edited comment on AMQ-5549 at 1/29/15 5:16 PM:
---------------------------------------------------------------

Hi, thanks for the comments.

In later testing, I tried different configurations, and the final settings 
were:

{code:xml}
<kahaDB directory="${activemq.data}/kahadb" lockKeepAlivePeriod="5000">
  <locker>
    <shared-file-locker lockAcquireSleepInterval="15000"/>
  </locker>
</kahaDB>
{code}

as mentioned in the last comment. This setup, combined with an NFSv4 lease 
timeout of 60 seconds, yielded the best results, although it still failed when 
the network outage was long enough for the NFS filesystem to block for a long 
time (even after the connection was restored).
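For context, here is my rough mental model of the takeover timing. This is a back-of-the-envelope sketch of how I assume the shared file locker interacts with NFSv4 leases, not a statement about ActiveMQ internals; the interval values come from the configuration above and the server-side lease used in testing:

{code:python}
# Back-of-the-envelope model (assumption, not ActiveMQ internals): the slave
# cannot take the lock while the master's NFSv4 lease is still alive, and it
# only re-checks the lock every lockAcquireSleepInterval milliseconds.

nfs_lease_time_ms = 60_000               # server-side lease used in later tests
lock_acquire_sleep_interval_ms = 15_000  # slave retry interval from the config above

# The earliest moment a dead master's lock can be lost is lease expiry; in the
# worst case the slave only notices one full retry interval later.
earliest_takeover_ms = nfs_lease_time_ms
latest_takeover_ms = nfs_lease_time_ms + lock_acquire_sleep_interval_ms

print(earliest_takeover_ms / 1000, latest_takeover_ms / 1000)  # 60.0 75.0
{code}

Under that model, a slave should only be let in after the master has failed to renew its lease for a full lease period; the bug reported here is that both brokers ended up holding the lock.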

The mount options are in the issue description, but I revisited them for 
further testing and finally used these (trying to increase the timeout value 
to its maximum):

{code}
rw,nfsvers=4,proto=tcp,timeo=6000,retrans=3,hard,wsize=65536,rsize=65536
{code}

This didn't make much difference, though, compared with the default timeout of 
600 deciseconds (60 seconds).
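For anyone reading along: per nfs(5), timeo is given in tenths of a second, so the two values compare like this (simple arithmetic, shown in Python for clarity):

{code:python}
def timeo_seconds(timeo_deciseconds: int) -> float:
    """Convert an NFS 'timeo' mount option value (tenths of a second,
    per nfs(5)) into seconds."""
    return timeo_deciseconds / 10.0

print(timeo_seconds(600))   # default over TCP -> 60.0
print(timeo_seconds(6000))  # the value used above -> 600.0
{code}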

I will give restartAllowed a try.
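In case it is useful to others, this is roughly where restartAllowed would go. It is a sketch based on my reading of the docs, not something I have verified on 5.10.1, so the attribute and element names should be double-checked:

{code:xml}
<!-- Sketch, not verified on 5.10.1: with restartAllowed="false", an IO error
     in the persistence store stops the broker outright instead of letting it
     restart and silently re-acquire the lock. -->
<broker xmlns="http://activemq.apache.org/schema/core" restartAllowed="false">
  <ioExceptionHandler>
    <defaultIOExceptionHandler stopStartConnectors="false"/>
  </ioExceptionHandler>
</broker>
{code}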

Furthermore, I tested the exact same setup with the exact same settings 
against a NetApp filer NFSv4 server, and the brokers behaved as they should 
have: the master was able to continue operating, and the slave did not get the 
lock after outages of 5, 30, 45, or 180 seconds.



was (Author: heikki_m):
Hi, thanks for the comments.

In later testing, I tried different configurations, and the final settings 
were:

{code:xml}
<kahaDB directory="${activemq.data}/kahadb" lockKeepAlivePeriod="5000">
  <locker>
    <shared-file-locker lockAcquireSleepInterval="15000"/>
  </locker>
</kahaDB>
{code}

as mentioned in the last comment. This setup, combined with an NFSv4 lease 
timeout of 30 seconds, yielded the best results, although it still failed when 
the network outage was long enough for the NFS filesystem to block for a long 
time (even after the connection was restored).

The mount options are in the issue description, but I revisited them for 
further testing and finally used these (trying to increase the timeout value 
to its maximum):

{code}
rw,nfsvers=4,proto=tcp,timeo=6000,retrans=3,hard,wsize=65536,rsize=65536
{code}

This didn't make much difference, though, compared with the default timeout of 
600 deciseconds (60 seconds).

I will give restartAllowed a try.

Furthermore, I tested the exact same setup with the exact same settings 
against a NetApp filer NFSv4 server, and the brokers behaved as they should 
have: the master was able to continue operating, and the slave did not get the 
lock after outages of 5, 30, 45, or 180 seconds.



> Shared Filesystem Master/Slave using NFSv4 allows both brokers to become 
> active at the same time
> ---------------------------------------------------------------------------------------------
>
>                 Key: AMQ-5549
>                 URL: https://issues.apache.org/jira/browse/AMQ-5549
>             Project: ActiveMQ
>          Issue Type: Bug
>          Components: Broker, Message Store
>    Affects Versions: 5.10.1
>         Environment: - CentOS Linux 6
> - OpenJDK 1.7
> - ActiveMQ 5.10.1
>            Reporter: Heikki Manninen
>            Priority: Critical
>
> Identical ActiveMQ master and slave brokers are installed on CentOS Linux 6 
> virtual machines. There is a third virtual machine (also CentOS 6) providing 
> an NFSv4 share for the brokers' KahaDB.
> Both brokers are started; the master broker acquires the file lock on the 
> lock file, and the slave broker sits in a loop waiting for the lock, as 
> expected. Switching broker roles also works as expected.
> Once the NFS server's network connection is disconnected, both the master 
> and slave NFS mounts block, and the slave broker stops logging file lock 
> retries. Shortly after the network connection is brought back, the mounts 
> recover and the slave broker is able to acquire the lock even though the 
> master still holds it. Both brokers then accept client connections.
> In this situation it is also possible to stop and start each individual 
> broker many times, and each is always able to acquire the lock even if the 
> other one is already running. Only after stopping both brokers and starting 
> them again does the situation return to normal.
> * NFS server:
> ** CentOS Linux 6
> ** NFS v4 export options: rw,sync
> ** NFS v4 grace time 45 seconds
> ** NFS v4 lease time 10 seconds
> * NFS client:
> ** CentOS Linux 6
> ** NFS mount options: nfsvers=4,proto=tcp,hard,wsize=65536,rsize=65536
> * ActiveMQ configuration (otherwise default):
> {code:xml}
>         <persistenceAdapter>
>             <kahaDB directory="${activemq.data}/kahadb">
>               <locker>
>                 <shared-file-locker lockAcquireSleepInterval="1000"/>
>               </locker>
>             </kahaDB>
>         </persistenceAdapter>
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
