Hi,

I'm performing downtime measurement tests using corosync version 2.3.0 and 
pacemaker version 1.1.12 under RHEL 6.5 MRG, and although it is not recommended, 
I tuned the corosync configuration settings to the following insane values:

        # Timeout for token
        token: 60
        token_retransmits_before_loss_const: 1

        # How long to wait for join messages in the membership protocol (ms)
        join: 35
        consensus: 70
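
For reference, these values sit in the totem section of corosync.conf, roughly 
as sketched below (interface/transport details are omitted here); whether 
corosync actually applied them can be checked by dumping the runtime keys with 
corosync-cmapctl:

        totem {
                version: 2
                # Token timeout (ms) and retransmit count before token loss
                token: 60
                token_retransmits_before_loss_const: 1
                # Membership protocol timers (ms)
                join: 35
                consensus: 70
        }

        # verify what corosync is really running with
        corosync-cmapctl | grep runtime.config.totem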

My two-node cluster consists of a kamailio clone resource, which replicates the 
so-called user location state using DMQ at the application level (see [1]). The 
switchover migrates an ocf:heartbeat:IPaddr2 resource. With these settings, the 
service downtime is below 100ms for a controlled cluster switchover, i.e. when 
"/etc/init.d/pacemaker stop" and "/etc/init.d/corosync stop" are executed. 

The service downtime is about 400ms when power loss is simulated on the active 
node while it does not hold the DC role. When I simulate power loss on the 
active node while it also holds the DC role, the service downtime increases to 
about 1500ms. As the timestamps in the logs have only one-second resolution, it 
is hard to give more precise numbers, but apparently the DC election procedure 
takes more than 1000ms.
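
For anyone who wants to reproduce the measurement independently of the cluster 
logs, a probe along the following lines gives sub-second resolution; the VIP, 
the port, and the plain TCP connect check are placeholders (a real SIP OPTIONS 
probe would match the actual traffic better):

        #!/bin/sh
        # Probe the virtual IP roughly every 10 ms and log each result with a
        # sub-second timestamp; the downtime is the gap between the last OK
        # before a failure and the first OK after it.
        VIP=192.168.1.100        # placeholder
        PORT=5060                # assumes kamailio also listens on TCP here
        while true; do
                if nc -z -w 1 "$VIP" "$PORT" 2>/dev/null; then
                        echo "$(date +%s.%N) OK"
                else
                        echo "$(date +%s.%N) FAIL"
                fi
                sleep 0.01
        done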

Are there any options for tuning the DC election process? Is there any 
documentation describing what happens in this situation?
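
To make the question more concrete, the dc-deadtime and election-timeout 
cluster properties are the kind of knobs I mean; whether either of them is 
actually involved in this path is exactly what I'm unsure about:

        # show current values of the election-related cluster options
        crm_attribute --type crm_config --name dc-deadtime --query
        crm_attribute --type crm_config --name election-timeout --query

        # example of lowering dc-deadtime; I have not verified that this
        # is safe or that it shortens the election after a DC power loss
        crm_attribute --type crm_config --name dc-deadtime --update 5s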

Tests with more nodes in the cluster showed that the service downtime increases 
with the number of online cluster nodes, even if the DC runs on one of the 
nodes that remain active. 

I'm using only one ring. Using two rings does not seem to change the test 
results much.
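
By "two rings" I mean the redundant ring protocol with two interface sections 
in corosync.conf, roughly like this (the addresses are placeholders and the 
multicast settings are omitted):

        totem {
                rrp_mode: passive
                interface {
                        ringnumber: 0
                        bindnetaddr: 192.168.1.0    # placeholder
                }
                interface {
                        ringnumber: 1
                        bindnetaddr: 10.0.0.0       # placeholder
                }
        }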

Thank you,

Stefan

[1] http://kamailio.org/docs/modules/devel/modules/dmq.html
