https://bugzilla.redhat.com/show_bug.cgi?id=1138213

            Bug ID: 1138213
           Summary: Rabbitmq cluster remains partitioned after short
                    network partition incident
           Product: Fedora
           Version: 20
         Component: rabbitmq-server
          Assignee: [email protected]
          Reporter: [email protected]
        QA Contact: [email protected]
                CC: [email protected],
                    [email protected], [email protected],
                    [email protected], [email protected], [email protected]



Description of problem:
In a 3-node RabbitMQ cluster, after a network partition occurs and the 3
nodes can communicate again, the cluster remains partitioned if the
"pause_minority" partition-handling policy is used.

This happens if the network outage is short enough (~60 secs); for longer
outages (>3 minutes) the cluster reassembled properly. A very similar issue
was reported here:
http://lists.rabbitmq.com/pipermail/rabbitmq-discuss/2014-March/034639.html

This issue seems to be fixed upstream already (not sure in which version
exactly, but 3.3.5 works fine).

Version-Release number of selected component (if applicable):
rabbitmq-server-3.1.5-9.fc20.noarch

Steps to Reproduce:
1. Create a 3-node cluster using the "pause_minority" partition-handling
policy (a shell sketch of these steps follows the list).
2. Stop networking on one of the nodes and wait until that node is seen as
down on the other nodes.
3. Start networking on the node again.
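
A rough shell sketch of the reproduction (the node hostnames node0/node1/node2
and the legacy "network" service are assumptions for illustration):

# On node1 and node2: join the cluster formed on node0
rabbitmqctl stop_app
rabbitmqctl join_cluster rabbit@node0
rabbitmqctl start_app

# On one node: simulate a short (~60 sec) network outage
systemctl stop network.service
sleep 60
systemctl start network.service

# On a remaining node: check whether the partition healed
rabbitmqctl cluster_status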

Actual results:
Cluster remains partitioned:

[root@overcloud-controller1-csugetg5pjql ~]# rabbitmqctl cluster_status
Cluster status of node 'rabbit@overcloud-controller1-csugetg5pjql' ...
[{nodes,
     [{disc,
          ['rabbit@overcloud-controller0-o6yt2gtaxk6g',
           'rabbit@overcloud-controller1-csugetg5pjql',
           'rabbit@overcloud-controller2-z3tswnamdzhq']}]},
 {running_nodes,
     ['rabbit@overcloud-controller0-o6yt2gtaxk6g',
      'rabbit@overcloud-controller1-csugetg5pjql']},
 {partitions,
     [{'rabbit@overcloud-controller0-o6yt2gtaxk6g',
          ['rabbit@overcloud-controller2-z3tswnamdzhq']},
      {'rabbit@overcloud-controller1-csugetg5pjql',
          ['rabbit@overcloud-controller2-z3tswnamdzhq']}]}]
...done.


Expected results:
No partitions after the node is up again.
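
For comparison, a healthy status on this cluster would be expected to list
all three nodes under running_nodes and an empty partitions list, roughly:

 {running_nodes,
     ['rabbit@overcloud-controller0-o6yt2gtaxk6g',
      'rabbit@overcloud-controller1-csugetg5pjql',
      'rabbit@overcloud-controller2-z3tswnamdzhq']},
 {partitions,[]}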


Additional info:
The only related message in the log:
Mnesia('rabbit@overcloud-controller0-o6yt2gtaxk6g'): ** ERROR ** mnesia_event
got {inconsistent_database, running_partitioned_network,
'rabbit@overcloud-controller2-z3tswnamdzhq'}
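
A possible manual workaround (general RabbitMQ partition-recovery practice,
not verified against this report) is to restart the RabbitMQ application on
the node stuck on the losing side so it resyncs from the majority:

# On the partitioned node (controller2 in the output above)
rabbitmqctl stop_app
rabbitmqctl start_app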
