Hey folks,

I thought I'd write a summary of where I'm at. Here are the issues I found
and what I did about them:

- We ran into an Ansible bug that is fixed by this PR:
https://github.com/ansible/ansible/pull/50381
  I've asked pingou to patch batcave, since the fix is basically a one-liner
that will keep working with the older prod version.

- When starting a RabbitMQ cluster from scratch, there is a race condition
that is documented here:
https://www.rabbitmq.com/cluster-formation.html#initial-formation-race-condition
  On nodes 02 and 03, I've just destroyed the database and let it
auto-detect the cluster again:
  # systemctl stop rabbitmq-server && rm -rf /var/lib/rabbitmq/mnesia/ && systemctl start rabbitmq-server
  It worked fine. I checked with "rabbitmqctl list_users" that all nodes
had the same users declared.

- I've also fixed a couple of things in the playbooks that assumed the
cluster was already up and configured.

- I've rebuilt collectd-rabbitmq for EPEL8, but apparently we currently only
install it on production (not sure why; I think it could be useful in
staging).

- The nagios-plugins-rabbitmq RPM still fails to install because of a
dependency bug in perl-Monitoring-Plugin. I've opened a ticket about it:
https://bugzilla.redhat.com/show_bug.cgi?id=1803121
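
For reference, the node reset from the RabbitMQ item above could be wrapped
in a small script. This is only a sketch: the DRY_RUN switch, the MNESIA_DIR
variable and the run/reset_node helper names are made up for illustration.
Only the commands themselves come from what was run on nodes 02 and 03. By
default it prints the commands instead of executing them.

```shell
#!/bin/sh
# Sketch: wipe a RabbitMQ node's local database so it rejoins the cluster.
# DESTRUCTIVE when DRY_RUN=0: removes the node's Mnesia directory (run as root).
# DRY_RUN and MNESIA_DIR are assumptions of this sketch, not existing settings.

MNESIA_DIR="${MNESIA_DIR:-/var/lib/rabbitmq/mnesia/}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Stop the broker, delete its database, restart it, then list the users
# so they can be compared with the other nodes.
reset_node() {
    run systemctl stop rabbitmq-server &&
    run rm -rf "$MNESIA_DIR" &&
    run systemctl start rabbitmq-server &&
    run rabbitmqctl list_users
}

reset_node
```

Running it as-is only lists the four steps. Running it with DRY_RUN=0 on a
node performs the reset for real, so it should only be used on a node whose
data can be thrown away.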

Now we need to recreate the queues, users and bindings, and I don't have
the permissions to run all the playbooks. If someone could run the master
playbook limited to staging and to the rabbitmq_cluster tag, I think it
should recreate all users and queues, and we should be all set.
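
A rough sketch of what that run might look like is below. The playbook file
name (master.yml) and the "staging" limit pattern are assumptions on my part
and may not match the actual repository layout or inventory. The
playbook_cmd helper is hypothetical and only prints the command line, so
nothing is executed. The --limit, --tags and --list-tasks options are
standard ansible-playbook options.

```shell
#!/bin/sh
# Sketch: build the ansible-playbook command line for the requested run.
# PLAYBOOK and LIMIT are assumed values; adjust them to the real setup.

PLAYBOOK="${PLAYBOOK:-master.yml}"
LIMIT="${LIMIT:-staging}"
TAGS="${TAGS:-rabbitmq_cluster}"

# Print the command line, with any extra arguments appended at the end.
playbook_cmd() {
    set -- ansible-playbook "$PLAYBOOK" --limit "$LIMIT" --tags "$TAGS" "$@"
    echo "$*"
}

# Preview which tasks would run, then the actual run.
playbook_cmd --list-tasks
playbook_cmd
```

Checking the --list-tasks output first is a cheap way to confirm that the
tag only selects the RabbitMQ tasks before anything is changed.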

I'm around and on IRC if you need me.

Aurélien
