Have you thought about managing global cluster state with something that 
can keep track of node health? You could use Ansible to poll this before 
restarting individual nodes, waiting for all nodes to register as healthy, 
then acquiring a global lock before proceeding with the restart. With this 
approach, you can attempt restarts on all nodes in parallel, but be assured 
you will only be restarting one at a time (or as many at a time as you 
configure locks for). Each parallel Ansible run will just sit polling the 
cluster monitor until it gets a chance to acquire the lock.
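
To make the flow concrete, here is a rough sketch of the loop each parallel 
run would execute. It is not tied to any particular tool: all_healthy, 
try_lock, unlock, and restart are hypothetical callables you would supply 
yourself (for example, thin wrappers around your cluster monitor and your 
service manager).

```python
import time


def rolling_restart(node, all_healthy, try_lock, unlock, restart,
                    poll_interval=5.0, sleep=time.sleep):
    """Restart `node` only while holding the cluster-wide lock.

    All four callables are hypothetical hooks supplied by the caller:
      all_healthy() -> bool   True when every node reports healthy
      try_lock(node) -> bool  True if the global lock was acquired
      unlock(node)            releases the global lock
      restart(node)           performs the actual service restart
    """
    # Poll until the cluster is healthy AND we win the lock.
    while True:
        if all_healthy() and try_lock(node):
            break
        sleep(poll_interval)
    try:
        restart(node)
    finally:
        # Always give the lock back, even if the restart raised.
        unlock(node)
```

Because the next runner also waits for all_healthy() before it tries the 
lock, it will not start its own restart until the node restarted here has 
rejoined the cluster, provided the health check notices the restart.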

I do not think using Ansible to track global cluster state will work out 
well, but Ansible can work really nicely in tandem with such a cluster 
monitor.

Consul <http://www.consul.io/> is a nice tool for this: a distributed CP 
service discovery, health checking, and key-value store system. It is quite 
general-purpose and does not require you to modify your services in any way 
to make Consul aware of them. Just hook the service registration into the 
service's systemd unit/init script, set up a health check that tells you 
when the node is well and truly a happy member of the cluster, and then 
poll Consul from anywhere to see what the overall cluster health is.
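
As a rough illustration of the polling side, the sketch below asks Consul's 
HTTP health endpoint (/v1/health/service/<name>) for a service and decides 
whether every expected node is passing all of its checks. The agent 
address, the service name, and the node names in the usage note are 
assumptions for the example, not something Consul or Ansible gives you by 
default.

```python
import json
import urllib.request

CONSUL = "http://127.0.0.1:8500"  # assumption: local agent, default HTTP port


def fetch_service_health(service):
    """Return Consul's health entries for every instance of `service`."""
    url = "%s/v1/health/service/%s" % (CONSUL, service)
    with urllib.request.urlopen(url) as resp:
        return json.loads(resp.read().decode("utf-8"))


def cluster_healthy(entries, expected_nodes):
    """True if every expected node appears with all of its checks passing."""
    healthy = {
        entry["Node"]["Node"]
        for entry in entries
        if all(check["Status"] == "passing" for check in entry["Checks"])
    }
    return set(expected_nodes) <= healthy
```

Usage would look something like 
cluster_healthy(fetch_service_health("elasticsearch"), ["es1", "es2", "es3"]), 
where the service and node names are placeholders for your own.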

This doc discusses using Consul to build distributed locks:
http://www.consul.io/docs/internals/sessions.html
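
A minimal sketch of that lock pattern over Consul's HTTP API: create a 
session, then try to acquire a key with it. The agent address, the session 
name, and the key name are assumptions for illustration. The put parameter 
exists only so the HTTP call can be swapped out when testing.

```python
import json
import urllib.request

CONSUL = "http://127.0.0.1:8500"  # assumption: local agent, default HTTP port


def consul_put(path, body=b""):
    """PUT to the Consul agent and return the decoded JSON response."""
    req = urllib.request.Request(CONSUL + path, data=body, method="PUT")
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def create_session(name, put=consul_put):
    """Create a session and return its ID."""
    payload = json.dumps({"Name": name}).encode("utf-8")
    return put("/v1/session/create", payload)["ID"]


def acquire_lock(key, session_id, put=consul_put):
    """Try to take the lock; Consul answers true or false."""
    return put("/v1/kv/%s?acquire=%s" % (key, session_id)) is True


def release_lock(key, session_id, put=consul_put):
    """Release a lock held by this session."""
    return put("/v1/kv/%s?release=%s" % (key, session_id)) is True
```

Each Ansible run would create its own session, loop on acquire_lock with a 
key such as "locks/es-restart" (a made-up name) until it returns True, do 
the restart, and then call release_lock.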

It's a very nice system, and I should have some Ansible/Consul examples 
published soon! With Docker in the mix, we are calling the collection of 
microservices/clustering platform patterns using this trio "marina". 
"marina" is just the conceptual glue that makes these things work together 
in a coherent way, so you won't be able to clone "marina"! But I think the 
patterns contained in the playbooks may help answer coordination problems 
like this.

On Wednesday, July 23, 2014 4:16:50 PM UTC-7, David Reagan wrote:
>
> Specifically, an elasticsearch cluster, but doing this the right way would 
> also apply to other kinds of clusters, like RabbitMQ or Redis.
>
> I'm using https://github.com/LaneCommunityCollege/aspects_elasticsearch 
> to manage my elasticsearch cluster. Currently, if I modify the 
> configuration settings, Ansible would issue a restart to all the nodes in 
> the cluster. 
>
> The obvious answer is to only run the playbook on one node at a time. But 
> there are situations where that isn't convenient. If I run Ansible like a 
> puppet agent then setting the configuration setting in the group_vars file 
> will apply it on all the nodes, thus restarting them all. Or if I have a 
> large number of nodes. Or simply need to apply the change, but don't have 
> time to do it one or two nodes at a time.
>
> The other solutions I've thought of involve custom scripts that would not 
> be part of the Ansible playbook. And likely be application specific.
>
> Is there a way to tell Ansible to set a service restart task sometime in 
> the future? Say, right now for node1, 5 minutes later for node2, 10 minutes 
> later for node3, and so on. 
>
> Maybe the at module? How would it know to set it 10 minutes in the future 
> instead of 5 for the third node? 
>
> So, yeah, is there a good Ansible specific method for this? Or do I need 
> to look outside of Ansible?
>

