We are doing similar things at a not too different scale (~2000 servers). 
We abuse groups heavily. I have the mantra that a fact should only be 
defined once. So if I have a link between two machines, I don't want the 
same link definition in both. This means I lean heavily on two things: 
groups and include_vars. Also, everything we do is done in roles.

I should preface this with the fact that I spent quite a while figuring out 
what kind of data model I wanted for defining all the pieces of the data 
center. Don't try this approach unless you like staring at whiteboards 
covered with nested Python data structures for long periods of time.
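To illustrate the "define a fact once" idea with a link between two machines, here is a minimal sketch, assuming a hypothetical central list called LINKS where each link appears exactly once. Each host's view of its links is derived from that list rather than stored per host. The host, interface and field names are made up for illustration and are not the author's data model.

```python
"""Hypothetical sketch: each link between two machines is defined once,
and per-host views are derived from that single definition."""

from collections import defaultdict

# One central list of links; each link appears exactly once.
# Host names, interface names and field names are made up for illustration.
LINKS = [
    {"a": {"host": "lb01", "iface": "eth1"},
     "b": {"host": "web01", "iface": "eth0"},
     "vlan": 110},
    {"a": {"host": "lb01", "iface": "eth2"},
     "b": {"host": "web02", "iface": "eth0"},
     "vlan": 111},
]


def links_by_host(links):
    """Return {host: [per-host view of each link]} derived from the shared list."""
    view = defaultdict(list)
    for link in links:
        # Emit the link once from each end, so neither host owns a copy.
        for local, remote in (("a", "b"), ("b", "a")):
            view[link[local]["host"]].append({
                "iface": link[local]["iface"],
                "peer": link[remote]["host"],
                "peer_iface": link[remote]["iface"],
                "vlan": link["vlan"],
            })
    return dict(view)


if __name__ == "__main__":
    for host, entries in sorted(links_by_host(LINKS).items()):
        print(host, entries)
```

Changing the VLAN or an interface name means editing one entry in LINKS; both ends pick up the change.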

One of the harder things we do is define haproxy configs for all our load 
balancers (somewhere between 50 and 100 separate services). This ended up 
just too complex for a j2 template, so I wrote a Python action plugin. I 
have a group called haproxies with a subgroup for each service and sub-sub 
groups for the load balancers in a given data center. All the defaults are 
defined in haproxies, and the information needed for a given service's 
config is in the haproxies_svc_xxx group_vars. Then there are the 
haproxies_svc_xxx_yyy groups, where yyy is the data center. These just 
contain pointers to the groups of production servers that this data 
center's haproxy config for the service should point to. Finally, I have 
groups for the production servers for a service and subgroups for each set 
that will be separately referenced in the haproxy config.
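The group tree above can be modeled roughly as follows. This is a minimal sketch, not the author's action plugin: the GROUPS and MEMBERS tables, the service "web", the data center "dc1" and all variable names are hypothetical. Vars are merged from the root group down, so deeper groups override their parents (as Ansible does for child groups), and the data-center group only holds pointers to production server groups.

```python
"""Hypothetical sketch of the haproxies group tree and how a load balancer's
variables resolve: parent group first, then each child group overrides it.
Group names and variables are illustrative only."""

# group -> (parent, vars); "haproxies" is the root of this tree.
GROUPS = {
    "haproxies": (None, {"haproxies_maxconn": 2000,
                         "haproxies_timeout": "30s"}),
    "haproxies_svc_web": ("haproxies", {"haproxies_frontend_port": 443}),
    "haproxies_svc_web_dc1": ("haproxies_svc_web",
                              {"haproxies_backend_groups": ["web_prod_dc1_a",
                                                            "web_prod_dc1_b"]}),
}

# Production server groups, referenced by name only from the dc group.
MEMBERS = {
    "web_prod_dc1_a": ["web01", "web02"],
    "web_prod_dc1_b": ["web03"],
}


def lineage(group):
    """Return [root, ..., group] by following parent links."""
    chain = []
    while group is not None:
        chain.append(group)
        group = GROUPS[group][0]
    return list(reversed(chain))


def resolve_vars(group):
    """Merge vars from the root down, so deeper groups win (shallow merge)."""
    merged = {}
    for name in lineage(group):
        merged.update(GROUPS[name][1])
    return merged


def backends(group):
    """Expand the dc group's pointers into concrete server lists."""
    pointers = resolve_vars(group)["haproxies_backend_groups"]
    return {g: MEMBERS[g] for g in pointers}
```

For example, `backends("haproxies_svc_web_dc1")` expands the two pointer groups into their servers, while `resolve_vars` still sees the defaults from `haproxies`.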

I have ansible.cfg set to merge dictionaries, and then have one big 
haproxy dictionary. Each variable that is defined has a default key and, 
optionally, a key for each service. The template code checks whether there 
is a service-specific key and, if not, uses the default key. This allows a 
default-plus-override model that preserves a single source of truth.
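Here is a minimal sketch of the default-plus-override lookup, assuming dictionary merging is enabled with `hash_behaviour = merge` in the `[defaults]` section of ansible.cfg. `deep_merge` is a hypothetical stand-in that approximates that recursive merge (lists are replaced, not merged), and `setting` mimics the "service key, else default" check. The keys `maxconn`, `timeout_client`, `svc_api` and `svc_web` are made up.

```python
"""Sketch of the default-plus-override lookup, assuming a merged "haproxy"
dictionary where each setting has a "default" key and optional per-service
keys. Key and service names here are hypothetical."""

import copy


def deep_merge(base, override):
    """Recursively merge dicts, roughly what hash_behaviour = merge does
    when the same dict variable is defined at several group levels."""
    result = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = deep_merge(result[key], value)
        else:
            # Non-dict values (including lists) simply replace the old value.
            result[key] = copy.deepcopy(value)
    return result


def setting(haproxy, name, service):
    """Return the service-specific value if present, else the default."""
    entry = haproxy[name]
    return entry[service] if service in entry else entry["default"]


# Defaults from the parent group, overrides from one service group.
parent = {"maxconn": {"default": 2000}, "timeout_client": {"default": "30s"}}
service_group = {"maxconn": {"svc_api": 8000}}

haproxy = deep_merge(parent, service_group)
```

With this data, `setting(haproxy, "maxconn", "svc_api")` returns the override (8000), while any other service falls back to the default (2000). Each value still lives in exactly one place.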

We also use centrally managed vars files for things that don't fit the 
hierarchy. We use BGP in a number of ways, and all our BGP definitions span 
multiple groups, so we keep a central bgp_vars file. This one is hand-edited, 
and all the other groups that need that information pull it from there. We 
also go the other way. We take all the information we learned from all the 
haproxy groups and generate a file with that information organized for 
other roles to use easily in templates. This is not a source of truth, just 
a convenient translation of other data.
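The "other direction" could look something like this minimal sketch: per-service backend data (hypothetically already resolved from the haproxy groups into HAPROXY_SERVICES) is inverted into a lookup keyed by backend server and written to a JSON file. JSON is valid YAML, so a vars-file loader such as include_vars should be able to read it. The output key `derived_haproxy_backends` and all names are assumptions, and the result is derived data, not a source of truth.

```python
"""Hypothetical sketch of the 'other direction': invert per-service haproxy
data into a lookup keyed by backend server, and write it to a vars file
that other roles can load. This output is derived, not a source of truth."""

import json
from collections import defaultdict

# Per-service, per-data-center backend lists, as resolved from the haproxy
# groups. Service, data center and host names are made up.
HAPROXY_SERVICES = {
    "svc_web": {"dc1": ["web01", "web02"], "dc2": ["web11"]},
    "svc_api": {"dc1": ["api01", "web02"]},
}


def servers_to_services(services):
    """Return {server: sorted list of {service, dc} it is load balanced in}."""
    index = defaultdict(list)
    for service, by_dc in services.items():
        for dc, servers in by_dc.items():
            for server in servers:
                index[server].append({"service": service, "dc": dc})
    return {s: sorted(v, key=lambda e: (e["service"], e["dc"]))
            for s, v in sorted(index.items())}


def write_vars_file(path, services):
    """Write the derived index under one top-level key (name is hypothetical)."""
    with open(path, "w") as fh:
        json.dump({"derived_haproxy_backends": servers_to_services(services)},
                  fh, indent=2, sort_keys=True)
```

A role template can then answer "which services is this server behind?" with a single lookup instead of walking the haproxy groups itself.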


A few tips on scaling groups. Be religious about prefixing group and role 
variables with the related entity. Never have loghost; always have 
group_loghost or role_loghost. That way it is still easy to follow but 
never clashes. This means that role and group names need to be different. 
I do this by making roles singular and groups plural (haproxy is my role 
and haproxies is my parent group).
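A convention like this is easy to check mechanically. Here is a minimal lint sketch, assuming that the prefix is the owning group or role name followed by an underscore (for example haproxies_loghost for the group and haproxy_loghost for the role). The function names and sample data are hypothetical, and loading the actual group_vars and role defaults files is left out.

```python
"""Hypothetical lint sketch for the prefixing convention: every variable in a
group's vars must start with "<group>_", and every variable in a role's
defaults must start with "<role>_". Names below are illustrative."""


def unprefixed(owner, variables):
    """Return variable names that do not start with '<owner>_'."""
    prefix = owner + "_"
    return sorted(name for name in variables if not name.startswith(prefix))


def name_clashes(roles, groups):
    """Roles and groups must have distinct names, or their prefixes collide."""
    return sorted(set(roles) & set(groups))


group_vars = {"haproxies": {"haproxies_loghost": "log1", "loghost": "log2"}}
role_defaults = {"haproxy": {"haproxy_loghost": "log1"}}

# Assumes no role/group name clash (check name_clashes first), otherwise one
# owner's entry would overwrite the other's in this combined dict.
problems = {owner: unprefixed(owner, variables)
            for owner, variables in {**group_vars, **role_defaults}.items()}
```

Here the bare `loghost` in the haproxies group is flagged, and the singular/plural naming keeps `haproxy_` and `haproxies_` from matching each other.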

My experience is that scaling is a question of how many systems times how 
many operations you do. Doing lots of operations on a small set of machines 
works. Doing a small set of operations on a large set of machines works. 
Trying to configure an entire data center from a bare install in a single 
playbook is probably a bad idea. Other than things like rolling passwords 
and keys, we have moved to a model of configuring all the instances of one 
type of machine at a time. Since everything is in roles and inventory 
definitions, it's just a matter of listing all the roles that are needed to 
set up the service.

This is just one way to attack this problem. Hope that helps.

jerry

On Wednesday, June 21, 2017 at 2:30:30 PM UTC-7, William Saxton wrote:
>
> New user here trying to figure out the best way to convert our current 
> server provisioning system to Ansible.  Our system uses approx. 5 different 
> attributes to provision each server and we have about 1,000 servers.  I'm 
> wondering whether we could get by using Ansible's built-in support for 
> "groups" and variables in "group_vars".  That would certainly be 
> the easiest way...just not sure it would scale well at all.
>
> I'm estimating about 100 different "groups" based on all combinations of 
> these attributes.  For example, assuming we have about 40 different groups 
> corresponding to playbooks (webserver, dbserver, appclient), 40 different 
> "projects" (managing root passwords and access), 8 different "locations" 
> (managing things like ntp server settings).
>
> Is anyone out there doing something like this?  My worries are:
>
> - Scalability.  Can Ansible handle this?  What about 10k servers?  The 
> inventory script will contain roughly 100 different groups, totaling about 
> 5,000 server entries (1k servers * 5 groups)
> - Maintainability.  The group_vars directory will probably contain 100+ 
> files.  The all.yaml file itself will probably be hundreds of lines long.  
> - Managing group conflict.  What happens when someone puts the 
> "ntp_server" setting, which is supposed to be in a site-specific yaml file, 
> inside one of the project-specific yaml files?  According to the 
> documentation, the last file alphabetically gets precedence.  That's really 
> not acceptable, but I don't know another way to do it.
>
> Summary: looking for people with real-world Ansible experience who may be 
> dealing with a similar setup.
>

-- 
You received this message because you are subscribed to the Google Groups 
"Ansible Project" group.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/ansible-project/9c1de698-77af-43c1-bca9-96f7ee3552e9%40googlegroups.com.