Matthew,
My team runs at least 10 distinct IDM farms (realms), and our largest has 20
IDM replicas all over the world. All are RHEL7 (and RHEL6 when it was a
thing), half use AD trusts, and the others are self-contained. After many
years and far more replica re-inits than I care to remember, we have found the
following points worth considering.
* WAN latency is your enemy! Place your replicas in well-connected
locations, and let your far-flung locations be clients.
* WAN resiliency is a must! If you have frequent network isolations, or
bouncy WAN links in certain locations, don’t put replicas there. Again, point
the clients in those sites to your core replica farm.
* Follow a 4-node “tightly coupled” model as best you can: put no fewer
than 2, and no more than 4, replicas in any one location.
* Keep each node’s replication agreements to 4 or fewer, and spread them
across different network links.
* Let SSSD caching help you on your truly remote clients. Resist the
temptation (or pressure) to put an IDM server in one remote location just to
speed up individual login performance; you’ll hate yourself months later when
you have to spend a week re-init’ing every server to get them back in sync.
* Think hard about how much your content changes, and where. For example:
* If you are registering new clients all the time in 60 different
locations, replication storms and contention can become a real problem.
* If your content is relatively static, it will tolerate less-than-ideal
connectivity better.
* If you can focus the majority of your content changes in one location,
with most other sites being “read-only”, you might tolerate WAN issues better.
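To make the “4 or fewer agreements” point concrete: on IdM domain level 1 (RHEL 7.3 and later), replication agreements are managed as topology segments, so you can see and shape exactly who replicates with whom. A sketch only, with hypothetical host names; run from any enrolled replica:

```
# Pair two replicas explicitly (one agreement each, for the "domain" suffix):
ipa topologysegment-add domain \
    --leftnode=idm1.us-east.example.com \
    --rightnode=idm1.us-west.example.com

# List all segments so you can count agreements per node:
ipa topologysegment-find domain

# Sanity-check the whole mesh (connectivity, nodes with too many agreements):
ipa topologysuffix-verify domain
```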
Without knowing your network, it is hard to say, but let’s imagine you’re in
AWS. You might put 4-node replication clusters in US-East, US-West, a couple
clusters in EU, a couple clusters in APAC, and so on. Set up your replication
agreements such that APAC can replicate from US and EU, US from APAC and EU, and
EU from APAC and US. Then point each client site to the nearest IDM replicas
(hint: DNS SRV records). See Figure 3.2 on the RH URL you shared.
https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/9/html/planning_identity_management/planning-the-replica-topology_planning-identity-management#planning-the-replica-topology-replica-topology-example-1-fin
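On the DNS SRV hint: IdM clients discover servers via SRV lookups, so you can steer each client site by what its local DNS answers. A sketch of what a site-local zone (or DNS view) might serve, with hypothetical names; lower-priority entries are tried first, so APAC clients prefer their local replicas and fall back to US-East:

```
; Served to APAC clients only (e.g. via a per-site DNS view):
_ldap._tcp.example.com.     IN SRV 0  100 389 idm1.apac.example.com.
_ldap._tcp.example.com.     IN SRV 0  100 389 idm2.apac.example.com.
_ldap._tcp.example.com.     IN SRV 50 100 389 idm1.us-east.example.com.
_kerberos._udp.example.com. IN SRV 0  100 88  idm1.apac.example.com.
_kerberos._udp.example.com. IN SRV 0  100 88  idm2.apac.example.com.
```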
What you don’t want is a user in one very remote location changing their
password, and another system registering as a new client on the other side of
the world, and the only way for every IDM server to replicate is to take 10
slow hops to get to the other. As a matter of perspective: In our 20-node
far-flung (less-than-ideal topology) IDM farm, if we mass delete host entries,
we do 100 at a time and wait 5 minutes before doing another 100 to let the
deletion wave traverse the entire farm.
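The batched-deletion approach above can be sketched as a small shell loop. This is a dry run by default (it prints the ipa commands instead of executing them); the batch size, pause, and host names are our conventions, not IdM defaults:

```shell
# Delete hosts in batches, pausing between batches so the deletion
# wave can replicate across the whole farm before the next one lands.
# DRYRUN=1 (the default here) prints commands instead of running them.
delete_in_batches() {
    n=0
    while read -r host; do
        if [ "${DRYRUN:-1}" = "1" ]; then
            echo "ipa host-del $host"
        else
            ipa host-del "$host"
        fi
        n=$((n + 1))
        # After each full batch, wait so replication can catch up.
        if [ $((n % ${BATCH:-100})) -eq 0 ] && [ "${DRYRUN:-1}" != "1" ]; then
            sleep "${PAUSE:-300}"
        fi
    done
}

# Dry run over three sample hosts:
printf 'a.example.com\nb.example.com\nc.example.com\n' | delete_in_batches
```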
Cautionary Tale for you and anybody else reading this: This “happened to a
friend” 😉 several years ago, and we learned many lessons from it… Imagine you
have an autoscaling group in AWS or whatever-cloud, and there’s an automated
process to register each client to IDM. Now, imagine something flawed in the
ASG instances causes them to terminate immediately and spin up replacements.
Demand calls for 100 new instances, and left over the weekend
the ASG spins up (and terminates) 40,000 instances across multiple regions and
zones. That means 40,000 host-adds swarming your IDM farm. It will collapse,
impacting every existing client’s ability to log in or do anything. Don’t be
“that guy”.
Sorry for the long post (and I hope it formats OK), but I truly hope it helps
you and others. The 60-replica model is probably based on a lab environment
with nearly 0 ms latency between all nodes. But again, my experience is RHEL7.
RHEL8 seems to have some extra capability, and upstream IPA even more, so YMMV.
Best Regards,
--
| Pat Larkin [email protected] | Texas USA |
| Manager | Linux Engineering | http://go/linuximo |
| +1.682.213.4281 | http://go/LinuxOps |
-----------------------------------------------------
From: Matthew Davis via FreeIPA-users <[email protected]>
Sent: Thursday, September 15, 2022 4:53 PM
To: FreeIPA users list <[email protected]>
Cc: Matthew Davis <[email protected]>
Subject: [Freeipa-users] Replication topology size limitations?
…
I have over 60 geographical locations where I was hoping to place a replica. I
will easily exceed the 60-replica limitation outlined in the documentation. Can
anyone elaborate on the 60-replica limitation? Is this a hard limit? What are
the contributing factors for the existing limitation?
Each location will have far less than 2000 clients. Are there any
considerations that could accommodate a larger number of replica servers?
Thanks
--
________________________________
Matthew Davis
1
https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/9/html/planning_identity_management/planning-the-replica-topology_planning-identity-management#determining-the-appropriate-number-of-replicas_planning-the-replica-topology