Matthew,
My team runs at least 10 distinct IDM farms (realms), and our largest has 20 
IDM replicas all over the world.  All are RHEL7 (and RHEL6 when it was a 
thing), half use AD trusts, and the others are self-contained.  After many 
years and far more replica re-inits than I care to remember, we have found 
the following points worth considering.

  *   WAN latency is your enemy!  Place your replicas in well-connected 
locations, and let your far-flung locations be clients.
  *   WAN resiliency is a must!  If you have frequent network isolations, or 
bouncy WAN links in certain locations, don’t put replicas there.  Again, point 
the clients in those sites to your core replica farm.
  *   Follow a 4-node “tightly coupled” model as best you can.  Put no fewer 
than 2, nor more than 4, replicas in one location.
  *   Keep your per-node replication agreements to 4 or fewer, and spread 
them across different links.
  *   Let SSSD caching help you on your truly remote clients.  Resist the 
temptation (or pressure) to put an IDM server in a remote location just to 
speed up individual login performance; you’ll hate yourself months later when 
you have to spend a week re-init’ing every server to get them back in sync.
  *   Think hard about how much your content changes, and where.  For example:
     *   If you are registering new clients all the time in 60 different 
locations, replication storms and contention can become a real problem.
     *   If your content is relatively static, it will tolerate 
less-than-ideal connectivity better.
     *   If you can focus the majority of your content changes in one 
location, with most other sites being “read-only”, you might tolerate WAN 
issues better.

Without knowing your network it is hard to say, but let’s imagine you’re in 
AWS.  You might put 4-node replication clusters in US-East and US-West, a 
couple of clusters in the EU, a couple of clusters in APAC, and so on.  Set 
up your replication agreements such that APAC can get changes from the US and 
EU, the US from APAC and the EU, and the EU from APAC and the US.  Then point 
each client site to the nearest IDM replicas (hint: DNS SRV records).  See 
Figure 3.2 at the RH URL you shared:
https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/9/html/planning_identity_management/planning-the-replica-topology_planning-identity-management#planning-the-replica-topology-replica-topology-example-1-fin
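To sketch what those inter-region agreements look like in practice (the 
hostnames and segment name here are hypothetical), FreeIPA 4.x manages them 
as topology segments, and clients discover the nearest servers via SRV 
lookups:

```
# Create a replication agreement (topology segment) between two regions
ipa topologysegment-add domain us-east-to-eu-west \
    --leftnode=ipa1.us-east.example.com \
    --rightnode=ipa1.eu-west.example.com

# From a client site, check which LDAP servers the SRV records advertise
dig +short -t SRV _ldap._tcp.example.com
```

With location-aware DNS (or per-site DNS views), each client site resolves 
those SRV records to its nearest cluster rather than to a server ten hops away.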

What you don’t want is a user in one very remote location changing their 
password, another system registering as a new client on the other side of the 
world, and the only path for every IDM server to replicate being a chain of 
10 slow hops.  For perspective: in our 20-node, far-flung (less-than-ideal 
topology) IDM farm, if we mass-delete host entries, we do 100 at a time and 
wait 5 minutes before doing another 100, to let the deletion wave traverse 
the entire farm.
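That batching discipline is trivial to script.  A minimal sketch (the 
hosts.txt input, batch size, and pause are just our numbers; adjust for your 
farm) that shells out to the real `ipa host-del` CLI:

```python
import subprocess
import time

def batches(items, size=100):
    """Yield successive chunks of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

def delete_hosts(hosts, batch_size=100, pause=300):
    """Delete hosts a batch at a time, sleeping between batches so each
    deletion wave can traverse the whole replica farm before the next."""
    for batch in batches(hosts, batch_size):
        for host in batch:
            # Illustrative: calls the real `ipa host-del` command
            subprocess.run(["ipa", "host-del", host], check=False)
        time.sleep(pause)
```

The sleep looks wasteful until you’ve watched an unthrottled mass delete 
knock replication over; five minutes per hundred is cheap insurance.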

Cautionary tale for you and anybody else reading this: this “happened to a 
friend” 😉 several years ago, and we learned many lessons from it…  Imagine 
you have an autoscaling group in AWS or whatever-cloud, and there’s an 
automated process to register each client with IDM.  Now imagine there is a 
flaw in the ASG instances that causes them to terminate immediately and spin 
up a new instance.  Demand calls for 100 new instances, and left over the 
weekend the ASG spins up (and terminates) 40,000 instances across multiple 
regions and zones.  That means 40,000 host-adds swarming your IDM farm.  It 
will collapse, impacting every existing client’s ability to log in or do 
anything – don’t be “that guy”.
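One cheap guard against that failure mode is a rate check in the registration 
automation itself: refuse to enroll if too many clients have registered 
recently.  A sketch (class name, limit, and window are made up for 
illustration):

```python
import time
from collections import deque

class EnrollmentGuard:
    """Refuse new client registrations once more than `limit` have happened
    in the last `window` seconds -- a crude circuit breaker against a
    runaway autoscaling group hammering the IDM farm with host-adds."""

    def __init__(self, limit=200, window=3600):
        self.limit = limit
        self.window = window
        self.events = deque()

    def allow(self, now=None):
        now = time.time() if now is None else now
        # Drop registrations that have fallen out of the window
        while self.events and now - self.events[0] > self.window:
            self.events.popleft()
        if len(self.events) >= self.limit:
            return False  # trip the breaker; page a human instead
        self.events.append(now)
        return True
```

Tripping a breaker and paging someone is a far better Monday morning than 
re-init’ing a collapsed replica farm.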

Sorry for the long post (and I hope it formats OK), but I truly hope it helps 
you and others.  The 60-replica model is probably based on a lab environment 
with nearly 0ms latency between all nodes.  But again, my experience is 
RHEL7; RHEL8 seems to have some extra capability, and upstream IPA even more, 
so YMMV.

Best Regards,
--
| Pat Larkin  [email protected]<mailto:[email protected]> | Texas 
USA   |
|  Manager | Linux Engineering |  http://go/linuximo |
|        +1.682.213.4281       |  http://go/LinuxOps |
-----------------------------------------------------

From: Matthew Davis via FreeIPA-users <[email protected]>
Sent: Thursday, September 15, 2022 4:53 PM
To: FreeIPA users list <[email protected]>
Cc: Matthew Davis <[email protected]>
Subject: [Freeipa-users] Replication topology size limitations?
…
I have over 60 geographical locations I was hoping to place a replica.  I will 
easily exceed the 60 replica limitation outlined in the documentation.  Can any 
elaborate on the 60 replica limitation?  Is this a hard limit?  What are the 
contributing factors for the existing limitation?

Each location will have far less than 2000 clients.  Are there any 
considerations that could accommodate a larger number of replica servers?

Thanks
--
________________________________
Matthew Davis

1 
https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/9/html/planning_identity_management/planning-the-replica-topology_planning-identity-management#determining-the-appropriate-number-of-replicas_planning-the-replica-topology
_______________________________________________
FreeIPA-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedorahosted.org/archives/list/[email protected]
Do not reply to spam, report it: 
https://pagure.io/fedora-infrastructure/new_issue
