Hi,
Sorry for the delay on this thread; I was unavailable for a few weeks.
Just FYI, I wanted to share some results I got a few weeks ago:
I ran some tests on the configuration and start/stop of 500 Dummy
resources, and got these timings:

1/ Configuration with successive crm commands ("crm configure primitive
..."): about 1 hour, so not usable.

2/ A single crm command ("crm configure < File") with all Dummy
primitives in File: 7 s. That's OK.

3/ Adding just one location constraint per Dummy primitive with "crm
configure < File", with all constraints in File: 27 s. Strange, but
acceptable.

4/ Starting the 500 primitives with successive crm commands ("crm
resource start ..."): 7 min 28 s. That does not seem acceptable,
especially for Dummy resources.

5/ Starting the 500 primitives with parallel (background) crm commands
("crm resource start ... &"): not possible; many of the commands exit
with errors, and it takes a long time anyway.

6/ Starting the 500 primitives in parallel by setting all target-roles
to "Started" in Pacemaker (with "crm configure edit": s/Stopped/Started
on the 500 primitives): around 6 min for all primitives to be started.
That does not seem acceptable either, especially for Dummy resources,
and it suggests a failover would take roughly 3 min if the primitives
are located half on one node and half on the other.
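For reference, the fast path in item 2/ can be sketched as follows. This is
a hypothetical illustration (the file name, resource names, and monitor
interval are made up), not the exact script I used:

```shell
# Build one file containing all 500 Dummy primitives, then feed it to a
# single crm invocation instead of running 500 separate
# "crm configure primitive ..." commands (the ~1 hour case).
for i in $(seq 1 500); do
  printf 'primitive dummy%d ocf:pacemaker:Dummy op monitor interval=30s\n' "$i"
done > dummies.crm

# Load everything in one transaction (this was the 7 s case):
#   crm configure < dummies.crm

wc -l < dummies.crm   # one line per primitive, 500 in total
```

The difference comes from committing one CIB update instead of 500
round-trips, each of which replicates and re-evaluates the configuration.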
These results are with Dummy resources; with real resources it would
presumably take much longer, not to mention the periodic monitoring of
500 primitives.
So, based on these results, I think the practical limit on the number of
resources is far below 500.
But I wanted to share these results to keep the discussion going and
perhaps gather some ideas.
Thanks
Alain
On 05/09/2013 10:58, Lars Marowsky-Bree wrote:
On 2013-09-04T08:26:14, Ulrich Windl <[email protected]> wrote:
In my experience network traffic grows roughly linearly with the size
of the CIB. At some point you probably have to change communication
parameters to keep the cluster in a happy communication state.
Yes, I wish corosync would "auto-tune" to a higher degree. Apparently
though, that's a slightly harder problem.
We welcome any feedback on required tunables. Those we ship on SLE
HA worked for us (even for rather largeish configurations), but they
may not be appropriate everywhere.
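For context, the communication tunables being discussed live in the totem
section of corosync.conf. The values below are purely illustrative, not the
SLE HA shipped defaults or a recommendation:

```
totem {
    # All values illustrative; tune for your cluster size and network.
    token: 5000                              # token loss timeout, in ms
    token_retransmits_before_loss_const: 10  # retransmits before declaring loss
    consensus: 6000                          # must exceed token; default is 1.2 * token
    join: 60                                 # membership join message timeout, in ms
}
```

Larger CIBs and more nodes generally push these timeouts upward, which is
the kind of "auto-tuning" the thread wishes corosync would do itself.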
Apart from the cluster internals, there may be problems if a node comes
online and hundreds of resources are started in parallel, specifically
if those resources weren't designed for it. I suspect IP addresses,
MD-RAIDs, LVM stuff, DRBD, filesystems, exportfs, etc.
No, most of these resource scripts *are* supposed to be
concurrency-safe. If you find something that breaks, please share the
feedback.
It's true that the way concurrent load limitation is implemented in
Pacemaker/LRM isn't perfect yet. batch-limit is rather coarse. The
per-node LRM child limit is probably the best bet right now, but it
doesn't differentiate between starting many lightweight resources in
parallel (such as IPaddr) versus heavyweights (VMs running Oracle
databases).
(migration-threshold goes in the same direction.)
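The two knobs mentioned above can be sketched as follows. The values are
illustrative only, and the per-node child limit shown is the mechanism as I
understand it for this era of Pacemaker; check your distribution's
documentation for the exact setting:

```shell
# Cluster-wide cap on how many actions the transition engine runs in
# parallel (coarse: counts a Dummy start the same as a VM start).
# The value 30 is an example, not a recommendation.
crm configure property batch-limit=30

# Per-node limit on concurrent resource operations: historically the
# lrmd max-children setting, e.g. via the lrmd environment
# (illustrative; location and name vary by distribution/version):
#   LRMD_MAX_CHILDREN=4
```

Neither knob weighs operations by cost, which is exactly the limitation
described above.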
Historical context matters. Pacemaker comes from the HA world; we still
believe 3-7 node clusters are the largest anyone ought reasonably to
build, considering the failure/admin/security-domain issues with single
points of failure and the increasing likelihood of double failures, etc.
But there are several trends:

- Even those 3-7 nodes become increasingly powerful multi-core kick-ass
  boxes. 7 nodes might well host hundreds of resources nowadays (say,
  above 70 VMs with all their supporting resources).

- People build much larger clusters because there's no good way to
  "divide and conquer" yet - e.g., if you build several 3 or 5 node
  clusters, there's no support for managing those clusters-of-clusters.

- And people use Pacemaker for HPC-style deployments (e.g., private
  clouds with tons of VMs), because while our HPC support is suboptimal,
  it is better than the HA support in most of the Cloud offerings.
As a note: Just recently we had a failure in MD-RAID activation with no real
reason to be found in syslog, and the cluster got quite confused.
(I had reported this to my favourite supporter (SR 10851868591), but haven't
heard anything since then...)
I'll try to dig that out of the support system and give it a look.
Regards,
Lars
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems