On Tue, Nov 22, 2011 at 04:44:50PM +0100, [email protected] wrote:
> Hi again,
>
> That's strange, because I ran tests around this parameter,
> LRMD_MAX_CHILDREN, with 24 Dummy resources. These resources do almost
> nothing, so Pacemaker should start them all at nearly the same time, one
> after the other, and the monitor ops should likewise run at nearly the
> same time, one after the other.
>
> First I tested with no LRMD_MAX_CHILDREN in /etc/sysconfig/pacemaker, so
> with the default value, which is probably 4 as you told me. Then I set it
> to 2, restarted Pacemaker and ran the same test, and finally I set it to
> 24 (just as a textbook case) and ran the same test.
>
> The result is the same for all three tests: once all 24 Dummy resources
> are started, as you can see below, the monitor ops seem to be grouped by
> 4, whatever the LRMD_MAX_CHILDREN value. My understanding was that the
> monitor operations should have been parallelized across almost all 24
> resources, as a monitor takes very little time to complete ...
> Where am I wrong?

It could be that the init script on your platform doesn't support this
parameter. You should talk to your vendor.

Thanks,

Dejan

> [root@cuzco4 tmp]# grep monitor /var/log/syslog | grep resname | grep ok
> 1321975309 2011 Nov 22 16:21:49 cuzco4 daemon info crmd [24774]: info: process_lrm_event: LRM operation resname1_monitor_20000 (call=236, rc=0, cib-update=436, confirmed=false) ok
> 1321975309 2011 Nov 22 16:21:49 cuzco4 daemon info crmd [24774]: info: process_lrm_event: LRM operation resname3_monitor_20000 (call=237, rc=0, cib-update=437, confirmed=false) ok
> 1321975309 2011 Nov 22 16:21:49 cuzco4 daemon info crmd [24774]: info: process_lrm_event: LRM operation resname5_monitor_20000 (call=238, rc=0, cib-update=438, confirmed=false) ok
> 1321975309 2011 Nov 22 16:21:49 cuzco4 daemon info crmd [24774]: info: process_lrm_event: LRM operation resname7_monitor_20000 (call=239, rc=0, cib-update=439, confirmed=false) ok
> 1321975310 2011 Nov 22 16:21:50 cuzco4 daemon info crmd [24774]: info: process_lrm_event: LRM operation resname15_monitor_20000 (call=240, rc=0, cib-update=440, confirmed=false) ok
> 1321975310 2011 Nov 22 16:21:50 cuzco4 daemon info crmd [24774]: info: process_lrm_event: LRM operation resname9_monitor_20000 (call=241, rc=0, cib-update=441, confirmed=false) ok
> 1321975310 2011 Nov 22 16:21:50 cuzco4 daemon info crmd [24774]: info: process_lrm_event: LRM operation resname11_monitor_20000 (call=242, rc=0, cib-update=442, confirmed=false) ok
> 1321975310 2011 Nov 22 16:21:50 cuzco4 daemon info crmd [24774]: info: process_lrm_event: LRM operation resname13_monitor_20000 (call=243, rc=0, cib-update=443, confirmed=false) ok
> 1321975311 2011 Nov 22 16:21:51 cuzco4 daemon info crmd [24774]: info: process_lrm_event: LRM operation resname17_monitor_20000 (call=244, rc=0, cib-update=444, confirmed=false) ok
> 1321975311 2011 Nov 22 16:21:51 cuzco4 daemon info crmd [24774]: info: process_lrm_event: LRM operation resname23_monitor_20000 (call=245, rc=0, cib-update=445, confirmed=false) ok
> 1321975311 2011 Nov 22 16:21:51 cuzco4 daemon info crmd [24774]: info: process_lrm_event: LRM operation resname21_monitor_20000 (call=246, rc=0, cib-update=446, confirmed=false) ok
> 1321975311 2011 Nov 22 16:21:51 cuzco4 daemon info crmd [24774]: info: process_lrm_event: LRM operation resname19_monitor_20000 (call=247, rc=0, cib-update=447, confirmed=false) ok
> [root@cuzco6 tmp]# grep monitor /var/log/syslog | grep resname | grep ok
> 1321975347 2011 Nov 22 16:22:27 cuzco6 daemon info crmd [17240]: info: process_lrm_event: LRM operation resname2_monitor_20000 (call=236, rc=0, cib-update=245, confirmed=false) ok
> 1321975347 2011 Nov 22 16:22:27 cuzco6 daemon info crmd [17240]: info: process_lrm_event: LRM operation resname4_monitor_20000 (call=237, rc=0, cib-update=246, confirmed=false) ok
> 1321975347 2011 Nov 22 16:22:27 cuzco6 daemon info crmd [17240]: info: process_lrm_event: LRM operation resname8_monitor_20000 (call=238, rc=0, cib-update=247, confirmed=false) ok
> 1321975347 2011 Nov 22 16:22:27 cuzco6 daemon info crmd [17240]: info: process_lrm_event: LRM operation resname6_monitor_20000 (call=239, rc=0, cib-update=248, confirmed=false) ok
> 1321975348 2011 Nov 22 16:22:28 cuzco6 daemon info crmd [17240]: info: process_lrm_event: LRM operation resname10_monitor_20000 (call=240, rc=0, cib-update=249, confirmed=false) ok
> 1321975348 2011 Nov 22 16:22:28 cuzco6 daemon info crmd [17240]: info: process_lrm_event: LRM operation resname16_monitor_20000 (call=241, rc=0, cib-update=250, confirmed=false) ok
> 1321975348 2011 Nov 22 16:22:28 cuzco6 daemon info crmd [17240]: info: process_lrm_event: LRM operation resname14_monitor_20000 (call=242, rc=0, cib-update=251, confirmed=false) ok
> 1321975348 2011 Nov 22 16:22:28 cuzco6 daemon info crmd [17240]: info: process_lrm_event: LRM operation resname12_monitor_20000 (call=243, rc=0, cib-update=252, confirmed=false) ok
> 1321975349 2011 Nov 22 16:22:29 cuzco6 daemon info crmd [17240]: info: process_lrm_event: LRM operation resname24_monitor_20000 (call=244, rc=0, cib-update=253, confirmed=false) ok
> 1321975349 2011 Nov 22 16:22:29 cuzco6 daemon info crmd [17240]: info: process_lrm_event: LRM operation resname22_monitor_20000 (call=245, rc=0, cib-update=254, confirmed=false) ok
> 1321975349 2011 Nov 22 16:22:29 cuzco6 daemon info crmd [17240]: info: process_lrm_event: LRM operation resname20_monitor_20000 (call=246, rc=0, cib-update=255, confirmed=false) ok
> 1321975349 2011 Nov 22 16:22:29 cuzco6 daemon info crmd [17240]: info: process_lrm_event: LRM operation resname18_monitor_20000 (call=247, rc=0, cib-update=256, confirmed=false) ok
>
> Alain
>
> From: Dejan Muhamedagic <[email protected]>
> To: General Linux-HA mailing list <[email protected]>
> Date: 22/11/2011 13:18
> Subject: Re: [Linux-HA] Antw: What about "start-delay" attribute status ?
> Sent by: [email protected]
>
> Hi,
>
> On Tue, Nov 22, 2011 at 08:17:28AM +0100, [email protected] wrote:
> > Hi
> >
> > By the way, is there a description somewhere of the parameters in
> > /etc/sysconfig/pacemaker?
>
> To the best of my knowledge, there is only LRMD_MAX_CHILDREN.
>
> Thanks,
>
> Dejan
>
> > Thanks
> > Alain
> >
> > From: Dejan Muhamedagic <[email protected]>
> > To: General Linux-HA mailing list <[email protected]>
> > Date: 21/11/2011 15:48
> > Subject: Re: [Linux-HA] Antw: What about "start-delay" attribute status ?
> > Sent by: [email protected]
> >
> > On Mon, Nov 21, 2011 at 03:07:43PM +0100, [email protected] wrote:
> > > Thanks Dejan,
> > > OK, I understand, so we have to choose between a small value of
> > > LRMD_MAX_CHILDREN, where a start, stop, or status of 64 resources
> > > will take a while ...
> > > and a big value of LRMD_MAX_CHILDREN, where start, stop, and status
> > > will at best complete very quickly because they are parallelized, or
> > > at worst bring the system "to its knees" ...
> > > We'll give it a try ... as I have big computers ;-)
> >
> > Just note that you should try to think of every possible
> > combination of resource operations. For instance, imagine 64 Xen
> > VMs trying to start in parallel. Better to be conservative than
> > to push your nodes to their limit.
> >
> > > But my question now is about what you wrote:
> > > "Let me just add that operations which were supposed to
> > > start at the same time get spaced out."
> > > So if LRMD_MAX_CHILDREN=4, does that mean that if I ask for a start
> > > of 32 resources at the same time, Pacemaker will manage 4, delay the
> > > remaining 28, manage 4 again, etc., so that it will be completed in
> > > 8 shots, right?
> >
> > No.
> >
> > > But what is the delay between each shot?
> >
> > There is none. As soon as one operation finishes, another one
> > gets started. Now, if you have, say, four big RDBMS instances
> > starting and each of them takes five minutes or so, the other
> > resources will obviously stay in the queue for five minutes.
> >
> > Anyway, you can see for yourself on cluster start: just grep
> > your logs for lrmd:.*rsc:, which should show you the timestamps
> > at which each operation was started (apart from recurring
> > monitors).
> >
> > Thanks,
> >
> > Dejan
> >
> > > Thanks
> > > Alain
> > >
> > > From: Dejan Muhamedagic <[email protected]>
> > > To: General Linux-HA mailing list <[email protected]>
> > > Date: 21/11/2011 13:45
> > > Subject: Re: [Linux-HA] Antw: What about "start-delay" attribute status ?
> > > Sent by: [email protected]
> > >
> > > Hi,
> > >
> > > On Mon, Nov 21, 2011 at 01:42:15PM +0100, [email protected] wrote:
> > > > Hi Florian,
> > > > OK, I've checked the thread. So that means that on RHEL6, if I
> > > > have, let's say, 32 resource groups of 2 primitives on each node,
> > > > I can set the LRMD_MAX_CHILDREN environment variable in
> > > > /etc/sysconfig/pacemaker to 64?
> > >
> > > The number of resources shouldn't be the main criterion for
> > > setting this parameter, but rather what your nodes can handle
> > > without being overloaded. So, 64 sounds like you have some really
> > > big computers :) It also depends on the nature of the cluster
> > > resources. The default of 4 is rather conservative; perhaps
> > > nowadays 8 would be better.
> > >
> > > > Is it acceptable for lrmd and Pacemaker? Or will we face any
> > > > side effects?
> > >
> > > LRMD_MAX_CHILDREN is the maximum number of resource operations
> > > allowed to run in parallel. Hope that answers your question.
> > >
> > > Thanks,
> > >
> > > Dejan
> > >
> > > > Thanks
> > > > Alain
> > > >
> > > > From: Florian Haas <[email protected]>
> > > > To: General Linux-HA mailing list <[email protected]>
> > > > Date: 21/11/2011 12:58
> > > > Subject: Re: [Linux-HA] Antw: What about "start-delay" attribute status ?
> > > > Sent by: [email protected]
> > > >
> > > > On 11/21/11 13:03, [email protected] wrote:
> > > > > Hi,
> > > > > yes, that's exactly the purpose of my question (and exactly the
> > > > > same problem of "big-monitoring-trains"):
> > > > > we could always use start-delay to randomize the first monitor
> > > > > operation time on all the resources on a server, but if it is
> > > > > really deprecated, that means that in the future this option
> > > > > will no longer be managed by Pacemaker (perhaps that is already
> > > > > the case ...?), so in this case we must not use this option.
> > > > >
> > > > > Could someone give us a clear status on this "start-delay" option?
> > > >
> > > > If your RA needs it, then the RA is most likely broken. :)
> > > >
> > > > For monitor operations allegedly piling up, please consider this:
> > > > http://www.gossamer-threads.com/lists/linuxha/pacemaker/76152#76152
> > > >
> > > > Hope this helps.
> > > > Cheers,
> > > > Florian
> > > >
> > > > --
> > > > Need help with High Availability?
> > > > http://www.hastexo.com/now
> > > > _______________________________________________
> > > > Linux-HA mailing list
> > > > [email protected]
> > > > http://lists.linux-ha.org/mailman/listinfo/linux-ha
> > > > See also: http://linux-ha.org/ReportingProblems
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
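
Dejan's point in the thread, that there is no delay between "shots" and that a queued operation starts as soon as a running one finishes, can be illustrated with a small toy model. This is only a sketch under stated assumptions, not lrmd's actual code: the function name `schedule`, its parameters, and the assumption that operations are dequeued strictly in submission order are all hypothetical.

```python
import heapq


def schedule(durations, max_children):
    """Toy model (hypothetical, not lrmd code): return (start, end) times
    for operations submitted in order, with at most max_children running
    at once and no artificial delay between them."""
    running = []  # min-heap holding the end times of in-flight operations
    times = []
    now = 0.0
    for duration in durations:
        if len(running) >= max_children:
            # All slots are busy: wait only until the earliest
            # in-flight operation finishes, then reuse its slot.
            now = max(now, heapq.heappop(running))
        start, end = now, now + duration
        heapq.heappush(running, end)
        times.append((start, end))
    return times


if __name__ == "__main__":
    # One slow operation and five quick ones, two slots.
    for i, (start, end) in enumerate(schedule([5, 1, 1, 1, 1, 1], 2)):
        print(f"op{i}: start={start} end={end}")
```

With one slow operation and two slots, the quick operations start one after another as the second slot frees up, rather than in fixed batches. With four slow operations and four slots, everything else waits until the first slow one ends, which is the RDBMS case described in the thread.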

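
The syslog excerpts quoted in the thread can be tallied per second to check how many monitor results arrive together. The following is a minimal sketch that assumes the exact line layout shown in those excerpts (epoch seconds first, one entry per line); the helper name `monitors_per_second` is hypothetical. Keep in mind that these are crmd `process_lrm_event` entries, so, as far as I can tell, they reflect when a result was logged, at one-second resolution, and not when lrmd started the operation.

```python
import re
from collections import Counter

# Matches the layout shown in the excerpts: epoch seconds at the start of
# the line, then "LRM operation <resource>_monitor_<interval> ... ok".
LINE_RE = re.compile(r"^(\d+)\s.*LRM operation (\S+)_monitor_\d+ .*\bok$")


def monitors_per_second(lines):
    """Count successful monitor results per epoch second."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            counts[int(match.group(1))] += 1
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    # Three entries copied from the cuzco4 excerpt, joined onto one line each.
    sample = [
        "1321975309 2011 Nov 22 16:21:49 cuzco4 daemon info crmd [24774]: info: "
        "process_lrm_event: LRM operation resname1_monitor_20000 (call=236, rc=0, "
        "cib-update=436, confirmed=false) ok",
        "1321975309 2011 Nov 22 16:21:49 cuzco4 daemon info crmd [24774]: info: "
        "process_lrm_event: LRM operation resname3_monitor_20000 (call=237, rc=0, "
        "cib-update=437, confirmed=false) ok",
        "1321975310 2011 Nov 22 16:21:50 cuzco4 daemon info crmd [24774]: info: "
        "process_lrm_event: LRM operation resname15_monitor_20000 (call=240, rc=0, "
        "cib-update=440, confirmed=false) ok",
    ]
    print(monitors_per_second(sample))  # {1321975309: 2, 1321975310: 1}
```

To see when operations were actually started, Dejan's suggestion of grepping for lrmd:.*rsc: is the better source, although he notes that it does not cover recurring monitors.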