Re: [Linux-HA] Antw: What about "start-delay" attribute status ?

Dejan Muhamedagic Tue, 22 Nov 2011 09:05:55 -0800

On Tue, Nov 22, 2011 at 04:44:50PM +0100, [email protected] wrote:
> Hi again,
> 
> That's strange, because I ran tests with the LRMD_MAX_CHILDREN parameter
> using 24 Dummy resources, i.e. resources that do almost nothing, so
> Pacemaker should start all of them at nearly the same time, one after
> the other. The monitor operations should then also run at nearly the
> same time, one after the other.
> First, I tested with no LRMD_MAX_CHILDREN in /etc/sysconfig/pacemaker,
> so with the default value, which is probably 4 as you told me. Then I
> set it to 2, restarted Pacemaker and ran the same test, and finally set
> it to 24 (just as a textbook case) and ran the same test again.
> The result is the same for all three tests: once all 24 Dummy resources
> are started, as you can see below, the monitor operations seem to be
> grouped by 4, whatever the LRMD_MAX_CHILDREN value, whereas my
> understanding was that the monitor operations should have been
> parallelized across almost all 24 resources, since a monitor takes a
> very short time to complete...

> Where am I wrong?

It could be that the init script on your platform doesn't
support this parameter. You should talk to your vendor.

Thanks,

Dejan

> [root@cuzco4 tmp]# grep monitor /var/log/syslog | grep resname | grep ok
> 1321975309 2011 Nov 22 16:21:49 cuzco4 daemon info crmd [24774]: info: process_lrm_event: LRM operation resname1_monitor_20000 (call=236, rc=0, cib-update=436, confirmed=false) ok
> 1321975309 2011 Nov 22 16:21:49 cuzco4 daemon info crmd [24774]: info: process_lrm_event: LRM operation resname3_monitor_20000 (call=237, rc=0, cib-update=437, confirmed=false) ok
> 1321975309 2011 Nov 22 16:21:49 cuzco4 daemon info crmd [24774]: info: process_lrm_event: LRM operation resname5_monitor_20000 (call=238, rc=0, cib-update=438, confirmed=false) ok
> 1321975309 2011 Nov 22 16:21:49 cuzco4 daemon info crmd [24774]: info: process_lrm_event: LRM operation resname7_monitor_20000 (call=239, rc=0, cib-update=439, confirmed=false) ok
> 1321975310 2011 Nov 22 16:21:50 cuzco4 daemon info crmd [24774]: info: process_lrm_event: LRM operation resname15_monitor_20000 (call=240, rc=0, cib-update=440, confirmed=false) ok
> 1321975310 2011 Nov 22 16:21:50 cuzco4 daemon info crmd [24774]: info: process_lrm_event: LRM operation resname9_monitor_20000 (call=241, rc=0, cib-update=441, confirmed=false) ok
> 1321975310 2011 Nov 22 16:21:50 cuzco4 daemon info crmd [24774]: info: process_lrm_event: LRM operation resname11_monitor_20000 (call=242, rc=0, cib-update=442, confirmed=false) ok
> 1321975310 2011 Nov 22 16:21:50 cuzco4 daemon info crmd [24774]: info: process_lrm_event: LRM operation resname13_monitor_20000 (call=243, rc=0, cib-update=443, confirmed=false) ok
> 1321975311 2011 Nov 22 16:21:51 cuzco4 daemon info crmd [24774]: info: process_lrm_event: LRM operation resname17_monitor_20000 (call=244, rc=0, cib-update=444, confirmed=false) ok
> 1321975311 2011 Nov 22 16:21:51 cuzco4 daemon info crmd [24774]: info: process_lrm_event: LRM operation resname23_monitor_20000 (call=245, rc=0, cib-update=445, confirmed=false) ok
> 1321975311 2011 Nov 22 16:21:51 cuzco4 daemon info crmd [24774]: info: process_lrm_event: LRM operation resname21_monitor_20000 (call=246, rc=0, cib-update=446, confirmed=false) ok
> 1321975311 2011 Nov 22 16:21:51 cuzco4 daemon info crmd [24774]: info: process_lrm_event: LRM operation resname19_monitor_20000 (call=247, rc=0, cib-update=447, confirmed=false) ok
> [root@cuzco6 tmp]# grep monitor /var/log/syslog | grep resname | grep ok
> 1321975347 2011 Nov 22 16:22:27 cuzco6 daemon info crmd [17240]: info: process_lrm_event: LRM operation resname2_monitor_20000 (call=236, rc=0, cib-update=245, confirmed=false) ok
> 1321975347 2011 Nov 22 16:22:27 cuzco6 daemon info crmd [17240]: info: process_lrm_event: LRM operation resname4_monitor_20000 (call=237, rc=0, cib-update=246, confirmed=false) ok
> 1321975347 2011 Nov 22 16:22:27 cuzco6 daemon info crmd [17240]: info: process_lrm_event: LRM operation resname8_monitor_20000 (call=238, rc=0, cib-update=247, confirmed=false) ok
> 1321975347 2011 Nov 22 16:22:27 cuzco6 daemon info crmd [17240]: info: process_lrm_event: LRM operation resname6_monitor_20000 (call=239, rc=0, cib-update=248, confirmed=false) ok
> 1321975348 2011 Nov 22 16:22:28 cuzco6 daemon info crmd [17240]: info: process_lrm_event: LRM operation resname10_monitor_20000 (call=240, rc=0, cib-update=249, confirmed=false) ok
> 1321975348 2011 Nov 22 16:22:28 cuzco6 daemon info crmd [17240]: info: process_lrm_event: LRM operation resname16_monitor_20000 (call=241, rc=0, cib-update=250, confirmed=false) ok
> 1321975348 2011 Nov 22 16:22:28 cuzco6 daemon info crmd [17240]: info: process_lrm_event: LRM operation resname14_monitor_20000 (call=242, rc=0, cib-update=251, confirmed=false) ok
> 1321975348 2011 Nov 22 16:22:28 cuzco6 daemon info crmd [17240]: info: process_lrm_event: LRM operation resname12_monitor_20000 (call=243, rc=0, cib-update=252, confirmed=false) ok
> 1321975349 2011 Nov 22 16:22:29 cuzco6 daemon info crmd [17240]: info: process_lrm_event: LRM operation resname24_monitor_20000 (call=244, rc=0, cib-update=253, confirmed=false) ok
> 1321975349 2011 Nov 22 16:22:29 cuzco6 daemon info crmd [17240]: info: process_lrm_event: LRM operation resname22_monitor_20000 (call=245, rc=0, cib-update=254, confirmed=false) ok
> 1321975349 2011 Nov 22 16:22:29 cuzco6 daemon info crmd [17240]: info: process_lrm_event: LRM operation resname20_monitor_20000 (call=246, rc=0, cib-update=255, confirmed=false) ok
> 1321975349 2011 Nov 22 16:22:29 cuzco6 daemon info crmd [17240]: info: process_lrm_event: LRM operation resname18_monitor_20000 (call=247, rc=0, cib-update=256, confirmed=false) ok
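To make the pattern in the cuzco4 output easier to see, here is a small Python sketch that counts successful monitor completions per second from lines in this syslog format (the function name `completions_per_second` is hypothetical, and the regex assumes the leading epoch-seconds field shown above). On the twelve cuzco4 lines it finds four completions in each of three consecutive seconds. That is consistent with monitors running four at a time, i.e. with lrmd still using its default limit, but with one-second timestamp resolution it is a hint, not proof.

```python
import re
from collections import Counter

# The twelve cuzco4 lines quoted above, each joined back onto one line.
CUZCO4_LOG = """\
1321975309 2011 Nov 22 16:21:49 cuzco4 daemon info crmd [24774]: info: process_lrm_event: LRM operation resname1_monitor_20000 (call=236, rc=0, cib-update=436, confirmed=false) ok
1321975309 2011 Nov 22 16:21:49 cuzco4 daemon info crmd [24774]: info: process_lrm_event: LRM operation resname3_monitor_20000 (call=237, rc=0, cib-update=437, confirmed=false) ok
1321975309 2011 Nov 22 16:21:49 cuzco4 daemon info crmd [24774]: info: process_lrm_event: LRM operation resname5_monitor_20000 (call=238, rc=0, cib-update=438, confirmed=false) ok
1321975309 2011 Nov 22 16:21:49 cuzco4 daemon info crmd [24774]: info: process_lrm_event: LRM operation resname7_monitor_20000 (call=239, rc=0, cib-update=439, confirmed=false) ok
1321975310 2011 Nov 22 16:21:50 cuzco4 daemon info crmd [24774]: info: process_lrm_event: LRM operation resname15_monitor_20000 (call=240, rc=0, cib-update=440, confirmed=false) ok
1321975310 2011 Nov 22 16:21:50 cuzco4 daemon info crmd [24774]: info: process_lrm_event: LRM operation resname9_monitor_20000 (call=241, rc=0, cib-update=441, confirmed=false) ok
1321975310 2011 Nov 22 16:21:50 cuzco4 daemon info crmd [24774]: info: process_lrm_event: LRM operation resname11_monitor_20000 (call=242, rc=0, cib-update=442, confirmed=false) ok
1321975310 2011 Nov 22 16:21:50 cuzco4 daemon info crmd [24774]: info: process_lrm_event: LRM operation resname13_monitor_20000 (call=243, rc=0, cib-update=443, confirmed=false) ok
1321975311 2011 Nov 22 16:21:51 cuzco4 daemon info crmd [24774]: info: process_lrm_event: LRM operation resname17_monitor_20000 (call=244, rc=0, cib-update=444, confirmed=false) ok
1321975311 2011 Nov 22 16:21:51 cuzco4 daemon info crmd [24774]: info: process_lrm_event: LRM operation resname23_monitor_20000 (call=245, rc=0, cib-update=445, confirmed=false) ok
1321975311 2011 Nov 22 16:21:51 cuzco4 daemon info crmd [24774]: info: process_lrm_event: LRM operation resname21_monitor_20000 (call=246, rc=0, cib-update=446, confirmed=false) ok
1321975311 2011 Nov 22 16:21:51 cuzco4 daemon info crmd [24774]: info: process_lrm_event: LRM operation resname19_monitor_20000 (call=247, rc=0, cib-update=447, confirmed=false) ok
"""

# Leading epoch seconds, then a monitor operation that completed with "ok".
MONITOR_OK = re.compile(r"^(\d+) .*LRM operation \S+_monitor_\d+ \(.*\) ok$")


def completions_per_second(lines):
    """Count successful monitor completions per epoch second."""
    counts = Counter()
    for line in lines:
        match = MONITOR_OK.match(line.strip())
        if match:
            counts[int(match.group(1))] += 1
    return counts


if __name__ == "__main__":
    for second, n in sorted(completions_per_second(CUZCO4_LOG.splitlines()).items()):
        print(second, n)
    # 1321975309 4
    # 1321975310 4
    # 1321975311 4
```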
> 
> Alain
> 
> 
> 
> 
> From:  Dejan Muhamedagic <[email protected]>
> To:    General Linux-HA mailing list <[email protected]>
> Date:  22/11/2011 13:18
> Subject: Re: [Linux-HA] Antw:  What about "start-delay" attribute status ?
> Sent by: [email protected]
> 
> 
> 
> Hi,
> 
> On Tue, Nov 22, 2011 at 08:17:28AM +0100, [email protected] wrote:
> > Hi
> > 
> > By the way, is there a description somewhere of the parameters in
> > /etc/sysconfig/pacemaker?
> 
> To the best of my knowledge, there is only LRMD_MAX_CHILDREN.
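Since /etc/sysconfig/pacemaker is a shell fragment of KEY=VALUE assignments, a quick way to check what LRMD_MAX_CHILDREN is set to is a small parser like the Python sketch below. The function names are hypothetical, it only handles plain assignments (no variable expansion), and the fallback of 4 is the lrmd default mentioned in this thread. Note that it only tells you what the file says; whether your init script actually passes the value on to lrmd is a separate question.

```python
import shlex

DEFAULT_MAX_CHILDREN = 4  # lrmd default as stated in this thread


def read_sysconfig(text):
    """Parse plain KEY=VALUE lines of a sysconfig-style shell fragment.

    Comments and blank lines are skipped, an optional "export " prefix is
    dropped, and quotes are removed. Shell expansions are NOT evaluated.
    """
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        key, _, value = line.partition("=")
        parts = shlex.split(value, comments=True)  # strips quotes and trailing comments
        values[key.strip()] = parts[0] if parts else ""
    return values


def max_children(text):
    """Return LRMD_MAX_CHILDREN from the file text, or the default if unset."""
    raw = read_sysconfig(text).get("LRMD_MAX_CHILDREN")
    if not raw:
        return DEFAULT_MAX_CHILDREN
    n = int(raw)
    if n < 1:
        raise ValueError(f"LRMD_MAX_CHILDREN must be positive, got {n}")
    return n


if __name__ == "__main__":
    try:
        with open("/etc/sysconfig/pacemaker") as f:
            print("LRMD_MAX_CHILDREN =", max_children(f.read()))
    except OSError:
        print("cannot read /etc/sysconfig/pacemaker; default would be", DEFAULT_MAX_CHILDREN)
```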
> 
> Thanks,
> 
> Dejan
> 
> > Thanks
> > Alain
> > 
> > 
> > 
> > From:  Dejan Muhamedagic <[email protected]>
> > To:    General Linux-HA mailing list <[email protected]>
> > Date:  21/11/2011 15:48
> > Subject: Re: [Linux-HA] Antw:  What about "start-delay" attribute status ?
> > Sent by: [email protected]
> > 
> > 
> > 
> > On Mon, Nov 21, 2011 at 03:07:43PM +0100, [email protected] wrote:
> > > Thanks Dejan,
> > > OK, I understand. So we have to choose between a small value of
> > > LRMD_MAX_CHILDREN, where start, stop, or status on 64 resources will
> > > take a while, and a large value of LRMD_MAX_CHILDREN, where start,
> > > stop, and status will at best complete very quickly because they run
> > > in parallel, or at worst bring the system to its knees...
> > > We'll give it a try, as I have big computers ;-)
> > 
> > Just note that you should try to think of every possible
> > combination of resource operations. For instance, imagine 64 Xen
> > VMs trying to start in parallel. It is better to be conservative
> > than to push your nodes to their limits.
> > 
> > > But my question now concerns what you wrote:
> > > "Let me just add that operations which were supposed to
> > > start at the same time get spaced out."
> > > So if LRMD_MAX_CHILDREN=4 and I ask for a start on 32 resources at the
> > > same time, does Pacemaker handle 4, delay the remaining 28, handle 4
> > > again, etc., so that everything completes in 8 batches?
> > 
> > No.
> > 
> > > But then what is the delay between batches?
> > 
> > There is none. As soon as one operation finishes, another one
> > gets started. Now, if you have say four big RDBMS instances
> > starting and each of them takes five minutes or so, the other
> > resources will obviously stay in the queue for five minutes.
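The behaviour described above (no fixed delay, and a queued operation starts as soon as any running one finishes) is that of a pool of LRMD_MAX_CHILDREN slots rather than of fixed batches. The Python sketch below models that under simplifying assumptions: `start_times` is a hypothetical helper, and it ignores operation ordering, priorities and anything else lrmd or the CRM may do. It computes when each queued operation would start.

```python
import heapq


def start_times(durations, max_children):
    """Start time of each queued operation, in queue order.

    Models max_children slots: an operation starts as soon as any slot
    frees up, instead of waiting for a whole batch to finish.
    """
    if max_children < 1:
        raise ValueError("max_children must be at least 1")
    # Min-heap of the times at which each slot becomes free.
    # A list of equal values is already a valid heap.
    free_at = [0.0] * max_children
    starts = []
    for duration in durations:
        t = heapq.heappop(free_at)            # earliest free slot
        starts.append(t)
        heapq.heappush(free_at, t + duration)  # slot busy until t + duration
    return starts


if __name__ == "__main__":
    # One slow start (think of a big database) among short operations:
    print(start_times([300, 1, 1, 1, 1, 1, 1, 1], 4))
    # -> [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 2.0]
    # Four slow starts occupy every slot, so the rest wait for them:
    print(start_times([300, 300, 300, 300, 1, 1, 1, 1], 4))
    # -> [0.0, 0.0, 0.0, 0.0, 300.0, 300.0, 300.0, 300.0]
```

In the first case the short operations keep cycling through the three free slots while the slow one runs, which a fixed-batch model would not allow. The second case is the "four big RDBMS instances" situation: everything else stays queued until one of them finishes.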
> > 
> > Anyway, you can see for yourself on cluster start: just grep
> > your logs for lrmd:.*rsc:, which should show you the timestamps
> > at which each operation was started (apart from recurring
> > monitors).
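If you want to go one step beyond the grep, the Python sketch below selects the same lines (pattern `lrmd:.*rsc:`) and, assuming each line starts with an epoch-seconds field as in the syslog format shown in this thread, computes the gaps between consecutive operation starts. The function names are hypothetical, the default path /var/log/syslog is taken from the test in this thread, and lines without a numeric first field are simply skipped from the gap calculation.

```python
import re
import sys

# Same pattern as the grep above: "lrmd:" followed later by "rsc:".
LRMD_RSC = re.compile(r"lrmd:.*rsc:")


def lrmd_operation_lines(lines):
    """Yield the lines that `grep 'lrmd:.*rsc:'` would print."""
    for line in lines:
        if LRMD_RSC.search(line):
            yield line.rstrip("\n")


def epoch_gaps(lines):
    """Seconds between consecutive matching lines.

    ASSUMPTION: the first field of each line is an epoch timestamp.
    Matching lines whose first field is not numeric are ignored.
    """
    stamps = []
    for line in lrmd_operation_lines(lines):
        first = line.split(maxsplit=1)[0]
        if first.isdigit():
            stamps.append(int(first))
    return [later - earlier for earlier, later in zip(stamps, stamps[1:])]


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "/var/log/syslog"
    try:
        with open(path, errors="replace") as f:
            lines = list(f)
    except OSError as exc:
        print(f"cannot read {path}: {exc}", file=sys.stderr)
        lines = []
    for line in lrmd_operation_lines(lines):
        print(line)
    print("gaps (s):", epoch_gaps(lines))
```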
> > 
> > Thanks,
> > 
> > Dejan
> > 
> > > Thanks
> > > Alain
> > > 
> > > 
> > > 
> > > 
> > > From:  Dejan Muhamedagic <[email protected]>
> > > To:    General Linux-HA mailing list <[email protected]>
> > > Date:  21/11/2011 13:45
> > > Subject: Re: [Linux-HA] Antw:  What about "start-delay" attribute status ?
> > > Sent by: [email protected]
> > > 
> > > 
> > > 
> > > Hi,
> > > 
> > > On Mon, Nov 21, 2011 at 01:42:15PM +0100, [email protected] wrote:
> > > > Hi Florian,
> > > > OK, I've checked the thread. So that means that on RHEL6, if I have,
> > > > let's say, 32 resource groups of 2 primitives on each node, I can
> > > > set the LRMD_MAX_CHILDREN environment variable in
> > > > /etc/sysconfig/pacemaker to 64?
> > > 
> > > The number of resources shouldn't be the main criterion for
> > > setting this parameter; what matters is what your nodes can handle
> > > without being overloaded. So 64 sounds like you have some really
> > > big computers :) It also depends on the nature of the cluster
> > > resources. The default of 4 is rather conservative; perhaps
> > > nowadays 8 would be better.
> > > 
> > > > Is that acceptable for lrmd and Pacemaker? Or will we face any
> > > > side effects?
> > > 
> > > LRMD_MAX_CHILDREN is the maximum number of resource operations
> > > allowed to run in parallel. Hope that that answers your question.
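As an illustration of what "maximum number of operations allowed to run in parallel" means, here is a Python sketch using a bounded semaphore. It is not lrmd's implementation (lrmd is a separate C daemon with its own queue); `run_with_cap` is a hypothetical helper, and `time.sleep` stands in for a resource agent doing its work. It runs a number of dummy operations and reports the highest number observed running at once, which can never exceed the cap.

```python
import threading
import time


def run_with_cap(n_ops, max_children, op_seconds=0.05):
    """Run n_ops dummy operations, at most max_children at a time.

    Returns the peak number of operations observed running together.
    """
    slots = threading.BoundedSemaphore(max_children)
    lock = threading.Lock()
    running = 0
    peak = 0

    def operation():
        nonlocal running, peak
        with slots:  # blocks while max_children operations are already running
            with lock:
                running += 1
                peak = max(peak, running)
            time.sleep(op_seconds)  # stand-in for the resource agent's work
            with lock:
                running -= 1

    threads = [threading.Thread(target=operation) for _ in range(n_ops)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return peak


if __name__ == "__main__":
    for cap in (2, 4, 24):
        print(f"cap={cap}: peak concurrency {run_with_cap(24, cap)}")
```

With a cap of 4, 24 operations of equal length finish in roughly six rounds of `op_seconds`; raising the cap shortens that only as far as the node can actually run the operations concurrently.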
> > > 
> > > Thanks,
> > > 
> > > Dejan
> > > 
> > > > Thanks
> > > > Alain
> > > > 
> > > > 
> > > > 
> > > > From:  Florian Haas <[email protected]>
> > > > To:    General Linux-HA mailing list <[email protected]>
> > > > Date:  21/11/2011 12:58
> > > > Subject: Re: [Linux-HA] Antw:  What about "start-delay" attribute status ?
> > > > Sent by: [email protected]
> > > > 
> > > > 
> > > > 
> > > > On 11/21/11 13:03, [email protected] wrote:
> > > > > Hi,
> > > > > yes, that's exactly the purpose of my question (and exactly the
> > > > > same problem as the "big-monitoring-trains"): can we still use
> > > > > start-delay to randomize the time of the first monitor operation
> > > > > on all the resources on a server? If it is really deprecated, that
> > > > > means this option will no longer be handled by Pacemaker in the
> > > > > future (perhaps that is already the case?), so in that case we
> > > > > must not use it.
> > > > > 
> > > > > Could someone give us a clear status on this option "start-delay"?
> > > > 
> > > > If your RA needs it, then the RA is most likely broken. :)
> > > > 
> > > > For monitor operations allegedly piling up, please consider this:
> > > > http://www.gossamer-threads.com/lists/linuxha/pacemaker/76152#76152
> > > > 
> > > > Hope this helps.
> > > > Cheers,
> > > > Florian
> > > > 
> > > > -- 
> > > > Need help with High Availability?
> > > > http://www.hastexo.com/now
> > > > _______________________________________________
> > > > Linux-HA mailing list
> > > > [email protected]
> > > > http://lists.linux-ha.org/mailman/listinfo/linux-ha
> > > > See also: http://linux-ha.org/ReportingProblems