In some cases this is useful, in some cases not.  There is a bit of a
chicken-and-egg problem with my situation here.  My systems are all completely
stateless, so at boot time we determine which applications are to be loaded
on the host, and they are all pulled down and started up.  Once they are under
monit control we will do one of the following:

1) start or stop apps on an individual basis as needed (or restart them when
they crash)

2) stop all apps in one shot, or start all apps in one shot (monit restart
all to the rescue)

3) stop "groupings" of apps in one shot, or start "groupings" of apps in
one shot.  The problem with grouping them is that, since we are a dynamic
environment (apps get stopped, started, uninstalled, installed), there is no
way for us to group them together in our configs, as the grouping is usually
done in an ad hoc manner.  For example, stop all apps whose names start with
ABC and end in a number from 50 to 100.

4) Lastly, our apps have a built-in "restart yourself" function that can be
triggered by our end users through a control port, and that is one way they
will do it.  The apps are sent a trigger on the control port, they exit, and
monit attempts to restart them.  I am also dealing with another monit issue
there, but I am hoping the latest version rectifies that.
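
As a rough illustration of the ad hoc selection in 3), here is a minimal
Python sketch.  It assumes the app names are already available as a list
(for instance, collected from 'monit summary' by some other means) and that
the number being matched is the trailing run of digits in each name.  The
function name and the sample app names are hypothetical, not part of our
tooling.

```python
import re


def select_apps(names, prefix, low, high):
    """Return names that start with `prefix` and end in a number in [low, high]."""
    # Lazy ".*?" lets the final group capture the whole trailing run of digits.
    pattern = re.compile(r"^" + re.escape(prefix) + r".*?(\d+)$")
    selected = []
    for name in names:
        match = pattern.match(name)
        if match and low <= int(match.group(1)) <= high:
            selected.append(name)
    return selected


# Hypothetical app names, for illustration only.
apps = ["ABC_feed_49", "ABC_feed_50", "ABC_feed_75", "ABC100", "XYZ_60", "ABC_feed"]
matching = select_apps(apps, "ABC", 50, 100)
```

The resulting list could then be handed to whatever issues the individual
stop or start commands, so no group needs to exist in the monit config.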

So far I have no issues with 'stop all' and 'start all' for maintenance
windows, but my users tend to lean on 4) since they have a ton of
maintenance scripts written around this technology.
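
For the "Other action already in progress" error quoted below, one possible
workaround is to issue the commands one at a time and retry when monit
reports it is busy.  The following is only a sketch under assumptions: it
assumes the monit command line reports the failure through a non-zero exit
status or by printing the same message that appears in the log, which this
thread does not confirm.  The helper names, retry count, and delay are made
up for illustration.

```python
import subprocess
import time

BUSY_TEXT = "Other action already in progress"


def run_monit(action, name):
    """Run `monit <action> <name>` and return (exit_code, combined_output)."""
    proc = subprocess.run(["monit", action, name], capture_output=True, text=True)
    return proc.returncode, proc.stdout + proc.stderr


def monit_with_retry(action, name, runner=run_monit, attempts=5, delay=2.0,
                     sleep=time.sleep):
    """Issue one monit action, retrying while the command reports a failure."""
    for attempt in range(attempts):
        code, output = runner(action, name)
        # Assumption: a busy monit shows up as a non-zero exit or the busy text.
        if code == 0 and BUSY_TEXT not in output:
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False


def restart_all(names, **kwargs):
    """Restart apps one at a time; return the names that never succeeded."""
    return [n for n in names if not monit_with_retry("restart", n, **kwargs)]
```

A caller would pass the list of app names to restart_all() and report
whatever comes back for manual follow-up.  Note that an accepted command
only means monit took the request; it does not prove the app actually came
back up, so a status check afterwards would still be needed.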

-Chris

On Fri, Feb 3, 2012 at 4:15 PM, Wayne Lawrence
<[email protected]> wrote:

> Have you thought about putting the programs in groups?  I use this method
> to stop and start groups of apps with monit without any issues, and I am
> starting between 10 and 16 processes.
>
> On 3 Feb 2012, at 20:45, Christopher Johnston <[email protected]> wrote:
>
> Okay, we switched our central 'launch' script, which essentially takes the
> list of apps from 'monit summary', stops some of them, then does a
> start on the list that matches the regex.  If 10 sequential commands get
> sent to monit, it will fail to start 1 or 2 of them and I see this error in
> my logs.  Does monit have issues receiving multiple commands all at once?
> It seems like a problem to me that monit can't scale to handle requests
> like this.  This is a multi-user environment where app owners stop and
> start their apps at their leisure.
>
> <27> Feb  3 12:41:00.441595 -08:00 dev001 monit[25592]: monit: action
> failed -- Other action already in progress -- please try again later
>
>
> On Thu, Feb 2, 2012 at 12:53 PM, Christopher Johnston 
> <[email protected]> wrote:
>
>> Ok - I grokked the script that handles the restart.  I think this could be
>> the cause: it is essentially doing a 'stop && start', so the start it
>> initiates is producing that message since there is already another action
>> in progress (to stop the app).  We will modify this to use 'restart' instead.
>>
>>
>> On Thu, Feb 2, 2012 at 10:55 AM, Christopher Johnston
>> <[email protected]> wrote:
>>
>>> I am a little confused about why I am seeing this.  I have 4 applications
>>> on my host (in some cases up to 10) where we need to do a daily/weekly
>>> rolling restart of all the apps on the host.  If I send 4 monit restart
>>> commands to the apps in sequence, I end up in a situation where only 2
>>> or 3 of the apps come up and monit complains that an action is already
>>> in progress (I assume it's from the other commands).  Monit can't handle
>>> being signaled 4x to take down apps and restart them?  This creates some
>>> issues for us when we are doing a mass code rollout to 100s of
>>> applications.  We end up having to go and clean things up manually, and the
>>> driver behind using monit is to provide an automated framework for managing
>>> apps and guaranteeing uptime.
>>>
>>> Is there any way to remedy this?  We are using a very low timeout in
>>> monit since we can't risk having apps down for long periods.  Could this
>>> have something to do with it?
>>>
>>> <27> Feb  2 07:48:43.202228 -08:00 dev001 monit[3263]: monit: action
>>> failed -- Other action already in progress -- please try again later
>>>
>>>
>>
> --
> To unsubscribe:
> https://lists.nongnu.org/mailman/listinfo/monit-general
>
>
