Funny coincidence that you mentioned that.

I'm looking at a production case right now, for a HornetQ customer
whose network is slow (some broken switch, I don't know exactly;
that's part of the investigation). They need to find out when that is
happening and correlate it with other logs.
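
To make the log-only idea concrete, here is a rough sketch of how the
time between enter and leave could be reported without halting or
stopping anything. It is not the code from the PR: the class name
LoggingCriticalAnalyzer, the check() method, the int type of pathID and
the injected clock are all assumptions made for illustration. Only the
enterCritical/leaveCritical pattern and the 120000 ms default come from
this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

// Hypothetical log-only analyzer; NOT the CriticalAnalyzer from the PR.
class LoggingCriticalAnalyzer {
    private final long timeoutMillis;
    // The clock is injected (an assumption) so the check can be exercised without sleeping.
    private final LongSupplier clock;
    // Simplification: one entry per path, so a second enter on the same path overwrites the first.
    private final Map<Integer, Long> enteredAt = new ConcurrentHashMap<>();

    LoggingCriticalAnalyzer(long timeoutMillis, LongSupplier clock) {
        this.timeoutMillis = timeoutMillis;
        this.clock = clock;
    }

    // Record the time at which the critical path was entered.
    void enterCritical(int pathID) {
        enteredAt.put(pathID, clock.getAsLong());
    }

    // Forget the path once the critical operation has finished.
    void leaveCritical(int pathID) {
        enteredAt.remove(pathID);
    }

    // Meant to be called periodically; returns one message per path held longer than the timeout.
    List<String> check() {
        long now = clock.getAsLong();
        List<String> warnings = new ArrayList<>();
        for (Map.Entry<Integer, Long> entry : enteredAt.entrySet()) {
            long elapsed = now - entry.getValue();
            if (elapsed > timeoutMillis) {
                warnings.add("critical path " + entry.getKey()
                        + " has been held for " + elapsed + " ms");
            }
        }
        return warnings;
    }
}
```

A periodic task could call check() every check period and write the
returned messages to the broker log, so the timestamps can be lined up
with other logs. Whether to halt or shut down on top of that would then
be a separate decision.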



On Thu, Aug 10, 2017 at 8:53 AM, Gary Tully <gary.tu...@gmail.com> wrote:
> nice, I think there is value in just logging this information and not
> halting or stopping.
> In this way the feature can be used to determine usage patterns and spikes
> etc and it would be possible to determine what the critical levels are.
> This would allow a separation between getting information and doing
> something about it.
>
> On Sun, 6 Aug 2017 at 05:58 Michael André Pearce <
> michael.andre.pea...@me.com> wrote:
>
>> Thanks Clebert, I have left my feedback directly on the PR.
>>
>> Cheers
>> Mike
>>
>> Sent from my iPhone
>>
>> > On 5 Aug 2017, at 06:03, Clebert Suconic <clebert.suco...@gmail.com>
>> wrote:
>> >
>> > PR sent. I would appreciate reviews.
>> >
>> > thanks
>> >
>> > On Fri, Aug 4, 2017 at 1:02 PM, Clebert Suconic
>> > <clebert.suco...@gmail.com> wrote:
>> >> I'm adding some logic to detect cases where the broker may become
>> unresponsive.
>> >>
>> >> I'm adding a component called CriticalAnalyzer, which will inspect
>> >> response times of certain operations and decide to take the broker
>> >> down when bad things are happening.
>> >>
>> >>
>> >> Around several critical operations on the broker, I'm adding this
>> pattern:
>> >>
>> >>
>> >> enterCritical(pathID);
>> >> try {
>> >>   synchronized (lock) {
>> >>     // the critical operation goes here
>> >>   }
>> >> } finally {
>> >>   leaveCritical(pathID);
>> >> }
>> >>
>> >> The CriticalAnalyzer will look at the times between enter and leave,
>> >> and with a configured timeout, it will take the broker down.
>> >>
>> >>
>> >>
>> >> Now, when it comes to the configuration, I can't find good
>> >> nomenclature for this, and I'm asking for help:
>> >>
>> >> So far I have come up with these names:
>> >>
>> >> - analyze-critical : default true
>> >>  is the critical analyzer on?
>> >>
>> >> - analyze-critical-timeout: default 120000 (milliseconds, 2 minutes)
>> >>  The timeout used to
>> >>
>> >> - analyze-critical-check-period: default 1/2 of analyze-critical-timeout
>> >>
>> >> - analyze-critical-halt-on-failure: default false
>> >>  In case of an issue, a Runtime.halt() would be issued if true,
>> >>  otherwise a shutdown.
>> >>
>> >> During deadlocks or IO issues, the most effective option would
>> >> actually be the halt. We could even change the start scripts to restart
>> >> the server in case of a returned value.
>> >>
>> >>
>> >>
>> >>
>> >> Any input?
>> >>
>> >>
>> >> I will send a Pull Request soon.
>> >>
>> >>
>> >> --
>> >> Clebert Suconic
>> >
>> >
>> >
>> > --
>> > Clebert Suconic
>>



-- 
Clebert Suconic
