Re: [IEP-35] GridJobProcessorMetrics migration

Alex Plehanov Tue, 25 Jun 2019 09:39:08 -0700

Hi, Nikolay

Yes, sure. I've left some comments on GitHub.


On Mon, 24 Jun 2019 at 19:15, Nikolay Izhikov <[email protected]> wrote:

> Hello, Alex.
>
> Based on our private discussion I've additionally migrated
> `totalExecutionTime` and `totalWaitingTime` counters.
> Can you review the PR [1]?
>
> [1] https://github.com/apache/ignite/pull/6622
>
> On Mon, 24/06/2019 at 15:14 +0300, Nikolay Izhikov wrote:
> > Hello, Alex.
> >
> > Thanks for the answer.
> >
> > 1. Actually, I don't understand your proposal :)
> > Can you write it down?
> > What numbers should be additionally migrated in this PR?
> > Or is it OK for now?
> >
> > > I think "idle time" is a useful metric
> >
> > I think the "usefulness" or "uselessness" of a specific metric depends on
> > the questions we can answer with it.
> > What questions about an Ignite instance can we answer with the "idle time"
> > metric?
> >
> > > About execution and waiting time, it's not the right way to calculate it
> > > using a jobs list.
> >
> > Same question here.
> >
> > What questions can we answer with the current numbers?
> >
> > > Will jobs list contain only active jobs?
> >
> > All jobs that are scheduled for execution on the node (active + waiting)
> > should be in the list.
> > I'll try to give more details here, to show my way of thinking about
> > metrics and lists:
> >
> > If you have some issues with the jobs on the node, they can be of 2 kinds:
> >       1. You are waiting for the results of some job and want to know
> >          why it isn't executing.
> >
> >               In this case, you should query the "jobs list" from Ignite.
> >               You can get answers to:
> >                       * What jobs are currently executing?
> >                       * How long has your job been waiting to be executed?
> >
> >               You can also check the graphs of the "activeJobs" and
> >               "waitingJobs" metrics to see how the jobs queue changes over time.
> >               It seems you can predict the start of your job from these
> >               numbers.
> >
> >       2. You want to understand the lifecycle of some finished (or failed)
> >          job.
> >
> >               In this case, you should analyze the log of the node.
> >               It should contain the times when:
> >                       * the node received the job information
> >                       * the job was added to the queue
> >                       * the job started execution
> >                       * the job finished (or failed) execution.
> >
> > I don't see any questions we can't answer from these sources.
> > Do we have any?
> > How can the numbers from the current GridJobMetrics help with these questions?
> >
> >
> > > But, what if a user doesn't use any
> > > external monitoring system and wants to know the health of Ignite instance?
> >
> > It depends on how we define "health".
> > And it's not a trivial question :)
> >
> > > Do we have any plans to implement some simple aggregator and ship it with
> > > Ignite?
> >
> > I think NO.
> > We shouldn't do it.
> >
> > > Do we have plans to provide some presets for Ignite monitoring for
> > > popular monitoring systems?
> >
> > I think we shouldn't do it,
> > because monitoring presets depend heavily on the usage scenario,
> > and usage scenarios can vary widely for Ignite.
> >
> >
> > On Mon, 24/06/2019 at 12:46 +0300, Alex Plehanov wrote:
> > > Hi Nikolay,
> > >
> > > I think "idle time" is a useful metric, but it can be calculated outside of
> > > Ignite using external monitoring system.
> > >
> > > About execution and waiting time, it's not the right way to calculate it
> > > using a jobs list. Will jobs list contain only active jobs? In this case,
> > > you can't calculate these metrics at all, since you don't know the time of
> > > finished jobs. If the list will contain all jobs (will it be unlimited?),
> > > iterating over this list will be resource consuming. In any case, it's much
> > > simpler (and sometimes the only option) for an external monitoring system to
> > > just get some scalar metric than to iterate over a list with some condition.
> > >
> > > About aggregation, yes, in an ideal world aggregation should be done with
> > > the external monitoring system. But, what if a user doesn't use any
> > > external monitoring system and wants to know the health of Ignite instance?
> > > Do we have any plans to implement some simple aggregator and ship it with
> > > Ignite? Do we have plans to provide some presets for Ignite monitoring for
> > > popular monitoring systems? (These questions are not related to this PR,
> > > but to the IEP as a whole.)
> > >
> > > Also, some aggregation metrics ("max" for example) can't be effectively
> > > calculated using the external system (you would have to iterate over a jobs
> > > list again, and even then the precision of such a calculation would be no
> > > better than the time between probes).
>
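The scalar-metric argument in the quoted message can be sketched in Java. This is only an illustration under stated assumptions: `JobTimeMetricsSketch` and its `onJobFinished` callback are hypothetical names, not Ignite's actual API or the code in the PR. It shows counters that are updated when a job finishes, so an external system reads three numbers instead of iterating over a jobs list.

```java
import java.util.concurrent.atomic.LongAccumulator;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical holder for scalar job metrics; not Ignite's actual API. */
final class JobTimeMetricsSketch {
    /** Sum of execution times of all finished jobs, in milliseconds. */
    private final LongAdder totalExecutionTime = new LongAdder();

    /** Sum of queue waiting times of all finished jobs, in milliseconds. */
    private final LongAdder totalWaitingTime = new LongAdder();

    /** Longest execution time seen so far, kept inside the process. */
    private final LongAccumulator maxExecutionTime =
        new LongAccumulator(Long::max, 0L);

    /** Hypothetical callback, assumed to be invoked once per finished job. */
    void onJobFinished(long waitedMs, long executedMs) {
        totalWaitingTime.add(waitedMs);
        totalExecutionTime.add(executedMs);
        maxExecutionTime.accumulate(executedMs);
    }

    long totalExecutionTime() {
        return totalExecutionTime.sum();
    }

    long totalWaitingTime() {
        return totalWaitingTime.sum();
    }

    long maxExecutionTime() {
        return maxExecutionTime.get();
    }
}
```

A job that starts and finishes between two probes never appears in a polled jobs list, but it is still counted here, because the update happens on job completion rather than on the probe.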
