Hi, Nikolay.

Yes, sure. I've left some comments on GitHub.

On Mon, 24 Jun 2019 at 19:15, Nikolay Izhikov <nizhi...@apache.org> wrote:

> Hello, Alex.
>
> Based on our private discussion, I've additionally migrated the
> `totalExecutionTime` and `totalWaitingTime` counters.
> Can you review the PR [1]?
>
> [1] https://github.com/apache/ignite/pull/6622
>
> On Mon, 24/06/2019 at 15:14 +0300, Nikolay Izhikov wrote:
> > Hello, Alex.
> >
> > Thanks for the answer.
> >
> > 1. I actually don't understand your proposal :)
> > Can you write it down?
> > What numbers should be additionally migrated in this PR?
> > Or is it OK for now?
> >
> > > I think "idle time" is a useful metric
> >
> > I think the "usefulness" or "uselessness" of a specific metric depends
> > on the questions we can answer with it.
> > What questions can we ask about an Ignite instance and answer with the
> > "idle time" metric?
> >
> > > About execution and waiting time, it's not the right way to calculate it
> > > using a jobs list.
> >
> > Same question here.
> >
> > What questions can we answer with the current numbers?
> >
> > > Will jobs list contain only active jobs?
> >
> > All jobs that are scheduled for execution on the node (active + waiting)
> > should be in the list.
> > I'll try to put more details here to expose my way of thinking about
> > metrics and lists:
> >
> > If you have some issues with the jobs on the node, they can be of 2 kinds:
> >
> > 1. You are waiting for the results of some job and want to know
> >    why it doesn't execute.
> >
> >    In this case, you should query the "jobs list" from Ignite.
> >    You can get an answer to:
> >      * What jobs are currently executing?
> >      * How long has your job been waiting to be executed?
> >
> >    You can also check graphs of the "activeJobs" and "waitingJobs"
> >    metrics to see how the jobs queue changes over time.
> >    It seems you can predict the start of your job from these numbers.
> >
> > 2. You want to understand the lifecycle of some finished (failed) job.
> >
> >    In this case, you should analyze the log of the node.
> >    It should contain information about the time when:
> >      * the node received the job information
> >      * the job was added to the queue
> >      * the job started execution
> >      * the job finished (failed) execution.
> >
> > I don't see questions we can't answer from these sources.
> > Do we have any?
> > How can numbers from the current GridJobMetrics help with these questions?
> >
> > > But, what if a user doesn't use any
> > > external monitoring system and wants to know the health of Ignite instance?
> >
> > It depends on how we define "health".
> > And it's not a trivial question :)
> >
> > > Do we have any plans to implement some simple aggregator and ship it with
> > > Ignite?
> >
> > I think NO.
> > We shouldn't do it.
> >
> > > Do we have plans to provide some presets for Ignite monitoring for
> > > popular monitoring systems?
> >
> > I think we shouldn't do it,
> > because monitoring presets heavily depend on the usage scenario,
> > and that can vary heavily for Ignite.
> >
> >
> > On Mon, 24/06/2019 at 12:46 +0300, Alex Plehanov wrote:
> > > Hi Nikolay,
> > >
> > > I think "idle time" is a useful metric, but it can be calculated outside of
> > > Ignite using an external monitoring system.
> > >
> > > About execution and waiting time, it's not the right way to calculate it
> > > using a jobs list. Will jobs list contain only active jobs? In this case,
> > > you can't calculate these metrics at all, since you don't know the time of
> > > finished jobs. If the list will contain all jobs (will it be unlimited?),
> > > iterating over this list will be resource consuming. In any case, it's much
> > > simpler (and sometimes only possible) for an external monitoring system to
> > > just get some scalar metric than to iterate over a list with some condition.
> > >
> > > About aggregation: yes, in an ideal world aggregation should be done with
> > > the external monitoring system. But, what if a user doesn't use any
> > > external monitoring system and wants to know the health of Ignite instance?
> > > Do we have any plans to implement some simple aggregator and ship it with
> > > Ignite? Do we have plans to provide some presets for Ignite monitoring for
> > > popular monitoring systems? (These questions are not related to this PR,
> > > but to the IEP as a whole.)
> > >
> > > Also, some aggregation metrics ("max", for example) can't be effectively
> > > calculated using the external system (you would have to iterate over a jobs
> > > list again, and the precision of such a calculation would still be no better
> > > than the time between probes).
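
To illustrate the last point about "max", here is a minimal sketch, not Ignite code. It assumes a node-side metric is updated on every job, while an external system only sees every N-th observation. The names `MaxWaitSketch`, `MaxMetric`, `trackedMax` and `sampledMax` are hypothetical and exist only for this example.

```java
import java.util.concurrent.atomic.AtomicLong;

public class MaxWaitSketch {
    /** Hypothetical in-process metric: updated for every job, so no spike is missed. */
    static final class MaxMetric {
        private final AtomicLong max = new AtomicLong();

        /** Atomically keeps the larger of the stored maximum and the new value. */
        void record(long waitMs) {
            max.accumulateAndGet(waitMs, Math::max);
        }

        long value() {
            return max.get();
        }
    }

    /** Maximum as computed inside the node: every waiting time is recorded. */
    static long trackedMax(long[] waitTimesMs) {
        MaxMetric metric = new MaxMetric();
        for (long w : waitTimesMs)
            metric.record(w);
        return metric.value();
    }

    /**
     * Maximum as seen by an external system that probes only every
     * probeEvery-th observation (an assumption standing in for the probe interval).
     */
    static long sampledMax(long[] waitTimesMs, int probeEvery) {
        long max = 0;
        for (int i = 0; i < waitTimesMs.length; i += probeEvery)
            max = Math.max(max, waitTimesMs[i]);
        return max;
    }

    public static void main(String[] args) {
        // One short-lived spike (900 ms) between two probes.
        long[] waits = {5, 7, 900, 6, 4, 8, 5, 6};
        System.out.println("tracked max = " + trackedMax(waits));
        System.out.println("sampled max = " + sampledMax(waits, 4));
    }
}
```

With these made-up numbers the spike falls between two probes, so the sampled maximum misses it, while the scalar maintained inside the node does not. How much this matters in practice depends on the probe interval and on how bursty the jobs are.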