Nikolay,

Looks like the final resolution. +1.

On Fri, 26 Jul 2019 at 12:08, Nikolay Izhikov <nizhi...@apache.org> wrote:

> Pavel.
>
> > I just want to add that currentPmeTime is also useful for alerting systems,
> > not only for eye observing
>
> Fully agree.
>
> Let me make it as clear as I can.
> In the end we should have 4 metrics:
>
> `CurrentPMEDuration` - existing metric, shows current PME duration.
> `CurrentPMECacheOperationsBlockedDuration` - new long metric, shows the
> current blocking duration of PME.
>
> `PMEDuration` - histogram of full PME durations.
> `PMECacheOperationsBlockedDuration` - histogram of blocking PME durations.
>
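
A minimal sketch of how the two "current" gauges listed above could behave, assuming they read 0 when no PME is running and reset when the PME finishes. The class name, the `onPmeStart`/`onPmeFinish` hooks and the injected clock are hypothetical and exist only for illustration; this is not Ignite's implementation.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

/** Illustrative only: hypothetical class, not part of Ignite. */
class CurrentPmeGaugesSketch {
    /** Marker meaning "nothing is running right now". */
    private static final long NOT_RUNNING = Long.MIN_VALUE;

    /** Millisecond clock, injected so the sketch can be driven by a fake clock. */
    private final LongSupplier clockMs;

    private final AtomicLong pmeStartMs = new AtomicLong(NOT_RUNNING);
    private final AtomicLong blockStartMs = new AtomicLong(NOT_RUNNING);

    CurrentPmeGaugesSketch(LongSupplier clockMs) {
        this.clockMs = clockMs;
    }

    /** Assumed hook: a PME starts; {@code blocking} tells whether cache operations are blocked. */
    void onPmeStart(boolean blocking) {
        long now = clockMs.getAsLong();
        pmeStartMs.set(now);
        blockStartMs.set(blocking ? now : NOT_RUNNING);
    }

    /** Assumed hook: the PME finishes, so both gauges fall back to 0. */
    void onPmeFinish() {
        pmeStartMs.set(NOT_RUNNING);
        blockStartMs.set(NOT_RUNNING);
    }

    /** Duration of the running PME in millis, or 0 if there is none. */
    long currentPmeDuration() {
        return elapsedSince(pmeStartMs.get());
    }

    /** Duration of the current blocking in millis, or 0 if nothing is blocked. */
    long currentPmeCacheOperationsBlockedDuration() {
        return elapsedSince(blockStartMs.get());
    }

    private long elapsedSince(long startMs) {
        return startMs == NOT_RUNNING ? 0 : clockMs.getAsLong() - startMs;
    }

    public static void main(String[] args) {
        long[] now = {1_000L};
        CurrentPmeGaugesSketch gauges = new CurrentPmeGaugesSketch(() -> now[0]);

        gauges.onPmeStart(true);
        now[0] += 300;
        System.out.println(gauges.currentPmeCacheOperationsBlockedDuration()); // 300

        gauges.onPmeFinish();
        System.out.println(gauges.currentPmeDuration()); // 0
    }
}
```

An alerting system would poll `currentPmeCacheOperationsBlockedDuration()` and fire once the value exceeds its configured threshold.
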
> On Thu, 25/07/2019 at 22:40 +0300, Pavel Kovalenko wrote:
> > Nikolay,
> >
> > Okay, sounds reasonable.
> > I just want to add that currentPmeTime is also useful for alerting systems,
> > not only for eye observing. If the time becomes too long and exceeds some
> > threshold, firing an appropriate alert can help to detect a critical
> > problem early.
> >
> > On Thu, 25 Jul 2019 at 21:12, Nikolay Izhikov <nizhi...@apache.org> wrote:
> >
> > > I think the exact time should be obtained from the logs, shouldn't it?
> > >
> > >
> > > On Thu, 25 Jul 2019, 20:00 Pavel Kovalenko <jokse...@gmail.com> wrote:
> > >
> > > > Nikolay,
> > > >
> > > > Yes, I have had a chance to look at HistogramMetric and, moreover, reviewed it :)
> > > > My question was mostly about what exactly we will track in the histogram.
> > > > If we use a histogram, do you know how we can find the exact time when, e.g.,
> > > > a PME with time > 1s happened?
> > > >
> > > > On Thu, 25 Jul 2019 at 19:24, Nikolay Izhikov <nizhi...@apache.org> wrote:
> > > >
> > > > > Pavel
> > > > >
> > > > > Have you had a chance to look at the HistogramMetric source?
> > > > > It is in master now.
> > > > > Looking at the source would be better than my explanation :)
> > > > >
> > > > > We should count PME processes that block operations for some amount of
> > > > > time. For example: [less than 50, less than 250, less than 1000, more than
> > > > > 1000] millis.
> > > > >
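
The bucket layout from the example above ([less than 50, less than 250, less than 1000, more than 1000] millis) can be sketched as a plain array of counters. This is an illustration only and not the HistogramMetric class from Ignite master: the class name is hypothetical, and counting a value exactly equal to a bound in the next bucket is an assumption. Because only counts are kept, the sketch also shows why the exact time of a PME longer than 1s has to come from the logs.

```java
import java.util.Arrays;

/** Illustrative only: hypothetical class, not Ignite's HistogramMetric. */
class PmeBlockedHistogramSketch {
    /** Ascending bucket bounds in millis, e.g. 50, 250, 1000. */
    private final long[] boundsMs;

    /** One counter per bound plus a final overflow counter. */
    private final long[] counts;

    PmeBlockedHistogramSketch(long... boundsMs) {
        this.boundsMs = boundsMs.clone();
        this.counts = new long[boundsMs.length + 1];
    }

    /** Records the blocking duration of one finished PME. */
    synchronized void record(long durationMs) {
        int bucket = 0;

        // Skip every bucket whose bound the duration has reached (assumed boundary rule).
        while (bucket < boundsMs.length && durationMs >= boundsMs[bucket])
            bucket++;

        counts[bucket]++;
    }

    /** Returns a copy of the counters; no timestamps are stored. */
    synchronized long[] snapshot() {
        return counts.clone();
    }

    public static void main(String[] args) {
        PmeBlockedHistogramSketch hist = new PmeBlockedHistogramSketch(50, 250, 1000);

        for (long d : new long[] {10, 40, 120, 999, 1000, 5000})
            hist.record(d);

        System.out.println(Arrays.toString(hist.snapshot())); // [2, 1, 1, 2]
    }
}
```
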
> > > > > On Thu, 25 Jul 2019, 18:55 Pavel Kovalenko <jokse...@gmail.com> wrote:
> > > > >
> > > > > > Nikolay,
> > > > > >
> > > > > > Could you please explain in more detail what the structure of the PME
> > > > > > histogram will be?
> > > > > >
> > > > > > On Thu, 25 Jul 2019 at 11:56, Nikolay Izhikov <nizhi...@apache.org> wrote:
> > > > > >
> > > > > > > Hello, Nikita.
> > > > > > >
> > > > > > > I think
> > > > > > >
> > > > > > > > 1. The totalCacheOperationsBlockedDuration metric that will accumulate
> > > > > > > > all blocking durations that happen after node starts.
> > > > > > >
> > > > > > > No, we don't need it.
> > > > > > >
> > > > > > > > 2. Blocking duration histogram. Based on the HistogramMetric class.
> > > > > > >
> > > > > > > Yes, we need it.
> > > > > > >
> > > > > > > On Thu, 25/07/2019 at 11:50 +0300, Nikita Amelchev wrote:
> > > > > > > > Igniters,
> > > > > > > >
> > > > > > > > Everyone wants to see the cacheOperationsBlockedDuration metric that will
> > > > > > > > show the current blocking duration or 0 if there is no blocking right now.
> > > > > > > >
> > > > > > > > Do we need the following metrics? It seems one of them will be superfluous.
> > > > > > > > 1. The totalCacheOperationsBlockedDuration metric that will accumulate
> > > > > > > > all blocking durations that happen after the node starts.
> > > > > > > > 2. Blocking duration histogram. Based on the HistogramMetric class.
> > > > > > > > The user will be able to configure bounds.
> > > > > > > >
> > > > > > > > On Wed, 24 Jul 2019 at 18:26, Nikolay Izhikov <nizhi...@apache.org> wrote:
> > > > > > > > >
> > > > > > > > > Guys.
> > > > > > > > >
> > > > > > > > > I think we should go with the 2 metrics
> > > > > > > > >
> > > > > > > > >         * current PME duration (resets on finish)
> > > > > > > > >
> > > > > > > > >                 This metric is required for alerting (or automatic actions) on long PME.
> > > > > > > > >
> > > > > > > > >         * PME duration histogram (value added to metrics on PME finish)
> > > > > > > > >                 This metric is required for:
> > > > > > > > >                         * Quick PME trend analysis
> > > > > > > > >                         * Quick PME history analysis
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Wed, 24/07/2019 at 15:01 +0300, Ivan Rakov wrote:
> > > > > > > > > > Nikita and Maxim,
> > > > > > > > > >
> > > > > > > > > > > What if we just update the current metric getCurrentPmeDuration behaviour
> > > > > > > > > > > to show durations only for blocking PMEs?
> > > > > > > > > > > Keep it as a long value and rename it to getCacheOperationsBlockedDuration.
> > > > > > > > > > >
> > > > > > > > > > > No other changes will be required.
> > > > > > > > > > >
> > > > > > > > > > > WDYT?
> > > > > > > > > >
> > > > > > > > > > I agree with these two metrics. I also think that current
> > > > > > > > > > getCurrentPmeDuration will become redundant.
> > > > > > > > > >
> > > > > > > > > > Anton,
> > > > > > > > > >
> > > > > > > > > > > It looks like we're trying to implement "extended debug" instead of
> > > > > > > > > > > "monitoring".
> > > > > > > > > > > It should not be interesting for real admin what phase of PME is in
> > > > > > > > > > > progress and so on.
> > > > > > > > > >
> > > > > > > > > > PME is a mission-critical cluster process. I agree that there's a fine
> > > > > > > > > > line between monitoring and debug here. However, it's not good to add
> > > > > > > > > > monitoring capabilities only for the scenario when everything is alright.
> > > > > > > > > > If PME really hangs, a *real admin* will be extremely interested in how
> > > > > > > > > > to return the cluster to a working state. Metrics about stage completion
> > > > > > > > > > times may really help here: e.g. if one specific node hasn't completed
> > > > > > > > > > stage X while the rest of the cluster has, it can be a signal that this node
> > > > > > > > > > should be killed.
> > > > > > > > > >
> > > > > > > > > > Of course, it's possible to build a monitoring system that extracts this
> > > > > > > > > > information from logs, but:
> > > > > > > > > > - It's more resource-intensive, as it requires parsing logs all the time
> > > > > > > > > > - It's less reliable, as log messages may change
> > > > > > > > > >
> > > > > > > > > > Best Regards,
> > > > > > > > > > Ivan Rakov
> > > > > > > > > >
> > > > > > > > > > On 24.07.2019 14:57, Maxim Muzafarov wrote:
> > > > > > > > > > > Folks,
> > > > > > > > > > >
> > > > > > > > > > > +1 with Anton post.
> > > > > > > > > > >
> > > > > > > > > > > What if we just update the current metric getCurrentPmeDuration behaviour
> > > > > > > > > > > to show durations only for blocking PMEs?
> > > > > > > > > > > Keep it as a long value and rename it to getCacheOperationsBlockedDuration.
> > > > > > > > > > >
> > > > > > > > > > > No other changes will be required.
> > > > > > > > > > >
> > > > > > > > > > > WDYT?
> > > > > > > > > > >
> > > > > > > > > > > On Wed, 24 Jul 2019 at 14:02, Nikita Amelchev <nsamelc...@gmail.com> wrote:
> > > > > > > > > > > > Nikolay,
> > > > > > > > > > > >
> > > > > > > > > > > > The cacheOperationsBlockedDuration metric will show the current blocking
> > > > > > > > > > > > duration or 0 if there is no blocking right now.
> > > > > > > > > > > >
> > > > > > > > > > > > The totalCacheOperationsBlockedDuration metric will accumulate all
> > > > > > > > > > > > blocking durations that happen after the node starts.
> > > > > > > > > > > >
> > > > > > > > > > > > On Wed, 24 Jul 2019 at 13:35, Nikolay Izhikov <nizhi...@apache.org> wrote:
> > > > > > > > > > > > > Nikita
> > > > > > > > > > > > >
> > > > > > > > > > > > > What is the difference between those two metrics?
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Wed, 24 Jul 2019, 12:45 Nikita Amelchev <nsamelc...@gmail.com> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Igniters, thanks for comments.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > From the discussion it can be seen that we need only two metrics for now:
> > > > > > > > > > > > > > - cacheOperationsBlockedDuration (long)
> > > > > > > > > > > > > > - totalCacheOperationsBlockedDuration (long)
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I will prepare a PR shortly.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Wed, 24 Jul 2019 at 09:11, Zhenya Stanilovsky <arzamas...@mail.ru.invalid> wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > +1 with Anton decisions.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Wednesday, 24 July 2019, 8:44 +03:00 from Anton Vinogradov <a...@apache.org>:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Folks,
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > It looks like we're trying to implement "extended debug" instead of
> > > > > > > > > > > > > > > > "monitoring".
> > > > > > > > > > > > > > > > It should not be interesting for a real admin what phase of PME is in
> > > > > > > > > > > > > > > > progress and so on.
> > > > > > > > > > > > > > > > The interesting metrics are
> > > > > > > > > > > > > > > > - total blocked time (will be used for real SLA counting)
> > > > > > > > > > > > > > > > - are we blocked right now (shows we have an SLA degradation right now)
> > > > > > > > > > > > > > > > The duration of the current blocking period can be easily presented using any
> > > > > > > > > > > > > > > > modern monitoring tool by regular checks.
> > > > > > > > > > > > > > > > The initial true will mean "period start"; precision will be a result of
> > > > > > > > > > > > > > > > the check frequency.
> > > > > > > > > > > > > > > > Anyway, I'm ok to have the current metric presented with a long, where the long is a
> > > > > > > > > > > > > > > > duration; I see no reason, but ok :)
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > All other features you mentioned are useful for improving code or
> > > > > > > > > > > > > > > > deployment and can (should) be taken from logs at the analysis
> > > > > > > > > > > > > > > > phase.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Tue, Jul 23, 2019 at 7:22 PM Ivan Rakov <ivan.glu...@gmail.com> wrote:
> > > > > > > > > > > > > > > > > Folks, let me step in.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Nikita, thanks for your suggestions!
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > 1. initialVersion. Topology version that initiates the exchange.
> > > > > > > > > > > > > > > > > > 2. initTime. Time PME was started.
> > > > > > > > > > > > > > > > > > 3. initEvent. Event that triggered PME.
> > > > > > > > > > > > > > > > > > 4. partitionReleaseTime. Time when a node has finished waiting for all
> > > > > > > > > > > > > > > > > > updates and translations on a previous topology.
> > > > > > > > > > > > > > > > > > 5. sendSingleMessageTime. Time when a node sent a single message.
> > > > > > > > > > > > > > > > > > 6. recieveFullMessageTime. Time when a node received a full message.
> > > > > > > > > > > > > > > > > > 7. finishTime. Time PME was ended.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > When new PME started all these metrics resets.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Every metric from Nikita's list looks useful and simple to implement.
> > > > > > > > > > > > > > > > > I think that it would be better to change the format of metrics 4, 5, 6 and
> > > > > > > > > > > > > > > > > 7 a bit: we can keep only the difference between the time of the previous event and
> > > > > > > > > > > > > > > > > the time of the corresponding event. Such metrics would be easier to perceive:
> > > > > > > > > > > > > > > > > they answer specific questions such as "how much time did partition release
> > > > > > > > > > > > > > > > > take?" or "how much time did awaiting of distributed phase end take?".
> > > > > > > > > > > > > > > > > Also, if the results of 4, 5, 6, 7 are exported to a monitoring system,
> > > > > > > > > > > > > > > > > graphs will show how the times of different stages change from one PME to another.
> > > > > > > > > > > > > > > > > > When PME cause no blocking, it's a good PME and I see no reason to have
> > > > > > > > > > > > > > > > > > monitoring related to it
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Agree with Anton here. These metrics should be measured only for true
> > > > > > > > > > > > > > > > > distributed exchange. Saving results for client leave/join PMEs will
> > > > > > > > > > > > > > > > > just complicate monitoring.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > I agree with total blocking duration metric but
> > > > > > > > > > > > > > > > > > I still don't understand why instant value indicating that operations are
> > > > > > > > > > > > > > > > > > blocked should be boolean.
> > > > > > > > > > > > > > > > > > Duration time since blocking has started looks more appropriate and useful.
> > > > > > > > > > > > > > > > > > It gives more information while semantic is left the same.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Totally agree with Pavel here. Both "accumulated block time" and
> > > > > > > > > > > > > > > > > "current PME block time" metrics are useful. Growth of the accumulated
> > > > > > > > > > > > > > > > > metric over a specific period of time (should be easy to check via a
> > > > > > > > > > > > > > > > > monitoring system graph) will show for how long business operations were
> > > > > > > > > > > > > > > > > blocked in total, and a non-zero current metric will show that we are
> > > > > > > > > > > > > > > > > experiencing issues right now. The boolean metric "are we blocked right now"
> > > > > > > > > > > > > > > > > is not needed, as it obviously can be inferred from "current PME block
> > > > > > > > > > > > > > > > > time".
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Best Regards,
> > > > > > > > > > > > > > > > > Ivan Rakov
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > On 23.07.2019 16:02, Pavel Kovalenko wrote:
> > > > > > > > > > > > > > > > > > Nikita,
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > I agree with the total blocking duration metric, but
> > > > > > > > > > > > > > > > > > I still don't understand why the instant value indicating that operations are
> > > > > > > > > > > > > > > > > > blocked should be boolean.
> > > > > > > > > > > > > > > > > > The duration since blocking started looks more appropriate and useful.
> > > > > > > > > > > > > > > > > > It gives more information while the semantics are left the same.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > On Tue, 23 Jul 2019 at 11:42, Nikita Amelchev <nsamelc...@gmail.com> wrote:
> > > > > > > > > > > > > > > > > > > Folks,
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > All previous suggestions have some disadvantages. There can be several
> > > > > > > > > > > > > > > > > > > exchanges between two metric updates, and a fast exchange can overwrite a
> > > > > > > > > > > > > > > > > > > previous long exchange.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > We can introduce a metric of total blocking duration that will
> > > > > > > > > > > > > > > > > > > accumulate at the end of the exchange. So, users will get actual
> > > > > > > > > > > > > > > > > > > information about how long operations were blocked. The cluster metric
> > > > > > > > > > > > > > > > > > > will be the maximum of the local node metrics. And we need a boolean metric
> > > > > > > > > > > > > > > > > > > that will indicate realtime status. It is needed because the duration
> > > > > > > > > > > > > > > > > > > metric updates only at the end of the exchange.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > So I propose to change the current metric, which is not released yet, to the
> > > > > > > > > > > > > > > > > > > totalCacheOperationsBlockingDuration metric and to add the
> > > > > > > > > > > > > > > > > > > isCacheOperationsBlocked metric.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > WDYT?
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > On Mon, 22 Jul 2019 at 09:27, Anton Vinogradov <a...@apache.org> wrote:
> > > > > > > > > > > > > > > > > > > > Nikolay,
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Still see no reason to replace boolean with long.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > On Mon, Jul 22, 2019 at 9:19 AM Nikolay Izhikov <nizhi...@apache.org> wrote:
> > > > > > > > > > > > > > > > > > > > > Anton.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > 1. The value is exported based on SPI settings, not at the moment it changes.
> > > > > > > > > > > > > > > > > > > > > 2. Clock synchronisation - if we export the start time, we should also export
> > > > > > > > > > > > > > > > > > > > > the node-local timestamp.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > On Mon, 22 Jul 2019, 8:33 Anton Vinogradov <a...@apache.org> wrote:
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > Folks,
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > What's the reason for duration counting?
> > > > > > > > > > > > > > > > > > > > > > AFAIU, it's a monitoring system feature to count the durations.
> > > > > > > > > > > > > > > > > > > > > > Since the monitoring system checks metrics periodically, it will know the
> > > > > > > > > > > > > > > > > > > > > > duration from its own log.
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > On Fri, Jul 19, 2019 at 7:32 PM Pavel Kovalenko <jokse...@gmail.com> wrote:
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > Nikita,
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > Yes, I mean duration, not timestamp. For the metric name, I suggest
> > > > > > > > > > > > > > > > > > > > > > > "cacheOperationsBlockingDuration"; I think it more clearly represents what is
> > > > > > > > > > > > > > > > > > > > > > > blocked during PME.
> > > > > > > > > > > > > > > > > > > > > > > We can also combine both the timestamp "cacheOperationsBlockingStartTs" and
> > > > > > > > > > > > > > > > > > > > > > > the duration to better correlate when cache operations were blocked and
> > > > > > > > > > > > > > > > > > > > > > > how much time it took.
> > > > > > > > > > > > > > > > > > > > > > > For an instant view (like in a JMX bean), a calculated value as you mentioned
> > > > > > > > > > > > > > > > > > > > > > > can be used.
> > > > > > > > > > > > > > > > > > > > > > > For metrics that are exported to some backend (IEP-35), a counter can be used.
> > > > > > > > > > > > > > > > > > > > > > > The counter is incremented by the blocking time after blocking has ended.
> > > > > > > > > > > > > > > > > > > > > > > On Fri, 19 Jul 2019 at 19:10, Nikita Amelchev <nsamelc...@gmail.com> wrote:
> > > > > > > > > > > > > > > > > > > > > > > > Pavel,
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > The main purpose of this metric is
> > > > > > > > > > > > > > > > > > > > > > > > > > how much time we wait for
> > > > >
> > > > > resuming
> > > > > > > cache operations
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > Seems I misunderstood you. Do you mean timestamp or duration here?
> > > > > > > > > > > > > > > > > > > > > > > > > > What do you think if we
> > >
> > > change
> > > > > the
> > > > > > > boolean value of metric
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > to a
> > > > > > > > > > > > > > > > > > > > > long
> > > > > > > > > > > > > > > > > > > > > > > > value that represents time in
> > > > > > >
> > > > > > > milliseconds when operations
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > were
> > > > > > > > > > > > > > > > > > > > > blocked?
> > > > > > > > > > > > > > > > > > > > > > > > This time can be calculated as (currentTime -
> > > > > > > > > > > > > > > > > > > > > > > > timeSinceOperationsBlocked) in case of timestamp.
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > Duration will be more understandable. It'll be something like
> > > > > > > > > > > > > > > > > > > > > > > > getCurrentBlockingPmeDuration. But I haven't come up with a better
> > > > > > > > > > > > > > > > > > > > > > > > name yet.
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > On Fri, 19 Jul 2019 at 18:30, Pavel Kovalenko <jokse...@gmail.com> wrote:
> > > > > > > > > > > > > > > > > > > > > > > > > Nikita,
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > I think getCurrentPmeDuration doesn't show useful information. The main
> > > > > > > > > > > > > > > > > > > > > > > > > PME side effect for end users is blocking cache operations. Not all PME
> > > > > > > > > > > > > > > > > > > > > > > > > time blocks them.
> > > > > > > > > > > > > > > > > > > > > > > > > What information does the "timeSinceOperationsBlocked" timestamp give to
> > > > > > > > > > > > > > > > > > > > > > > > > an end user? For what analysis can it be used, and how?
> > > > > > > > > > > > > > > > > > > > > > > > > On Fri, 19 Jul 2019 at 17:48, Nikita Amelchev <nsamelc...@gmail.com> wrote:
> > > > > > > > > > > > > > > > > > > > > > > > > > Hi Pavel,
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > This time can already be obtained from the getCurrentPmeDuration and
> > > > > > > > > > > > > > > > > > > > > > > > > > new isOperationsBlockedByPme metrics.
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > As an alternative solution, I can rework the recently added
> > > > > > > > > > > > > > > > > > > > > > > > > > getCurrentPmeDuration metric (not released yet). It seems useless
> > > > > > > > > > > > > > > > > > > > > > > > > > for users in case of a non-blocking PME.
> > > > > > > > > > > > > > > > > > > > > > > > > > Let's name it timeSinceOperationsBlocked. It'll be the timestamp when
> > > > > > > > > > > > > > > > > > > > > > > > > > blocking started (minimal value of cluster nodes) and 0 if blocking
> > > > > > > > > > > > > > > > > > > > > > > > > > has ended (there is no running PME).
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > WDYT?
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > On Fri, 19 Jul 2019 at 15:56, Pavel Kovalenko <jokse...@gmail.com> wrote:
> > > > > > > > > > > > > > > > > > > > > > > > > > > Hi Nikita,
> > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > Thank you for working on this. What do you think if we change the boolean
> > > > > > > > > > > > > > > > > > > > > > > > > > > value of the metric to a long value that represents the time in milliseconds
> > > > > > > > > > > > > > > > > > > > > > > > > > > for which operations were blocked?
> > > > > > > > > > > > > > > > > > > > > > > > > > > Since we now have not only JMX, and metrics are periodically exported to
> > > > > > > > > > > > > > > > > > > > > > > > > > > some backend, it can give a clearer picture of how much time we wait for
> > > > > > > > > > > > > > > > > > > > > > > > > > > cache operations to resume than an instant boolean indicator.
> > > > > > > > > > > > > > > > > > > > > > > > > > > On Fri, 19 Jul 2019 at 14:41, Nikita Amelchev <nsamelc...@gmail.com> wrote:
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Anton, Nikolay,
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks for the support.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > For now, we have the getCurrentPmeDuration() metric, which does not
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > correctly show the impact on the cluster. PME can happen without
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > blocking operations, for example on client node join/leave events.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I suggest adding a new metric, isOperationsBlockedByPme(). Together,
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > these metrics will show the impact of the PME on the cluster and on
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > user operations.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I have prepared a PR for this (Bot visa is green) [1]. Can anyone
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > take a look?
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > [1] https://issues.apache.org/jira/browse/IGNITE-11961
> > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Tue, 16 Jul 2019 at 14:58, Nikolay Izhikov <nizhi...@apache.org>:
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I think the administrator of an Ignite cluster should be able to
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > monitor all Ignite processes, including non-blocking PME.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Tue, 16/07/2019 at 14:57 +0300, Anton Vinogradov wrote:
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > BTW,
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Found a PME metric: getCurrentPmeDuration().
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > It seems to show exactly the PME time, and is not so useful
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > because of this.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > The goal is to show exactly the blocking period.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > When PME causes no blocking, it's a good PME, and I see no reason
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > to have monitoring related to it :)
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Tue, Jul 16, 2019 at 2:50 PM Nikolay Izhikov <nizhi...@apache.org> wrote:
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Anton.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Why do we need to postpone implementation of these metrics?
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > For now, implementing a new metric is very simple.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I think we can implement these metrics as a single contribution.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Tue, 16/07/2019 at 13:47 +0300, Anton Vinogradov wrote:
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Nikita,
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Looks like all we need now is 1 simple metric: are operations
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > blocked?
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Just a true or false.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Let's start from this.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > All other metrics can be extracted from logs now and can be
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > implemented later.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Tue, Jul 16, 2019 at 12:46 PM Nikolay Izhikov <nizhi...@apache.org> wrote:
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > +1.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Nikita, please, go ahead.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Tue, 16 Jul 2019, 11:45 Nikita Amelchev <nsamelc...@gmail.com>:
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Hello, Igniters.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I suggest adding some useful metrics about the partition map
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > exchange (PME). For now, the duration of PME stages is available
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > only in log files and cannot be obtained using JMX or other
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > external tools. [1]
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I made a list of local node metrics that help to understand the
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > actual status of the current PME:
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 1. initialVersion. Topology version that initiates the exchange.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 2. initTime. Time the PME was started.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 3. initEvent. Event that triggered the PME.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 4. partitionReleaseTime. Time when a node finished waiting for
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >    all updates and transactions on the previous topology.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 5. sendSingleMessageTime. Time when a node sent a single message.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 6. recieveFullMessageTime. Time when a node received a full message.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 7. finishTime. Time the PME ended.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > When a new PME starts, all these metrics are reset.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > These metrics help to understand:
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > - how long the PME took (current or previous).
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > - how long the node waited for all updates to complete.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > - which node blocks the PME (did not send a single message).
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > - what triggered the PME.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thoughts?
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > [1] https://issues.apache.org/jira/browse/IGNITE-11961
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > --
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Best wishes,
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Amelchev Nikita
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > > > --
> > > > > > > > > > > > > > > > > > > > > > > > > > > > Best wishes,
> > > > > > > > > > > > > > > > > > > > > > > > > > > > Amelchev Nikita
> > > > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > --
> > > > > > > > > > > > > > > > > > > > > > > > > > Best wishes,
> > > > > > > > > > > > > > > > > > > > > > > > > > Amelchev Nikita
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > --
> > > > > > > > > > > > > > > > > > > > > > > > Best wishes,
> > > > > > > > > > > > > > > > > > > > > > > > Amelchev Nikita
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > --
> > > > > > > > > > > > > > > > > > > Best wishes,
> > > > > > > > > > > > > > > > > > > Amelchev Nikita
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > --
> > > > > > > > > > > > > > > Zhenya Stanilovsky
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > --
> > > > > > > > > > > > > > Best wishes,
> > > > > > > > > > > > > > Amelchev Nikita
> > > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > --
> > > > > > > > > > > > Best wishes,
> > > > > > > > > > > > Amelchev Nikita
> > > > > > > >
> > > > > > > >
> > > > > > > >
>
