Anton, Nikolay, thanks for the support.

For now, we have the getCurrentPmeDuration() metric, which does not
correctly show the impact of PME on the cluster. A PME can complete
without blocking operations, for example on client node join/leave
events. I suggest adding a new metric, isOperationsBlockedByPme().
Together, these metrics will show the impact of PME on the cluster and
on user operations.

I have prepared a PR for this (Bot visa is green). [1] Can anyone take
a look?

[1] https://issues.apache.org/jira/browse/IGNITE-11961

Tue, 16 Jul 2019 at 14:58, Nikolay Izhikov <[email protected]>:
>
> I think the administrator of an Ignite cluster should be able to monitor all Ignite
> processes, including non-blocking PME.
>
> On Tue, 16/07/2019 at 14:57 +0300, Anton Vinogradov wrote:
> > BTW,
> > Found PME metric - getCurrentPmeDuration().
> > Seems, it shows exactly PME time and is not so useful because of this.
> > The goal is to show exactly the blocking period.
> > When PME causes no blocking, it's a good PME and I see no reason to have
> > monitoring related to it :)
> >
> > On Tue, Jul 16, 2019 at 2:50 PM Nikolay Izhikov <[email protected]> wrote:
> >
> > > Anton.
> > >
> > > Why do we need to postpone implementation of these metrics?
> > > For now, implementation of a new metric is very simple.
> > >
> > > I think we can implement these metrics as a single contribution.
> > >
> > > On Tue, 16/07/2019 at 13:47 +0300, Anton Vinogradov wrote:
> > > > Nikita,
> > > >
> > > > Looks like all we need now is 1 simple metric: are operations blocked?
> > > > Just a true or false.
> > > > Let's start from this.
> > > > All other metrics can be extracted from logs now and can be implemented
> > > > later.
> > > >
> > > > On Tue, Jul 16, 2019 at 12:46 PM Nikolay Izhikov <[email protected]>
> > > > wrote:
> > > >
> > > > > +1.
> > > > >
> > > > > Nikita, please, go ahead.
> > > > >
> > > > >
> > > > > Tue, 16 Jul 2019, 11:45 Nikita Amelchev <[email protected]>:
> > > > >
> > > > > > Hello, Igniters.
> > > > > >
> > > > > > I suggest adding some useful metrics about the partition map exchange
> > > > > > (PME). For now, the duration of PME stages is available only in log files
> > > > > > and cannot be obtained using JMX or other external tools. [1]
> > > > > >
> > > > > > I made a list of local node metrics that help to understand the
> > > > > > actual status of the current PME:
> > > > > >
> > > > > > 1. initialVersion. Topology version that initiates the exchange.
> > > > > > 2. initTime. Time PME was started.
> > > > > > 3. initEvent. Event that triggered PME.
> > > > > > 4. partitionReleaseTime. Time when a node has finished waiting for all
> > > > > > updates and transactions on a previous topology.
> > > > > > 5. sendSingleMessageTime. Time when a node sent a single message.
> > > > > > 6. recieveFullMessageTime. Time when a node received a full message.
> > > > > > 7. finishTime. Time PME was ended.
> > > > > >
> > > > > > When a new PME starts, all these metrics are reset.
> > > > > >
> > > > > > These metrics help to understand:
> > > > > > - how long PME took (current or previous).
> > > > > > - how long the wait for all updates to complete took.
> > > > > > - which node blocks PME (didn't send a single message).
> > > > > > - what triggered PME.
> > > > > >
> > > > > > Thoughts?
> > > > > >
> > > > > > [1] https://issues.apache.org/jira/browse/IGNITE-11961
> > > > > >
> > > > > > --
> > > > > > Best wishes,
> > > > > > Amelchev Nikita
> > > > > >

--
Best wishes,
Amelchev Nikita
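
P.S. To illustrate how the two metrics discussed in this thread complement each other, here is a minimal sketch. It is not the code from the PR: the PmeMetricsSketch class, its Exchange holder, the onExchangeStart()/onExchangeDone() hooks and the injected clock are hypothetical names made up for the example, and returning 0 when no exchange is running is an assumption. Only the names getCurrentPmeDuration() and isOperationsBlockedByPme() come from the discussion.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

/** Hypothetical sketch of the two PME metrics; not Ignite's actual implementation. */
final class PmeMetricsSketch {
    /** Immutable snapshot of the running exchange, so both metrics read one consistent state. */
    private static final class Exchange {
        final long startNanos;
        final boolean blocksOps;

        Exchange(long startNanos, boolean blocksOps) {
            this.startNanos = startNanos;
            this.blocksOps = blocksOps;
        }
    }

    /** Nanosecond clock; injected so the sketch can be exercised deterministically. */
    private final LongSupplier nanoClock;

    /** Current exchange, or null when no exchange is in progress. */
    private volatile Exchange cur;

    PmeMetricsSketch(LongSupplier nanoClock) {
        this.nanoClock = nanoClock;
    }

    /** Hypothetical hook: blocksOps is false for e.g. client node join/leave events. */
    void onExchangeStart(boolean blocksOps) {
        cur = new Exchange(nanoClock.getAsLong(), blocksOps);
    }

    /** Hypothetical hook: the exchange has finished, so both metrics reset. */
    void onExchangeDone() {
        cur = null;
    }

    /** Duration of the running exchange in milliseconds (assumed 0 when idle). */
    long getCurrentPmeDuration() {
        Exchange e = cur;
        return e == null ? 0 : TimeUnit.NANOSECONDS.toMillis(nanoClock.getAsLong() - e.startNanos);
    }

    /** True only while a running exchange blocks user operations. */
    boolean isOperationsBlockedByPme() {
        Exchange e = cur;
        return e != null && e.blocksOps;
    }
}
```

With this shape, a client join shows a non-zero duration while isOperationsBlockedByPme() stays false, which is the case the duration metric alone cannot distinguish.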
