Den, that doesn't make sense from my point of view. It also creates a new problem: how should we aggregate these metrics when a user requests metrics for a cluster group?
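To make the aggregation concern concrete, here is a minimal sketch in plain Java. It does not use any Ignite API: the class `MetricsAggregationSketch`, the `NodePutMetrics` holder and the numbers are all hypothetical and only illustrate what happens if a client records end-to-end time and a server records local time for the same 100 puts, and the two snapshots are then merged for a cluster group.

```java
import java.util.List;

public class MetricsAggregationSketch {
    /** Hypothetical per-node snapshot; not an Ignite class. */
    public static final class NodePutMetrics {
        final String nodeId;
        final long puts;
        final double avgPutTimeMicros;

        public NodePutMetrics(String nodeId, long puts, double avgPutTimeMicros) {
            this.nodeId = nodeId;
            this.puts = puts;
            this.avgPutTimeMicros = avgPutTimeMicros;
        }
    }

    /** Sums the put counters reported by every node in the group. */
    public static long totalPuts(List<NodePutMetrics> nodes) {
        long total = 0;
        for (NodePutMetrics m : nodes)
            total += m.puts;
        return total;
    }

    /** Count-weighted mean of the per-node average put times. */
    public static double weightedAvg(List<NodePutMetrics> nodes) {
        long total = 0;
        double sum = 0;
        for (NodePutMetrics m : nodes) {
            total += m.puts;
            sum += m.puts * m.avgPutTimeMicros;
        }
        return total == 0 ? 0.0 : sum / total;
    }

    public static void main(String[] args) {
        // Assumed numbers: the same 100 puts, measured on both sides.
        List<NodePutMetrics> group = List.of(
            new NodePutMetrics("client-1", 100, 500.0),  // includes network time
            new NodePutMetrics("server-1", 100, 100.0)); // local time only

        // Prints: puts=200 avgPutTimeMicros=300.0
        System.out.println("puts=" + totalPuts(group)
            + " avgPutTimeMicros=" + weightedAvg(group));
    }
}
```

Under these assumed numbers the merged view reports 200 puts although only 100 happened, and an average of 300 microseconds that is neither the end-to-end time nor the local time. Any real aggregation would have to decide which side's numbers to trust for each metric.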
On Mon, Jul 24, 2017 at 8:48 PM, Denis Magda <dma...@apache.org> wrote:
> Guys,
>
> What if we calculate it on both sides? The client will keep the total time
> needed to complete an operation including network hops while a server
> (primary or backup) will count only local time.
>
> --
> Denis
>
>> On Jul 17, 2017, at 7:07 AM, Andrey Gura <ag...@apache.org> wrote:
>>
>> Hi,
>>
>> I believe that the first solution is better than the second because it
>> takes into account network communication time. Average time of
>> communication between nodes doesn't make sense from my point of view.
>>
>> So I vote for #1.
>>
>> On Thu, Jul 13, 2017 at 11:52 PM, Вячеслав Коптилин
>> <slava.kopti...@gmail.com> wrote:
>>> Hi Experts,
>>>
>>> I am working on https://issues.apache.org/jira/browse/IGNITE-3495
>>>
>>> A few words about this issue:
>>> The process of gathering/updating cache metrics is inconsistent in
>>> some cases.
>>> Let's consider the following simple topology which contains only two nodes:
>>> the first node is a client node and the second is a server.
>>> The client node sends requests to the server node, for instance
>>> cache.put(), cache.putAll(), cache.get() etc.
>>> In that case, metrics which are related to counters (cache hits, cache
>>> misses, removals and puts) are calculated on the server side,
>>> while time metrics are updated on the client node.
>>>
>>> I think that both metrics (counters and time) should be calculated on the
>>> same node. So, there are two obvious solutions:
>>>
>>> #1 Node that starts some operation is responsible for updating the cache
>>> metrics.
>>> Pro:
>>> - it will allow us to get more accurate metrics.
>>> Contra:
>>> - this approach does not work in particular cases, for example, a
>>> partitioned cache with FULL_ASYNC write synchronization mode.
>>> - needs to extend response messages (GridNearAtomicUpdateResponse,
>>> GridNearGetResponse etc)
>>> in order to provide additional information from the remote node: cache
>>> hits, number of removals etc.
>>> So, it will lead to additional pressure on the communication channel.
>>> Perhaps, this impact will be small - 4 bytes per message or something like
>>> that.
>>> - backward incompatibility (this is a consequence of the previous point)
>>>
>>> #2 Primary node (node that actually executes a request)
>>> Pro:
>>> - easy to implement
>>> - backward compatible
>>> Contra:
>>> - time metrics will not include the time of communication between nodes, so
>>> the results will be less accurate.
>>> - perhaps we need to provide an additional metric which will allow us to
>>> get the avg time of communication between nodes.
>>>
>>> Please let me know your thoughts.
>>> Perhaps, both alternatives are not so good...
>>>
>>> Regards,
>>> Slava.
>