[jira] [Created] (HELIX-753) record top state handoff finished in single cluster data cache refresh

Harry Zhang (JIRA) Fri, 21 Sep 2018 14:30:46 -0700

Harry Zhang created HELIX-753:
---------------------------------

             Summary: record top state handoff finished in single cluster data 
cache refresh
                 Key: HELIX-753
                 URL: https://issues.apache.org/jira/browse/HELIX-753
             Project: Apache Helix
          Issue Type: Bug
            Reporter: Harry Zhang
            Assignee: Harry Zhang



Currently we calculate top state handoff duration as follows:
 - record the missing top state when we see that a top state is missing
 - record the top state's return when we see it come back
 - report the top state handoff duration
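
The three steps above can be sketched roughly as follows. This is a
minimal illustration, not the actual Helix code: the class name
TopStateHandoffTracker, the method onRefresh, and the idea of passing a
boolean per partition are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.OptionalLong;

// Hypothetical sketch of the refresh-based tracking described above.
class TopStateHandoffTracker {
    // partition name -> time (ms) at which its top state was first seen missing
    private final Map<String, Long> missingSince = new HashMap<>();

    /**
     * Called once per cluster data cache refresh for a partition.
     * Returns the handoff duration when a previously missing top state
     * is seen again; otherwise returns an empty value.
     */
    OptionalLong onRefresh(String partition, boolean hasTopState, long nowMs) {
        if (!hasTopState) {
            // Step 1: remember when the top state was first seen missing.
            missingSince.putIfAbsent(partition, nowMs);
            return OptionalLong.empty();
        }
        // Step 2: the top state is present; check whether it had been missing.
        Long start = missingSince.remove(partition);
        // Step 3: report a duration only if a missing top state was recorded.
        return start == null ? OptionalLong.empty() : OptionalLong.of(nowMs - start);
    }
}
```

Note that a partition whose top state moves between two consecutive
refreshes is never seen as missing, so this sketch reports nothing for it.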

This works for non-P2P state transitions, because the entire top state 
handoff process always spans at least 2 pipeline runs. For P2P-enabled 
clusters, however, top state handoffs are quick. If a handoff completes 
faster than the cluster data refresh stage latency, we never observe the 
missing top state, so we lose a lot of short top state handoffs, which 
makes the numbers on ingraph look bad.

We need to revise the top state handoff metrics implementation so that the 
reported statistics are not biased by lost data points (right now we lose 
all short handoffs).
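
One possible direction, shown only as a hedged sketch: compare the top
state owner across two consecutive refreshes and, when it has changed
without ever being seen missing, derive the duration from timestamps
carried in the snapshot. Everything here is an assumption for
illustration: the Snapshot fields, the class and method names, and the
premise that instances report comparable start/end times. It is not the
fix that was actually implemented.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.OptionalLong;

// Hypothetical detector for handoffs that finish within one cache refresh.
class SingleRefreshHandoffDetector {
    /** Hypothetical per-partition view taken at each cache refresh. */
    static final class Snapshot {
        final String topStateOwner;    // instance holding the top state, or null if missing
        final long ownerStartMs;       // when this owner reported entering the top state
        final long previousOwnerEndMs; // when the old owner reported leaving it, -1 if unknown

        Snapshot(String topStateOwner, long ownerStartMs, long previousOwnerEndMs) {
            this.topStateOwner = topStateOwner;
            this.ownerStartMs = ownerStartMs;
            this.previousOwnerEndMs = previousOwnerEndMs;
        }
    }

    // partition name -> top state owner seen at the previous refresh
    private final Map<String, String> lastOwner = new HashMap<>();

    OptionalLong onRefresh(String partition, Snapshot s) {
        if (s.topStateOwner == null) {
            // Top state is visibly missing: leave this case to the
            // multi-refresh tracking and forget the previous owner.
            lastOwner.remove(partition);
            return OptionalLong.empty();
        }
        String prev = lastOwner.put(partition, s.topStateOwner);
        boolean ownerChanged = prev != null && !prev.equals(s.topStateOwner);
        if (ownerChanged && s.previousOwnerEndMs >= 0) {
            // Owner changed between two refreshes without being seen missing.
            return OptionalLong.of(Math.max(0L, s.ownerStartMs - s.previousOwnerEndMs));
        }
        return OptionalLong.empty();
    }
}
```

The duration here depends on timestamps reported by two different
instances, so it is only as accurate as their clock synchronization.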

AC:
 - revise the implementation so that we catch those short top state handoffs
 - write new tests to cover the fix if needed



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
