[jira] [Commented] (HELIX-753) Record top state handoff finished in single cluster data cache refresh

Hudson (JIRA) Thu, 25 Oct 2018 16:06:46 -0700


    [ https://issues.apache.org/jira/browse/HELIX-753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16664402#comment-16664402 ]


Hudson commented on HELIX-753:
------------------------------

FAILURE: Integrated in Jenkins build helix #1545 (See [https://builds.apache.org/job/helix/1545/])
[HELIX-753] Record top state handoff finished in single cluster data (hrzhang: rev 67ff66b4897309c785b8b42863e95734eba81aab)
* (edit) helix-core/src/test/java/org/apache/helix/monitoring/mbeans/TestTopStateHandoffMetrics.java
* (edit) helix-core/src/main/java/org/apache/helix/controller/GenericHelixController.java
* (edit) helix-core/src/main/java/org/apache/helix/controller/stages/ClusterEvent.java
* (edit) helix-core/src/main/java/org/apache/helix/controller/stages/CurrentStateComputationStage.java
* (edit) helix-core/src/test/resources/TestTopStateHandoffMetrics.json
* (edit) helix-core/src/main/java/org/apache/helix/controller/stages/AttributeName.java


> Record top state handoff finished in single cluster data cache refresh
> ----------------------------------------------------------------------
>
>                 Key: HELIX-753
>                 URL: https://issues.apache.org/jira/browse/HELIX-753
>             Project: Apache Helix
>          Issue Type: Bug
>            Reporter: Harry Zhang
>            Assignee: Harry Zhang
>            Priority: Major
>
> Currently we calculate the top state handoff duration as follows:
>  - record a missing top state when we see that the top state is missing
>  - record the top state coming back when we see it come back
>  - report the top state handoff duration
> This works for non-P2P state transitions, because the entire top state
> handoff process always spans >= 2 pipeline runs. However, in P2P-enabled
> clusters top state handoffs are quick, and when a handoff finishes faster
> than the cluster data refresh stage latency, we never observe the top state
> as missing. As a result we lose many short top state handoffs, which skews
> the numbers on ingraph.
> We need to revise the top state handoff metrics implementation so we don't
> systematically lose data points (i.e., we currently lose all short handoffs).
> AC:
>  - revise the implementation so we catch those short top state handoffs
>  - write new tests to cover the fix if needed
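
The blind spot described in the issue can be illustrated with a small Java sketch. It is hypothetical and is not the code from the commit above: `TopStateHandoffSketch`, `Snapshot`, `onRefresh`, and the `handoffStartMs`/`handoffEndMs` fields are illustrative names, and the sketch assumes each refresh can see the start and end timestamps of the latest top state transitions. The naive mode follows the three steps in the description (record missing, record the return, report). The revised mode also compares consecutive refreshes, so a handoff that completes entirely between two refreshes is still reported.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

/**
 * Hypothetical sketch, not Helix's implementation: tracks top state handoffs
 * for one partition, given one snapshot per cluster data cache refresh.
 */
public class TopStateHandoffSketch {

  /** Hypothetical per-refresh view of one partition. */
  static final class Snapshot {
    final String topStateInstance; // null when no replica holds the top state
    final long handoffStartMs;     // assumed: when the old top state began stepping down
    final long handoffEndMs;       // assumed: when the new top state finished its transition

    Snapshot(String topStateInstance, long handoffStartMs, long handoffEndMs) {
      this.topStateInstance = topStateInstance;
      this.handoffStartMs = handoffStartMs;
      this.handoffEndMs = handoffEndMs;
    }
  }

  private final boolean detectSingleRefreshHandoff;
  private final List<Long> reportedDurationsMs = new ArrayList<>();
  private Snapshot previous;   // snapshot seen by the previous refresh
  private Long missingSinceMs; // non-null while a missing top state is being tracked

  TopStateHandoffSketch(boolean detectSingleRefreshHandoff) {
    this.detectSingleRefreshHandoff = detectSingleRefreshHandoff;
  }

  /** Called once per cluster data cache refresh. */
  void onRefresh(Snapshot current) {
    if (current.topStateInstance == null) {
      // Step 1: record the missing top state the first time it is observed.
      if (missingSinceMs == null) {
        missingSinceMs = current.handoffStartMs;
      }
    } else if (missingSinceMs != null) {
      // Steps 2 and 3: the top state came back, so report the handoff duration.
      reportedDurationsMs.add(current.handoffEndMs - missingSinceMs);
      missingSinceMs = null;
    } else if (detectSingleRefreshHandoff
        && previous != null
        && previous.topStateInstance != null
        && !Objects.equals(previous.topStateInstance, current.topStateInstance)) {
      // The top state moved between two refreshes without ever being seen as
      // missing (the fast P2P case): use the transition timestamps instead.
      reportedDurationsMs.add(current.handoffEndMs - current.handoffStartMs);
    }
    previous = current;
  }

  List<Long> reportedDurationsMs() {
    return reportedDurationsMs;
  }

  public static void main(String[] args) {
    for (boolean revised : new boolean[] {false, true}) {
      TopStateHandoffSketch tracker = new TopStateHandoffSketch(revised);
      // Slow handoff spanning several refreshes: A -> (missing) -> B.
      tracker.onRefresh(new Snapshot("A", 0, 0));
      tracker.onRefresh(new Snapshot(null, 100, 0));
      tracker.onRefresh(new Snapshot("B", 100, 250));
      // Fast P2P handoff that finishes within one refresh interval: B -> C.
      tracker.onRefresh(new Snapshot("C", 300, 320));
      System.out.println((revised ? "revised: " : "naive: ") + tracker.reportedDurationsMs());
    }
  }
}
```

Running `main` reports `[150]` for the naive tracker and `[150, 20]` for the revised one: both catch the slow handoff, but only the revised tracker records the 20 ms handoff that never appeared as a missing top state.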



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
