Zhen Zhang created HELIX-444:
--------------------------------

             Summary: add per-participant partition count gauges to helix
                 Key: HELIX-444
                 URL: https://issues.apache.org/jira/browse/HELIX-444
             Project: Apache Helix
          Issue Type: Improvement
            Reporter: Zhen Zhang
            Assignee: Zhen Zhang


We need a way to pull the partition counts that are known to be down out of 
DifferenceWithIdealState when an instance is offline, reducing the alert volume 
to just the down-instance notification. Without metrics from Helix indicating 
the number of partitions hosted on a given participant, we can't tell which 
"DifferenceWithIdealState" counts are expected to be down and which reflect 
an actual difference caused by something other than a node outage.
These should be produced on a per-participant, per-resource basis (e.g., 
helix.i001.participantstatus.ESPRESSO_USCP.ela4-app1234.prod.linkedin.com_11932.ucpx.partitiongauge
 = 64 or whatever)
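
As a rough illustration of the idea (not Helix code), the sketch below counts 
partitions per participant and per resource from a plain mapping of 
resource -> partition -> {instance: state}, then subtracts the partitions 
hosted on offline instances from a DifferenceWithIdealState count. The function 
names, the input shape, the gauge-name layout, and the assumption that the 
difference counts one unit per missing replica are all hypothetical.

```python
from collections import Counter


def partition_counts(assignments):
    """Count hosted partitions per (instance, resource).

    `assignments` is a hypothetical snapshot shaped as
    {resource: {partition: {instance: state}}}, taken while all
    participants were still online.
    """
    counts = Counter()
    for resource, partitions in assignments.items():
        for replicas in partitions.values():
            for instance in replicas:
                counts[(instance, resource)] += 1
    return dict(counts)


def gauge_name(prefix, instance, resource):
    # Assumed layout, loosely modelled on the example name in this issue.
    return f"{prefix}.{instance}.{resource}.partitiongauge"


def unexplained_difference(difference, counts, offline_instances, resource):
    """Remove the replicas expected to be down from a difference count.

    Assumes `difference` counts one unit per missing replica.
    """
    expected_down = sum(
        n
        for (instance, res), n in counts.items()
        if res == resource and instance in offline_instances
    )
    return max(difference - expected_down, 0)
```

With gauges like these, an alert rule could fire on the unexplained remainder 
rather than on the raw DifferenceWithIdealState value.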



--
This message was sent by Atlassian JIRA
(v6.2#6252)
