[
https://issues.apache.org/jira/browse/IMPALA-9340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17036547#comment-17036547
]
ASF subversion and git services commented on IMPALA-9340:
---------------------------------------------------------
Commit aca2215c358cbbdc5d460200bc4f78c793d56ffc in impala's branch
refs/heads/master from Riza Suminto
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=aca2215 ]
IMPALA-9340: fix bug where max missed heartbeats is off by one
The max-missed-heartbeats check is off by one because failure-detector.cc
compares the missed-heartbeat count against the
statestore_max_missed_heartbeats flag with a greater-than sign ('>'). This
commit changes the comparison to greater-than-or-equal ('>=').
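A minimal C++ sketch of why '>' waits one heartbeat too long, assuming only what the message above states: the missed-heartbeat count is compared against the flag with '>' before the fix and '>=' after it. This is not the failure-detector.cc code; `IsFailedOld`, `IsFailedNew` and `FirstFailedCount` are hypothetical names used for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical pre-fix predicate: a peer is declared failed only once the
// consecutive missed count strictly exceeds the configured limit.
bool IsFailedOld(int32_t missed_heartbeats, int32_t max_missed_heartbeats) {
  return missed_heartbeats > max_missed_heartbeats;
}

// Hypothetical post-fix predicate: the peer is declared failed as soon as
// the consecutive missed count reaches the configured limit.
bool IsFailedNew(int32_t missed_heartbeats, int32_t max_missed_heartbeats) {
  return missed_heartbeats >= max_missed_heartbeats;
}

// Returns the first consecutive-miss count (starting at 1) at which the
// given predicate reports the peer as failed.
int32_t FirstFailedCount(bool (*is_failed)(int32_t, int32_t),
                         int32_t max_missed_heartbeats) {
  assert(max_missed_heartbeats >= 0);  // a negative limit is meaningless
  int32_t missed = 1;
  while (!is_failed(missed, max_missed_heartbeats)) ++missed;
  return missed;
}
```

With the default limit of 10, `FirstFailedCount(IsFailedOld, 10)` returns 11, matching the log in the issue below, while `FirstFailedCount(IsFailedNew, 10)` returns 10, which is what the flag's description promises.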
Testing:
* Manual test: ran an Impala mini cluster, killed one of the impalads, and
verified in the statestored log that the killed impalad is declared
failed after exactly statestore_max_missed_heartbeats missed heartbeats
* Ran the core tests; they pass
Change-Id: I19f6bfa7e08d231896665d85299302a17959fb6f
Reviewed-on: http://gerrit.cloudera.org:8080/15201
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
> statestore_max_missed_heartbeats is off by one
> ----------------------------------------------
>
> Key: IMPALA-9340
> URL: https://issues.apache.org/jira/browse/IMPALA-9340
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Reporter: Sahil Takiar
> Assignee: Riza Suminto
> Priority: Minor
> Labels: newbie, ramp-up
> Fix For: Impala 3.4.0
>
>
> The flag {{statestore_max_missed_heartbeats}} says:
> {quote}Maximum number of consecutive heartbeat messages an impalad can miss
> before being declared failed by the statestore.
> {quote}
> However, the implementation actually waits for
> {{statestore_max_missed_heartbeats}} + 1 missed heartbeats before considering
> the impalad as failed.
> Example when {{statestore_max_missed_heartbeats}} is set to 10 (the default
> value):
> {code:java}
> logs/custom_cluster_tests/statestored.impala-ec2-centos74-m5-4xlarge-ondemand-09f9.vpc.cloudera.com.jenkins.log.INFO.20200128-105531.29877:I0128 10:58:04.214053 29932 failure-detector.cc:90] 1 consecutive heartbeats failed for 'impa...@impala-ec2-centos74-m5-4xlarge-ondemand-09f9.vpc.cloudera.com:22002'. State is OK
> logs/custom_cluster_tests/statestored.impala-ec2-centos74-m5-4xlarge-ondemand-09f9.vpc.cloudera.com.jenkins.log.INFO.20200128-105531.29877:I0128 10:58:04.267143 29937 failure-detector.cc:90] 2 consecutive heartbeats failed for 'impa...@impala-ec2-centos74-m5-4xlarge-ondemand-09f9.vpc.cloudera.com:22002'. State is OK
> logs/custom_cluster_tests/statestored.impala-ec2-centos74-m5-4xlarge-ondemand-09f9.vpc.cloudera.com.jenkins.log.INFO.20200128-105531.29877:I0128 10:58:04.320443 29938 failure-detector.cc:90] 3 consecutive heartbeats failed for 'impa...@impala-ec2-centos74-m5-4xlarge-ondemand-09f9.vpc.cloudera.com:22002'. State is OK
> logs/custom_cluster_tests/statestored.impala-ec2-centos74-m5-4xlarge-ondemand-09f9.vpc.cloudera.com.jenkins.log.INFO.20200128-105531.29877:I0128 10:58:04.373548 29934 failure-detector.cc:90] 4 consecutive heartbeats failed for 'impa...@impala-ec2-centos74-m5-4xlarge-ondemand-09f9.vpc.cloudera.com:22002'. State is OK
> logs/custom_cluster_tests/statestored.impala-ec2-centos74-m5-4xlarge-ondemand-09f9.vpc.cloudera.com.jenkins.log.INFO.20200128-105531.29877:I0128 10:58:04.426955 29929 failure-detector.cc:90] 5 consecutive heartbeats failed for 'impa...@impala-ec2-centos74-m5-4xlarge-ondemand-09f9.vpc.cloudera.com:22002'. State is OK
> logs/custom_cluster_tests/statestored.impala-ec2-centos74-m5-4xlarge-ondemand-09f9.vpc.cloudera.com.jenkins.log.INFO.20200128-105531.29877:I0128 10:58:04.479981 29933 failure-detector.cc:90] 6 consecutive heartbeats failed for 'impa...@impala-ec2-centos74-m5-4xlarge-ondemand-09f9.vpc.cloudera.com:22002'. State is SUSPECTED
> logs/custom_cluster_tests/statestored.impala-ec2-centos74-m5-4xlarge-ondemand-09f9.vpc.cloudera.com.jenkins.log.INFO.20200128-105531.29877:I0128 10:58:04.533097 29930 failure-detector.cc:90] 7 consecutive heartbeats failed for 'impa...@impala-ec2-centos74-m5-4xlarge-ondemand-09f9.vpc.cloudera.com:22002'. State is SUSPECTED
> logs/custom_cluster_tests/statestored.impala-ec2-centos74-m5-4xlarge-ondemand-09f9.vpc.cloudera.com.jenkins.log.INFO.20200128-105531.29877:I0128 10:58:04.586172 29934 failure-detector.cc:90] 8 consecutive heartbeats failed for 'impa...@impala-ec2-centos74-m5-4xlarge-ondemand-09f9.vpc.cloudera.com:22002'. State is SUSPECTED
> logs/custom_cluster_tests/statestored.impala-ec2-centos74-m5-4xlarge-ondemand-09f9.vpc.cloudera.com.jenkins.log.INFO.20200128-105531.29877:I0128 10:58:04.639999 29936 failure-detector.cc:90] 9 consecutive heartbeats failed for 'impa...@impala-ec2-centos74-m5-4xlarge-ondemand-09f9.vpc.cloudera.com:22002'. State is SUSPECTED
> logs/custom_cluster_tests/statestored.impala-ec2-centos74-m5-4xlarge-ondemand-09f9.vpc.cloudera.com.jenkins.log.INFO.20200128-105531.29877:I0128 10:58:04.692075 29929 failure-detector.cc:90] 10 consecutive heartbeats failed for 'impa...@impala-ec2-centos74-m5-4xlarge-ondemand-09f9.vpc.cloudera.com:22002'. State is SUSPECTED
> logs/custom_cluster_tests/statestored.impala-ec2-centos74-m5-4xlarge-ondemand-09f9.vpc.cloudera.com.jenkins.log.INFO.20200128-105531.29877:I0128 10:58:04.745105 29931 failure-detector.cc:90] 11 consecutive heartbeats failed for 'impa...@impala-ec2-centos74-m5-4xlarge-ondemand-09f9.vpc.cloudera.com:22002'. State is FAILED {code}
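The state transitions in the log above (OK for 1 to 5 missed heartbeats, SUSPECTED for 6 to 10, FAILED at 11, with the flag at its default of 10) can be reproduced by the following C++ sketch of the pre-fix behaviour. It is not Impala's code: the enum, `ComputePeerStateOld` and `ToString` are hypothetical, and the suspect threshold of `max_missed / 2` is an assumption inferred only from where SUSPECTED first appears in the log.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical peer states mirroring the "State is ..." strings in the log.
enum class PeerState { kOk, kSuspected, kFailed };

// Sketch of the pre-fix state computation. The suspect threshold of half the
// maximum is an assumption inferred from the log (SUSPECTED first appears at
// 6 missed heartbeats when the flag is 10).
PeerState ComputePeerStateOld(int32_t missed, int32_t max_missed) {
  assert(max_missed >= 0);  // a negative limit is meaningless
  const int32_t suspect_missed = max_missed / 2;
  if (missed > max_missed) return PeerState::kFailed;  // '>' causes the off-by-one
  if (missed > suspect_missed) return PeerState::kSuspected;
  return PeerState::kOk;
}

// Renders a state the way the statestored log prints it.
std::string ToString(PeerState state) {
  switch (state) {
    case PeerState::kOk: return "OK";
    case PeerState::kSuspected: return "SUSPECTED";
    case PeerState::kFailed: return "FAILED";
  }
  return "UNKNOWN";
}
```

Replaying missed counts 1 through 11 with a limit of 10 through `ComputePeerStateOld` yields the same OK/SUSPECTED/FAILED sequence as the log, with FAILED reached one heartbeat later than the flag's description implies.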