[
https://issues.apache.org/jira/browse/CASSANDRA-3273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13117772#comment-13117772
]
Jackson Chung commented on CASSANDRA-3273:
------------------------------------------
Smoke test only: so far, so good.
I had a node (in a 6-node cluster) that had been down for a LONG time (PHI around 700). I started that node, left it up for about 30 seconds, then stopped it again.
The ring showed that node as down in about 20-30 seconds, give or take.
{noformat}
TRACE [GossipTasks:1] 2011-09-30 00:14:58,727 FailureDetector.java (line 156) PHI for /10.40.22.186 : 703.9568334429565
TRACE [GossipTasks:1] 2011-09-30 00:14:58,727 FailureDetector.java (line 160) notifying listeners that /10.40.22.186 is down
TRACE [GossipTasks:1] 2011-09-30 00:14:58,727 FailureDetector.java (line 161) intervals: 1027.0 1904.0 2153.0 951.0 215.0 1788.0 1002.0 1002.0 895.0 1133.0 1869.0 mean: 1267.1818181818182
DEBUG [GossipStage:1] 2011-09-30 00:14:58,728 Gossiper.java (line 661) Clearing interval times for /10.40.22.186 due to generation change
DEBUG [GossipStage:1] 2011-09-30 00:14:58,728 FailureDetector.java (line 242) Ignoring interval time of 2054002.0
TRACE [GossipTasks:1] 2011-09-30 00:14:59,729 FailureDetector.java (line 156) PHI for /10.40.22.186 : 0.0
TRACE [GossipTasks:1] 2011-09-30 00:15:00,730 FailureDetector.java (line 156) PHI for /10.40.22.186 : 0.43429448190325176
TRACE [GossipTasks:1] 2011-09-30 00:15:01,732 FailureDetector.java (line 156) PHI for /10.40.22.186 : 0.8690228244277856
TRACE [GossipTasks:1] 2011-09-30 00:15:02,733 FailureDetector.java (line 156) PHI for /10.40.22.186 : 0.2890479080886867
TRACE [GossipTasks:1] 2011-09-30 00:15:03,734 FailureDetector.java (line 156) PHI for /10.40.22.186 : 0.19662520906271305
TRACE [GossipTasks:1] 2011-09-30 00:15:04,735 FailureDetector.java (line 156) PHI for /10.40.22.186 : 0.20189636121957935
TRACE [GossipTasks:1] 2011-09-30 00:15:05,737 FailureDetector.java (line 156) PHI for /10.40.22.186 : 0.5977870734348798
TRACE [GossipTasks:1] 2011-09-30 00:15:06,738 FailureDetector.java (line 156) PHI for /10.40.22.186 : 0.20802340729819624
TRACE [GossipTasks:1] 2011-09-30 00:15:07,739 FailureDetector.java (line 156) PHI for /10.40.22.186 : 0.6139326289463335
TRACE [GossipTasks:1] 2011-09-30 00:15:08,740 FailureDetector.java (line 156) PHI for /10.40.22.186 : 0.21152308625862737
TRACE [GossipTasks:1] 2011-09-30 00:15:09,741 FailureDetector.java (line 156) PHI for /10.40.22.186 : 0.21261773854488178
TRACE [GossipTasks:1] 2011-09-30 00:15:10,743 FailureDetector.java (line 156) PHI for /10.40.22.186 : 0.6270982327510521
TRACE [GossipTasks:1] 2011-09-30 00:15:11,744 FailureDetector.java (line 156) PHI for /10.40.22.186 : 0.1968065146773795
TRACE [GossipTasks:1] 2011-09-30 00:15:12,745 FailureDetector.java (line 156) PHI for /10.40.22.186 : 0.579337235438655
TRACE [GossipTasks:1] 2011-09-30 00:15:13,746 FailureDetector.java (line 156) PHI for /10.40.22.186 : 0.37217274142982526
TRACE [GossipTasks:1] 2011-09-30 00:15:14,747 FailureDetector.java (line 156) PHI for /10.40.22.186 : 0.7443454828596505
TRACE [GossipTasks:1] 2011-09-30 00:15:15,757 FailureDetector.java (line 156) PHI for /10.40.22.186 : 0.3555955505071756
TRACE [GossipTasks:1] 2011-09-30 00:15:16,758 FailureDetector.java (line 156) PHI for /10.40.22.186 : 0.7083717111193488
TRACE [GossipTasks:1] 2011-09-30 00:15:17,759 FailureDetector.java (line 156) PHI for /10.40.22.186 : 1.061147871731522
TRACE [GossipTasks:1] 2011-09-30 00:15:18,760 FailureDetector.java (line 156) PHI for /10.40.22.186 : 0.3194684082936909
TRACE [GossipTasks:1] 2011-09-30 00:15:19,762 FailureDetector.java (line 156) PHI for /10.40.22.186 : 0.6395757534039692
TRACE [GossipTasks:1] 2011-09-30 00:15:20,763 FailureDetector.java (line 156) PHI for /10.40.22.186 : 0.9593636301059537
TRACE [GossipTasks:1] 2011-09-30 00:15:21,764 FailureDetector.java (line 156) PHI for /10.40.22.186 : 1.2791515068079384
TRACE [GossipTasks:1] 2011-09-30 00:15:22,765 FailureDetector.java (line 156) PHI for /10.40.22.186 : 1.598939383509923
TRACE [GossipTasks:1] 2011-09-30 00:15:23,767 FailureDetector.java (line 156) PHI for /10.40.22.186 : 1.919046728620201
TRACE [GossipTasks:1] 2011-09-30 00:15:24,768 FailureDetector.java (line 156) PHI for /10.40.22.186 : 2.238834605322186
TRACE [GossipTasks:1] 2011-09-30 00:15:25,769 FailureDetector.java (line 156) PHI for /10.40.22.186 : 2.5586224820241705
TRACE [GossipTasks:1] 2011-09-30 00:15:26,771 FailureDetector.java (line 156) PHI for /10.40.22.186 : 2.8787298271344484
TRACE [GossipTasks:1] 2011-09-30 00:15:27,772 FailureDetector.java (line 156) PHI for /10.40.22.186 : 3.198517703836433
TRACE [GossipTasks:1] 2011-09-30 00:15:28,773 FailureDetector.java (line 156) PHI for /10.40.22.186 : 3.518305580538418
TRACE [GossipTasks:1] 2011-09-30 00:15:29,774 FailureDetector.java (line 156) PHI for /10.40.22.186 : 3.838093457240402
TRACE [GossipTasks:1] 2011-09-30 00:15:30,776 FailureDetector.java (line 156) PHI for /10.40.22.186 : 4.158200802350681
TRACE [GossipTasks:1] 2011-09-30 00:15:31,777 FailureDetector.java (line 156) PHI for /10.40.22.186 : 4.4779886790526655
TRACE [GossipTasks:1] 2011-09-30 00:15:32,778 FailureDetector.java (line 156) PHI for /10.40.22.186 : 4.79777655575465
TRACE [GossipTasks:1] 2011-09-30 00:15:33,779 FailureDetector.java (line 156) PHI for /10.40.22.186 : 5.117564432456635
TRACE [GossipTasks:1] 2011-09-30 00:15:34,781 FailureDetector.java (line 156) PHI for /10.40.22.186 : 5.437671777566913
TRACE [GossipTasks:1] 2011-09-30 00:15:35,782 FailureDetector.java (line 156) PHI for /10.40.22.186 : 5.757459654268897
TRACE [GossipTasks:1] 2011-09-30 00:15:36,783 FailureDetector.java (line 156) PHI for /10.40.22.186 : 6.077247530970882
TRACE [GossipTasks:1] 2011-09-30 00:15:37,784 FailureDetector.java (line 156) PHI for /10.40.22.186 : 6.397035407672866
TRACE [GossipTasks:1] 2011-09-30 00:15:38,785 FailureDetector.java (line 156) PHI for /10.40.22.186 : 6.7168232843748505
TRACE [GossipTasks:1] 2011-09-30 00:15:39,786 FailureDetector.java (line 156) PHI for /10.40.22.186 : 7.036611161076836
TRACE [GossipTasks:1] 2011-09-30 00:15:40,788 FailureDetector.java (line 156) PHI for /10.40.22.186 : 7.356718506187114
TRACE [GossipTasks:1] 2011-09-30 00:15:41,789 FailureDetector.java (line 156) PHI for /10.40.22.186 : 7.676506382889099
TRACE [GossipTasks:1] 2011-09-30 00:15:42,790 FailureDetector.java (line 156) PHI for /10.40.22.186 : 7.996294259591083
TRACE [GossipTasks:1] 2011-09-30 00:15:43,791 FailureDetector.java (line 156) PHI for /10.40.22.186 : 8.316082136293067
TRACE [GossipTasks:1] 2011-09-30 00:15:43,792 FailureDetector.java (line 160) notifying listeners that /10.40.22.186 is down
TRACE [GossipTasks:1] 2011-09-30 00:15:43,792 FailureDetector.java (line 161) intervals: 1001.0 2004.0 1011.0 481.0 999.0 1514.0 487.0 1551.0 450.0 1001.0 2002.0 1516.0 2003.0 3012.0 mean: 1359.4285714285713
{noformat}
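The PHI values in this log fit a simple linear model: while the node is silent, PHI climbs by about 0.32 per second with the post-restart mean of ~1359 ms, and the node is marked down once PHI passes 8. Below is a minimal Java sketch that reproduces those numbers. It assumes PHI = t / (mean * ln 10), which matches the logged values (including the 703.96 on the first line), and a conviction threshold of 8 (the log convicts at 8.316 but not at 7.996). `PhiSketch` and its methods are hypothetical names for illustration, not Cassandra's API.

```java
public class PhiSketch {
    // Assumed scaling: phi = t / (mean * ln 10) = -log10(exp(-t / mean)),
    // i.e. the tail probability of an exponential inter-arrival distribution.
    static final double PHI_FACTOR = 1.0 / Math.log(10.0);

    // Assumed conviction threshold (the log convicts at 8.316 but not at 7.996).
    static final double CONVICT_THRESHOLD = 8.0;

    /** PHI after the given milliseconds of silence, for a mean inter-arrival time in ms. */
    static double phi(double millisSinceLastArrival, double meanIntervalMillis) {
        return PHI_FACTOR * millisSinceLastArrival / meanIntervalMillis;
    }

    /** Milliseconds of silence needed for PHI to reach the threshold. */
    static double millisToConvict(double meanIntervalMillis) {
        return CONVICT_THRESHOLD * meanIntervalMillis / PHI_FACTOR;
    }

    public static void main(String[] args) {
        double meanAfterRestart = 1359.4285714285713;  // second "intervals:" line
        double meanBeforeRestart = 1267.1818181818182; // first "intervals:" line

        // About 0.3198: the per-second PHI step while the node is down.
        System.out.printf("PHI after 1001 ms: %.4f%n", phi(1001, meanAfterRestart));
        // About 703.96: the first PHI line, after the 2054002 ms outage.
        System.out.printf("PHI after 2054002 ms: %.2f%n", phi(2054002, meanBeforeRestart));
        // About 25 s of silence before conviction with the post-restart mean.
        System.out.printf("seconds to convict: %.1f%n", millisToConvict(meanAfterRestart) / 1000.0);
    }
}
```

Under these assumptions, roughly 25 seconds of silence is enough to convict, consistent with the log: PHI is 7.996 at 00:15:42 and 8.316 at 00:15:43, about 26 seconds after it last reset.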
> FailureDetector can take a very long time to mark a host down
> -------------------------------------------------------------
>
> Key: CASSANDRA-3273
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3273
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Reporter: Brandon Williams
> Assignee: Brandon Williams
> Fix For: 0.8.7
>
> Attachments: 3273.txt
>
>
> There are two ways to trigger this:
> * Bring a node up very briefly in a mixed-version cluster and then terminate it
> * Bring a node up, terminate it and leave it down for a very long time, then bring it back up and take it down again
> In the first case, a very short inter-arrival interval can be recorded by the
> versioning logic, which requires reconnecting and can happen very quickly.
> This can easily be solved by rejecting any interval shorter than a reasonable
> bound, for instance the gossiper interval.
> The second case is harder to solve, because an extremely large interval is
> recorded: the time the node was left dead the first time. This throws off the
> mean of the intervals and makes it take much longer than it should to mark
> the node down the second time.
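Both proposals above amount to filtering what goes into the arrival window before the mean is computed. Below is a minimal Java sketch under stated assumptions. `BoundedArrivalWindow` is a hypothetical class, not Cassandra's own arrival window, and the capacity (1000), lower bound (100 ms) and upper bound (60,000 ms) are illustrative values, not the ones in 3273.txt. The `clear()` method mirrors the "Clearing interval times ... due to generation change" step visible in the log above.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class BoundedArrivalWindow {
    private final Deque<Double> intervals = new ArrayDeque<>();
    private final int capacity;
    private final double minIntervalMs; // hypothetical lower bound (first case: reconnect blips)
    private final double maxIntervalMs; // hypothetical upper bound (second case: long outages)

    public BoundedArrivalWindow(int capacity, double minIntervalMs, double maxIntervalMs) {
        this.capacity = capacity;
        this.minIntervalMs = minIntervalMs;
        this.maxIntervalMs = maxIntervalMs;
    }

    /** Records an inter-arrival time; returns false if it falls outside the bounds. */
    public boolean add(double intervalMs) {
        if (intervalMs < minIntervalMs || intervalMs > maxIntervalMs) {
            return false; // rejected: never affects the mean
        }
        if (intervals.size() == capacity) {
            intervals.removeFirst(); // keep only the most recent samples
        }
        intervals.addLast(intervalMs);
        return true;
    }

    /** Drops all history, e.g. when the peer comes back with a new generation. */
    public void clear() {
        intervals.clear();
    }

    public int size() {
        return intervals.size();
    }

    public double mean() {
        if (intervals.isEmpty()) {
            return 0.0;
        }
        double sum = 0.0;
        for (double v : intervals) {
            sum += v;
        }
        return sum / intervals.size();
    }

    public static void main(String[] args) {
        // First "intervals:" line of the log, then the 2054002 ms outage it ignored,
        // then a hypothetical 5 ms reconnect blip.
        double[] samples = {1027, 1904, 2153, 951, 215, 1788, 1002, 1002, 895, 1133, 1869,
                            2054002, 5};

        BoundedArrivalWindow naive = new BoundedArrivalWindow(1000, 0, Double.POSITIVE_INFINITY);
        BoundedArrivalWindow bounded = new BoundedArrivalWindow(1000, 100, 60_000);
        for (double s : samples) {
            naive.add(s);
            bounded.add(s);
        }
        System.out.printf("naive mean:   %.1f ms%n", naive.mean());
        System.out.printf("bounded mean: %.1f ms%n", bounded.mean());
    }
}
```

The naive mean comes out at roughly 159,000 ms versus about 1,267 ms with the bounds. Since PHI in the log grows in proportion to the silence divided by the mean, the naive window would need roughly 125 times as long to reach the same PHI, which is the second failure mode described above.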