[ https://issues.apache.org/jira/browse/CASSANDRA-1494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12908612#action_12908612 ]
Dan Retzlaff commented on CASSANDRA-1494:
-----------------------------------------
Okay. I'd suggest following the removeEndpoint() call with a "break", if only
on aesthetic grounds, since otherwise that for loop will cause a
ConcurrentModificationException every time.
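
To illustrate the point, here is a minimal standalone sketch of the pattern.
It is not the actual Gossiper code: the class name, the method names, the
map of endpoint to last-heard timestamp and the cutoff are all hypothetical.
It only assumes what the stack trace below suggests, namely that the loop
iterates over a java.util.Hashtable and removes an entry from it mid-loop.

```java
import java.util.ConcurrentModificationException;
import java.util.Hashtable;
import java.util.Iterator;
import java.util.Map;

// Hypothetical stand-in for the status-check loop; not Cassandra code.
public class GossipRemovalSketch {

    // Three endpoints, all with a last-heard timestamp of 0 (all "silent").
    static Map<String, Long> sampleState() {
        Map<String, Long> state = new Hashtable<>();
        state.put("192.168.2.147", 0L);
        state.put("192.168.2.148", 0L);
        state.put("192.168.2.149", 0L);
        return state;
    }

    // Removes from the map inside the for-each and keeps iterating.
    // Returns true if the iterator threw ConcurrentModificationException.
    static boolean removeWithoutBreak(Map<String, Long> state, long cutoff) {
        try {
            for (String endpoint : state.keySet()) {
                if (state.get(endpoint) < cutoff) {
                    state.remove(endpoint); // invalidates the iterator
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // Same loop, but leaves it right after the removal, so the invalidated
    // iterator is never advanced. At most one endpoint is removed per pass.
    static void removeWithBreak(Map<String, Long> state, long cutoff) {
        for (String endpoint : state.keySet()) {
            if (state.get(endpoint) < cutoff) {
                state.remove(endpoint);
                break;
            }
        }
    }

    // Alternative: remove through the iterator, which keeps it valid and
    // allows every silent endpoint to be removed in a single pass.
    static void removeWithIterator(Map<String, Long> state, long cutoff) {
        Iterator<Map.Entry<String, Long>> it = state.entrySet().iterator();
        while (it.hasNext()) {
            if (it.next().getValue() < cutoff) {
                it.remove();
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Long> a = sampleState();
        System.out.println("no break, threw CME: " + removeWithoutBreak(a, 1L));

        Map<String, Long> b = sampleState();
        removeWithBreak(b, 1L);
        System.out.println("with break, endpoints left: " + b.size());

        Map<String, Long> c = sampleState();
        removeWithIterator(c, 1L);
        System.out.println("iterator.remove, endpoints left: " + c.size());
    }
}
```

The "break" variant is the smallest change; the trade-off is that it removes
at most one endpoint per pass, so any others wait for the next timer tick.
Removing through the iterator avoids that, at the cost of a larger change.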
> Gossiper ConcurrentModificationException after Decommissioning
> --------------------------------------------------------------
>
> Key: CASSANDRA-1494
> URL: https://issues.apache.org/jira/browse/CASSANDRA-1494
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Affects Versions: 0.6.5
> Environment: Linux 2.6.33.8-149.fc13.x86_64 #1 SMP Tue Aug 17
> 22:53:15 UTC 2010 x86_64 x86_64 x86_64 GNU/Linux
> Reporter: Dan Retzlaff
>
> After decommissioning 192.168.2.147, the Gossiper caused a
> ConcurrentModificationException in 192.168.2.55. This cascaded into
> 192.168.2.55 thinking that 192.168.2.148 and 192.168.2.149 repeatedly went UP
> and then DOWN. Eventually this left so many intranode (storage port) TCP
> connections in CLOSE_WAIT that other nodes started failing with "too many
> open files" exceptions.
> INFO [Timer-0] 2010-09-08 17:00:02,398 Gossiper.java (line 402) FatClient
> /192.168.2.147 has been silent for 3600000ms, removing from gossip
> ERROR [Timer-0] 2010-09-08 17:00:02,418 Gossiper.java (line 99) Gossip error
> java.util.ConcurrentModificationException
> at java.util.Hashtable$Enumerator.next(Hashtable.java:1031)
> at org.apache.cassandra.gms.Gossiper.doStatusCheck(Gossiper.java:383)
> at org.apache.cassandra.gms.Gossiper$GossipTimerTask.run(Gossiper.java:93)
> at java.util.TimerThread.mainLoop(Timer.java:512)
> at java.util.TimerThread.run(Timer.java:462)
> INFO [Timer-0] 2010-09-08 17:00:12,398 Gossiper.java (line 180) InetAddress
> /192.168.2.148 is now dead.
> INFO [Timer-0] 2010-09-08 17:00:14,399 Gossiper.java (line 180) InetAddress
> /192.168.2.149 is now dead.
> INFO [GMFD:1] 2010-09-08 17:00:19,400 Gossiper.java (line 578) InetAddress
> /192.168.2.149 is now UP
> INFO [HINTED-HANDOFF-POOL:1] 2010-09-08 17:00:19,400
> HintedHandOffManager.java (line 165) Started hinted handoff for endPoint
> /192.168.2.149
> INFO [HINTED-HANDOFF-POOL:1] 2010-09-08 17:00:19,401
> HintedHandOffManager.java (line 222) Finished hinted handoff of 0 rows to
> endpoint /192.168.2.149
> INFO [Timer-0] 2010-09-08 17:00:20,399 Gossiper.java (line 180) InetAddress
> /192.168.2.149 is now dead.
> INFO [GMFD:1] 2010-09-08 17:00:43,409 Gossiper.java (line 578) InetAddress
> /192.168.2.148 is now UP
> INFO [HINTED-HANDOFF-POOL:1] 2010-09-08 17:00:43,409
> HintedHandOffManager.java (line 165) Started hinted handoff for endPoint
> /192.168.2.148
> INFO [HINTED-HANDOFF-POOL:1] 2010-09-08 17:00:43,410
> HintedHandOffManager.java (line 222) Finished hinted handoff of 0 rows to
> endpoint /192.168.2.148
> INFO [Timer-0] 2010-09-08 17:00:44,404 Gossiper.java (line 180) InetAddress
> /192.168.2.148 is now dead.
> INFO [GMFD:1] 2010-09-08 17:01:18,415 Gossiper.java (line 578) InetAddress
> /192.168.2.149 is now UP
> (UP/DOWN cycle repeats until the target node *really* goes DOWN due to too
> many TCP sockets in CLOSE_WAIT.)