[
https://issues.apache.org/jira/browse/CASSANDRA-8801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Carl Yeksigian updated CASSANDRA-8801:
--------------------------------------
Attachment: 8801-v2.txt
I was able to bring up a node again after decommissioning; it doesn't seem like
the {{DECOMMISSIONED}} state gets saved to the {{system.local}} table.
The cause is IOErrors thrown by MessagingService while it was trying to close
the socket threads. Wrapping the MessagingService call in a try block fixed the
problem: when I restarted, the node errored out, reporting that it had been
decommissioned, and I was able to use the {{override_decommission}} flag.
I've attached the change that I made to make it work.
Just one nit otherwise: there is an unnecessary whitespace change in
StorageService.
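As a rough illustration of the behaviour described above (this is not the
attached patch), here is a minimal, self-contained Java sketch. The class
DecommissionSketch, the in-memory map standing in for {{system.local}}, the
shutdownMessaging helper, and the boolean overrideDecommission parameter are
all hypothetical names made up for the example; only the idea of catching the
IOError so that the {{DECOMMISSIONED}} state still gets saved, and of refusing
to start without the override, comes from this ticket.

```java
import java.io.IOError;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

class DecommissionSketch {
    enum NodeState { NORMAL, DECOMMISSIONED }

    // Hypothetical stand-in for the system.local table.
    static final Map<String, NodeState> systemLocal = new HashMap<>();

    // Hypothetical stand-in for the messaging shutdown; when 'fail' is true it
    // simulates the IOError raised while closing the socket threads.
    static void shutdownMessaging(boolean fail) {
        if (fail) {
            throw new IOError(new IOException("error closing socket"));
        }
    }

    // Decommission: a failure during messaging shutdown must not prevent the
    // DECOMMISSIONED state from being recorded.
    static void decommission(boolean messagingFails) {
        try {
            shutdownMessaging(messagingFails);
        } catch (IOError e) {
            System.err.println("Ignoring error during messaging shutdown: " + e.getCause());
        }
        systemLocal.put("state", NodeState.DECOMMISSIONED);
    }

    // Startup check: refuse to rejoin if the node remembers being
    // decommissioned, unless the operator explicitly overrides.
    static void checkStartup(boolean overrideDecommission) {
        if (systemLocal.get("state") == NodeState.DECOMMISSIONED && !overrideDecommission) {
            throw new IllegalStateException(
                "This node was decommissioned and will not rejoin without override_decommission");
        }
    }

    public static void main(String[] args) {
        decommission(true); // the simulated shutdown failure is caught, state is still saved
        try {
            checkStartup(false);
        } catch (IllegalStateException e) {
            System.out.println("Startup refused: " + e.getMessage());
        }
        checkStartup(true); // the explicit override lets the node start
        System.out.println("Startup allowed with override");
    }
}
```

Without the try block in decommission, the simulated error would propagate
before the state is written, and a later checkStartup(false) would pass, which
matches the rejoin behaviour reported in the issue.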
> Decommissioned nodes are willing to rejoin the cluster if restarted
> -------------------------------------------------------------------
>
> Key: CASSANDRA-8801
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8801
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Reporter: Eric Stevens
> Assignee: Brandon Williams
> Fix For: 3.0
>
> Attachments: 8801-v2.txt, 8801.txt
>
>
> This issue comes from the Cassandra user group.
> If a node which was successfully decommissioned gets restarted with its data
> directory intact, it will rejoin the cluster immediately, going to {{UN}} and
> beginning to serve client requests.
> This is wrong - the node has consistency issues, having missed any writes
> while it was offline because no hinted handoffs were being kept. And in the
> best case scenario (it's spotted and remediated immediately), near-100%
> overstreaming will still occur.
> Also, whatever reasons the operator had for decommissioning the node would
> presumably still be valid, so this action may threaten cluster stability if
> the node is underpowered or suffering hardware issues.
> But what elevates this to critical is that if the node had been offline
> longer than gc_grace_seconds, it may cause permanent and unrecoverable
> consistency issues due to data resurrection.
> h3. Recommendation:
> A node should remember that it was decommissioned and refuse to rejoin a
> cluster without at least a -Dflag forcing it to.