[
https://issues.apache.org/jira/browse/CASSANDRA-13740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jay Zhuang reassigned CASSANDRA-13740:
--------------------------------------
Assignee: Jaydeepkumar Chovatia
> Orphan hint file gets created while node is being removed from cluster
> ----------------------------------------------------------------------
>
> Key: CASSANDRA-13740
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13740
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Reporter: Jaydeepkumar Chovatia
> Assignee: Jaydeepkumar Chovatia
> Fix For: 3.0.x
>
> Attachments: 13740-3.0.15.txt, gossip_hang_test.py
>
>
> I found this issue during testing: whenever a node is being removed, a hint
> file for that node gets written and stays in the hints directory forever.
> I debugged the code and found that it is caused by a race condition
> between [HintsWriteExecutor.java::flush |
> https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/hints/HintsWriteExecutor.java#L195]
> and [HintsWriteExecutor.java::closeWriter |
> https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/hints/HintsWriteExecutor.java#L106]
> .
>
> *Time t1* The node is down; as a result, hints are being written by
> [HintsWriteExecutor.java::flush |
> https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/hints/HintsWriteExecutor.java#L195]
> *Time t2* The node is removed from the cluster; as a result, it calls
> [HintsService.java-exciseStore |
> https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/hints/HintsService.java#L327]
> which removes the hint files for the node being removed
> *Time t3* The mutation stage keeps pumping hints through
> [HintsService.java::write |
> https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/hints/HintsService.java#L145]
> which again calls [HintsWriteExecutor.java::flush |
> https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/hints/HintsWriteExecutor.java#L215]
> and a new orphan hint file gets created
> I was writing a new dtest for {CASSANDRA-13562, CASSANDRA-13308}, and that
> helped me reproduce this new bug. I will submit a patch for this new dtest
> later.
> I also tried the following to see how this orphan hint file behaves:
> 1. I tried {{nodetool truncatehints <node>}}, but it fails because the node
> is no longer part of the ring
> 2. I then tried {{nodetool truncatehints}}, but that still doesn't remove the
> hint file because it is not yet included in the [dispatchDequeue |
> https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/hints/HintsStore.java#L53]
> Steps to reproduce:
> The attached dtest Python file {{gossip_hang_test.py}} reproduces this bug.
> Solution:
> This is due to the race condition described above. Since
> {{HintsWriteExecutor.java}} creates a thread pool with only 1 worker, the
> solution is fairly simple. Whenever we [HintsService.java::excise |
> https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/hints/HintsService.java#L303]
> a host, store its ID in memory, and check for an already excised host inside
> [HintsWriteExecutor.java::flush |
> https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/hints/HintsWriteExecutor.java#L215].
> If the host has already been excised, ignore its hints.
> Jaydeep
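
The t1/t2/t3 timeline and the proposed guard can be illustrated with a small self-contained Java model. This is only a sketch under stated assumptions, not Cassandra code and not the attached patch: the class `ExcisedHostGuardSketch`, its fields, and its methods are hypothetical names, the hints directory is replaced by an in-memory map, and the excise step is assumed to run on the same single worker as flush.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical model, not Cassandra code: all names here are illustrative.
final class ExcisedHostGuardSketch {
    // Stand-in for the hints directory: host ID -> hints written "to disk".
    private final Map<UUID, List<String>> hintFiles = new ConcurrentHashMap<>();
    // Host IDs remembered in memory after they have been excised.
    private final Set<UUID> excisedHosts = ConcurrentHashMap.newKeySet();
    // Single worker, mirroring the one-thread pool described in the issue.
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private final boolean guardEnabled;

    ExcisedHostGuardSketch(boolean guardEnabled) {
        this.guardEnabled = guardEnabled;
    }

    // Queue a hint for writing; with the guard on, hints for excised hosts are dropped.
    Future<?> flush(UUID hostId, String hint) {
        return worker.submit(() -> {
            if (guardEnabled && excisedHosts.contains(hostId)) {
                return; // host already excised: ignore the hint
            }
            hintFiles.computeIfAbsent(hostId, id -> new ArrayList<>()).add(hint);
        });
    }

    // Record the host as excised and delete its hints (assumed to share the flush worker).
    Future<?> excise(UUID hostId) {
        return worker.submit(() -> {
            excisedHosts.add(hostId);
            hintFiles.remove(hostId);
        });
    }

    // Replays the t1..t3 timeline and reports whether a hint file exists afterwards.
    static boolean orphanCreated(boolean guardEnabled) {
        ExcisedHostGuardSketch model = new ExcisedHostGuardSketch(guardEnabled);
        UUID removedHost = UUID.randomUUID();
        try {
            model.flush(removedHost, "hint-1").get(); // t1: node down, hints written
            model.excise(removedHost).get();          // t2: node removed, hints deleted
            model.flush(removedHost, "hint-2").get(); // t3: a late hint arrives
            return model.hintFiles.containsKey(removedHost);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            model.worker.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("orphan without guard: " + orphanCreated(false));
        System.out.println("orphan with guard: " + orphanCreated(true));
    }
}
```

Running `main` prints `orphan without guard: true` followed by `orphan with guard: false`. Because both tasks are submitted to the same single worker in this model, the excised-host check in `flush` cannot interleave with `excise`, which is why a plain in-memory set is enough here.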
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)