[ https://issues.apache.org/jira/browse/CASSANDRA-13740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Aleksey Yeschenko updated CASSANDRA-13740: ------------------------------------------ Priority: Minor (was: Major) Reviewer: Aleksey Yeschenko > Orphan hint file gets created while node is being removed from cluster > ---------------------------------------------------------------------- > > Key: CASSANDRA-13740 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13740 > Project: Cassandra > Issue Type: Bug > Components: Core > Reporter: Jaydeepkumar Chovatia > Assignee: Jaydeepkumar Chovatia > Priority: Minor > Fix For: 3.0.x > > Attachments: 13740-3.0.15.txt, gossip_hang_test.py > > > I have found this new issue during my test, whenever node is being removed > then hint file for that node gets written and stays inside the hint directory > forever. I debugged the code and found that it is due to the race condition > between [HintsWriteExecutor.java::flush | > https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/hints/HintsWriteExecutor.java#L195] > and [HintsWriteExecutor.java::closeWriter | > https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/hints/HintsWriteExecutor.java#L106] > . > > *Time t1* Node is down, as a result Hints are being written by > [HintsWriteExecutor.java::flush | > https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/hints/HintsWriteExecutor.java#L195] > *Time t2* Node is removed from cluster as a result it calls > [HintsService.java-exciseStore | > https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/hints/HintsService.java#L327] > which removes hint files for the node being removed > *Time t3* Mutation stage keeps pumping Hints through [HintService.java::write > | > https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/hints/HintsService.java#L145] > which again calls [HintsWriteExecutor.java::flush | > https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/hints/HintsWriteExecutor.java#L215] > and new orphan file gets created > I was writing a new dtest for {CASSANDRA-13562, CASSANDRA-13308} and that > helped me reproduce this new bug. I will submit patch for this new dtest > later. > I also tried following to check how this orphan hint file responds: > 1. I tried {{nodetool truncatehints <node>}} but it fails as node is no > longer part of the ring > 2. I then tried {{nodetool truncatehints}}, that still doesn’t remove hint > file because it is not yet included in the [dispatchDequeue | > https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/hints/HintsStore.java#L53] > Reproducible steps: > Please find dTest python file {{gossip_hang_test.py}} attached which > reproduces this bug. > Solution: > This is due to race condition as mentioned above. Since > {{HintsWriteExecutor.java}} creates thread pool with only 1 worker, so > solution becomes little simple. Whenever we [HintService.java::excise | > https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/hints/HintsService.java#L303] > a host, just store it in-memory, and check for already evicted host inside > [HintsWriteExecutor.java::flush | > https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/hints/HintsWriteExecutor.java#L215]. > If already evicted host is found then ignore hints. > Jaydeep -- This message was sent by Atlassian JIRA (v6.4.14#64029) --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org