ASF GitHub Bot commented on CASSANDRA-13740:

GitHub user jaydeepkumar1984 opened a pull request:


    CASSANDRA-13740 orphan hint file get created


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jaydeepkumar1984/cassandra 13740-3.0

Alternatively you can review and apply these changes as the patch at:


To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #136
commit 993d5891e69f5cda402ae2158e06914f653a644d
Author: Jaydeepkumar Chovatia <chovatia.jayd...@gmail.com>
Date:   2017-08-03T22:34:26Z

    CASSANDRA-13740 orphan hint file get created


> Orphan hint file gets created while node is being removed from cluster
> ----------------------------------------------------------------------
>                 Key: CASSANDRA-13740
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13740
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Jaydeepkumar Chovatia
>            Assignee: Jaydeepkumar Chovatia
>            Priority: Minor
>             Fix For: 3.0.x
>         Attachments: 13740-2_3.0.15.txt, 13740-3.0.15.txt, gossip_hang_test.py
> I have found this new issue during my test, whenever node is being removed 
> then hint file for that node gets written and stays inside the hint directory 
> forever. I debugged the code and found that it is due to the race condition 
> between [HintsWriteExecutor.java::flush | 
> https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/hints/HintsWriteExecutor.java#L195]
>  and [HintsWriteExecutor.java::closeWriter | 
> https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/hints/HintsWriteExecutor.java#L106]
> . 
> *Time t1* Node is down, as a result Hints are being written by 
> [HintsWriteExecutor.java::flush | 
> https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/hints/HintsWriteExecutor.java#L195]
> *Time t2* Node is removed from cluster as a result it calls 
> [HintsService.java-exciseStore | 
> https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/hints/HintsService.java#L327]
>  which removes hint files for the node being removed
> *Time t3* Mutation stage keeps pumping Hints through [HintService.java::write 
> | 
> https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/hints/HintsService.java#L145]
>  which again calls [HintsWriteExecutor.java::flush | 
> https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/hints/HintsWriteExecutor.java#L215]
>  and new orphan file gets created
> I was writing a new dtest for {CASSANDRA-13562, CASSANDRA-13308} and that 
> helped me reproduce this new bug. I will submit patch for this new dtest 
> later.
> I also tried following to check how this orphan hint file responds:
> 1. I tried {{nodetool truncatehints <node>}} but it fails as node is no 
> longer part of the ring
> 2. I then tried {{nodetool truncatehints}}, that still doesn’t remove hint 
> file because it is not yet included in the [dispatchDequeue | 
> https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/hints/HintsStore.java#L53]
> Reproducible steps:
> Please find dTest python file {{gossip_hang_test.py}} attached which 
> reproduces this bug.
> Solution:
> This is due to race condition as mentioned above. Since 
> {{HintsWriteExecutor.java}} creates thread pool with only 1 worker, so 
> solution becomes little simple. Whenever we [HintService.java::excise | 
> https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/hints/HintsService.java#L303]
>  a host, just store it in-memory, and check for already evicted host inside 
> [HintsWriteExecutor.java::flush | 
> https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/hints/HintsWriteExecutor.java#L215].
>  If already evicted host is found then ignore hints.
> Jaydeep

This message was sent by Atlassian JIRA

To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

Reply via email to