[ 
https://issues.apache.org/jira/browse/CASSANDRA-7390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joshua McKenzie updated CASSANDRA-7390:
---------------------------------------

    Attachment: 7390_lockpendingcalc.txt
                7390_conservative_v1.txt

Looks like I opened Pandora's Box on this one.  A couple patches:

7390_conservative_v1.txt: cleans up a few underlying issues causing failures in 
MoveTest.  The logic on PendingRangeCalculatorService.blockUntilFinished() 
relied on ThreadPoolExecutor.getActiveCount() to determine the # of running 
threads - checking the documentation shows:
{code:title=getActiveCount}
/**
* Returns the approximate number of threads that are actively
* executing tasks.
*
* @return the number of threads
*/
public int getActiveCount() {...}
{code}

Changing that to an AtomicBoolean sentinel and getting rid of the now 
unnecessary DiscardPolicy for extra threads took care of one race.

Also cleaned up the inter-test StorageService / TokenMetadata reset logic in 
MoveTest.java, added some missing fields to TokenMetadata.resetUnsafe(), and 
added another blockUntilFinished between a couple of back-to-back onChange 
calls in testSimultaneousMove() that would infrequently race on 
PendingRangeCalculatorService.update() calls.

With the above changes, unit test failure rate drops from 1 out of 5 runs to 1 
out of 100 on average.

I'm a little wary of what that back-to-back onChange call seems to imply about 
the design of the PendingRangeCalculatorService's queue'ing model - it looks 
like we can have a race where we hit the PendingRangeCalculatorService for an 
update while another one is already running, drop the new update() request with 
the expectation that it'll be caught by the PendingRangeTask that's in-flight, 
and have our logical update missed until the next update() call.  This may just 
be an artifact of unit-tests hitting onChange back-to-back being 
non-representative of real use-cases timing.

7390_lockpendingcalc.txt: As to the final 1% failure - from my initial 
inspection of TokenMetadata code and flow, it looks like we're pulling out 
values from internal data structures in multiple discrete atomic steps in 
calculatePendingRanges. This leaves scheduling holes where other threads 
(variety of onChange paths, some nodetool commands, and brute-force unit-tests) 
can hop in, grab the write lock, and make changes to TokenMetadata while our 
pending calc is active:
{code:title=2 discrete read locks example}
BiMultiValMap<Token, InetAddress> bootstrapTokens = tm.getBootstrapTokens();
// No lock held here
Set<InetAddress> leavingEndpoints = tm.getLeavingEndpoints();
{code}

This patch migrates the pending range calculation inside TokenMetadata while 
holding a readlock for its entirety and addresses the potential race that 
occurs there (0 failures in > 1200 runs) - it also holds the readlock on 
TokenMetadata for a pretty long chunk of time (8-9000 usec on a trivial 
unit-test case on a laptop).  That being said, reads will still operate during 
these time windows and the infrequency of writes blocking / showing this race 
indicates to me that we shouldn't have serious contention with this change.  
Attaching a separate patch for moving pending range calc to TokenMetadata as 
that's more invasive and might just be a race we're comfortable accepting to 
keep contention down in the structure.

> MoveTest fails intermittently
> -----------------------------
>
>                 Key: CASSANDRA-7390
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7390
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Joshua McKenzie
>            Assignee: Joshua McKenzie
>            Priority: Minor
>             Fix For: 3.0
>
>         Attachments: 7390_conservative_v1.txt, 7390_lockpendingcalc.txt
>
>
> Reproduce with:
> for i in `seq 1 50`; do runTest MoveTest >> moveTestFailures.txt;done
> Looks to fail roughly once out of every 5 runs or so
> {code:title=failure}
> [junit] Testcase: 
> newTestWriteEndpointsDuringMove(org.apache.cassandra.service.MoveTest):   
> FAILED
> [junit] mismatched number of moved token expected:<0> but was:<1>
> [junit] junit.framework.AssertionFailedError: mismatched number of moved 
> token expected:<0> but was:<1>
> [junit]     at 
> org.apache.cassandra.service.MoveTest.newTestWriteEndpointsDuringMove(MoveTest.java:140)
> {code}
> Edit: had this as Windows-only - turns out it's failing on on *nix as well



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Reply via email to