dynamic snitch + read repair off can cause LOCAL_QUORUM reads to return 
spurious UnavailableException
-----------------------------------------------------------------------------------------------------

                 Key: CASSANDRA-2870
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2870
             Project: Cassandra
          Issue Type: Bug
          Components: Core
    Affects Versions: 0.7.0
            Reporter: Jonathan Ellis
            Assignee: Jonathan Ellis
            Priority: Minor
             Fix For: 0.7.8


When read repair is off, we want to avoid sending requests to more nodes than 
are necessary to satisfy the ConsistencyLevel.  ReadCallback does this here:

{code}
        // min so as to not throw exception until assureSufficient is called
        this.endpoints = repair || resolver instanceof RowRepairResolver
                       ? endpoints
                       : endpoints.subList(0, Math.min(endpoints.size(), blockfor));
{code}

Note that this assumes the "endpoints" list is sorted by preference for the 
read, most preferred first.
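
As a minimal, standalone sketch of that truncation (not the actual 
ReadCallback code): the class name TruncationSketch, the String addresses and 
the truncate() helper are made up for illustration; only the subList 
expression is taken from the snippet above.

```java
import java.util.Arrays;
import java.util.List;

class TruncationSketch
{
    // Same expression as in the quoted snippet: keep only the first
    // blockfor endpoints, or all of them if there are fewer than blockfor.
    static List<String> truncate(List<String> endpoints, int blockfor)
    {
        return endpoints.subList(0, Math.min(endpoints.size(), blockfor));
    }

    public static void main(String[] args)
    {
        // Hypothetical addresses, assumed sorted from most to least preferred.
        List<String> sorted = Arrays.asList("10.0.0.1", "10.0.0.2", "10.0.0.3");
        // With blockfor = 2, only the two most preferred endpoints survive.
        System.out.println(truncate(sorted, 2));
    }
}
```

Whatever the sort order puts first is what gets kept, so the truncation is 
only as good as the ordering of the input list.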

Then the LOCAL_QUORUM code in DatacenterReadCallback checks to see if we have 
enough nodes to do the read:

{code}
        int localEndpoints = 0;
        for (InetAddress endpoint : endpoints)
        {
            if (localdc.equals(snitch.getDatacenter(endpoint)))
                localEndpoints++;
        }

        if (localEndpoints < blockfor)
            throw new UnavailableException();
{code}

So if repair is off (so we truncate our endpoints list) AND the dynamic snitch 
has decided that nodes in another DC are to be preferred over local ones, the 
truncated list contains fewer than blockfor local endpoints, and we'll throw 
UnavailableException even if all the replicas are healthy.
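
The interaction can be reproduced outside Cassandra with a small simulation. 
This is only a sketch under stated assumptions: the class LocalQuorumSketch, 
the DC_OF map (standing in for snitch.getDatacenter), the addresses and the 
two-DC layout are all hypothetical, and the RowRepairResolver case from the 
real condition is left out for brevity.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class LocalQuorumSketch
{
    // Hypothetical stand-in for snitch.getDatacenter(endpoint).
    static final Map<String, String> DC_OF = new HashMap<String, String>();
    static
    {
        DC_OF.put("10.1.0.1", "DC1");
        DC_OF.put("10.1.0.2", "DC1");
        DC_OF.put("10.1.0.3", "DC1");
        DC_OF.put("10.2.0.1", "DC2");
        DC_OF.put("10.2.0.2", "DC2");
        DC_OF.put("10.2.0.3", "DC2");
    }

    // Counts endpoints in the local DC, as the DatacenterReadCallback loop does.
    static int countLocal(List<String> endpoints, String localDc)
    {
        int localEndpoints = 0;
        for (String endpoint : endpoints)
        {
            if (localDc.equals(DC_OF.get(endpoint)))
                localEndpoints++;
        }
        return localEndpoints;
    }

    // True if the availability check would fail for this ordering.
    static boolean wouldThrowUnavailable(List<String> sorted, String localDc,
                                         int blockfor, boolean repair)
    {
        // Truncate only when repair is off, mirroring the ReadCallback snippet.
        List<String> used = repair
                          ? sorted
                          : sorted.subList(0, Math.min(sorted.size(), blockfor));
        return countLocal(used, localDc) < blockfor;
    }

    public static void main(String[] args)
    {
        // Assumed ordering: the snitch ranks one DC2 node ahead of all DC1 nodes.
        List<String> sorted = Arrays.asList("10.2.0.1", "10.1.0.1", "10.1.0.2",
                                            "10.1.0.3", "10.2.0.2", "10.2.0.3");
        int blockfor = 2; // local quorum with three replicas per DC
        System.out.println("repair off: " + wouldThrowUnavailable(sorted, "DC1", blockfor, false));
        System.out.println("repair on:  " + wouldThrowUnavailable(sorted, "DC1", blockfor, true));
    }
}
```

With repair off the list is cut to one DC2 node plus one DC1 node, so only one 
local endpoint is counted against a blockfor of 2, even though all three DC1 
replicas are up.  With repair on, the full list is checked and the count is 3.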

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
