NetworkTopologyStrategy allows mismatched RF resulting in obscure failures
--------------------------------------------------------------------------

                 Key: CASSANDRA-1831
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1831
             Project: Cassandra
          Issue Type: Bug
    Affects Versions: 0.7.0 rc 1
            Reporter: Peter Schuller


On today's 0.7 branch:

Creating a keyspace like this (not how to do it in production, but that's not 
the point):

   create keyspace MyKeySpace with replication_factor = 2 and placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy';

This is accepted by Cassandra in spite of there being no strategy options. 
Describing the keyspace will then give output similar to:

Keyspace: MyKeySpace:
 Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
null

Attempts to write and to read then fail with, respectively, the two errors 
included at the bottom of this comment.

What happens is that NTS's getReplicationFactor() returns the sum of the RF for 
each DC. But lacking any replica placement options for the DCs, the sum will 
always be 0. The result is that NTS.calculateNaturalEndpoints() yields 0 
endpoints, thus triggering the assertion failures apparent in the stack traces.
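
To make the arithmetic concrete, here is a minimal sketch of the summing 
behaviour described above. It is not the actual NetworkTopologyStrategy code; 
the class RfSumSketch, the method totalReplicationFactor and the map of per-DC 
replica counts are hypothetical names chosen for illustration only.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class RfSumSketch {
    // Hypothetical stand-in for "sum of RF for each DC"; not Cassandra's API.
    static int totalReplicationFactor(Map<String, Integer> perDcReplicas) {
        int total = 0;
        for (int replicas : perDcReplicas.values()) {
            total += replicas;
        }
        return total;
    }

    public static void main(String[] args) {
        // No strategy options at all: nothing to sum, so the result is 0,
        // regardless of the replication_factor given at keyspace creation.
        Map<String, Integer> noOptions = Collections.<String, Integer>emptyMap();
        System.out.println(totalReplicationFactor(noOptions));

        // With per-DC options (DC names are made up) the sum is non-zero.
        Map<String, Integer> withOptions = new HashMap<String, Integer>();
        withOptions.put("DC1", 2);
        withOptions.put("DC2", 1);
        System.out.println(totalReplicationFactor(withOptions));
    }
}
```

With an empty options map the loop body never runs, which is how a keyspace 
configured with replication_factor = 2 can end up with an effective RF of 0.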

This was caused by misconfiguration during testing but should be handled 
better. What are people's thoughts on the set of changes that would constitute 
a proper fix?

Is there a reason for NTS to ever conclude that the RF differs from that of 
the CF definition? If not, I would say that one fix is to make NTS bail early 
if the RF calculated by adding up the DC placements does not match the 
configured RF for the column family. (I'll submit a patch if people agree.)
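
As a rough illustration of the "bail early" idea, the check could look 
something like the sketch below. This is not a patch against the real code: 
RfValidationSketch, validate and the exception choice are assumptions made for 
the example, and where such a check would actually live (strategy construction, 
keyspace creation, or both) is exactly the open question.

```java
import java.util.HashMap;
import java.util.Map;

class RfValidationSketch {
    // Hypothetical early check: reject a configuration whose per-DC replica
    // counts do not add up to the configured replication factor.
    static void validate(int configuredRf, Map<String, Integer> perDcReplicas) {
        int sum = 0;
        for (int replicas : perDcReplicas.values()) {
            sum += replicas;
        }
        if (sum != configuredRf) {
            throw new IllegalArgumentException(
                "replication_factor " + configuredRf
                + " does not match the sum of per-DC replica counts (" + sum + ")");
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> options = new HashMap<String, Integer>();
        try {
            // Mirrors the misconfiguration above: RF = 2, no strategy options.
            validate(2, options);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }

        // A consistent configuration (DC names are made up) passes silently.
        options.put("DC1", 1);
        options.put("DC2", 1);
        validate(2, options);
    }
}
```

Failing at this point would turn the assertion errors seen at read/write time 
into a clear configuration error, and would also catch the case where the sum 
is non-zero but still lower than the configured RF.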

Beyond that, what else, if anything, should be done? Should the creation fail 
when the RF is inconsistent with the strategy options? Is it correct that the 
code assumes that naturalEndPoints will never return fewer nodes than RF? It 
seems natural to me that the natural endpoint count should always match RF, 
unless the cluster has too few nodes. But this gets complicated with NTS, since 
the requirement then becomes that you have enough nodes in each DC. This 
probably relates to previous discussions on whether or not to allow an RF 
which is higher than the number of nodes in a cluster.

In this case, we failed hard because we got exactly 0 endpoints and triggered 
assertions. In other cases we might have gotten, say, 1 endpoint, in which case 
we might have been able to read and write successfully as if we had a lower RF, 
even though the column family RF was set to 2. This seems dangerous.

ERROR [pool-1-thread-2] 2010-12-07 11:18:40,638 Cassandra.java (line 3044) Internal error processing batch_mutate
java.lang.AssertionError: invalid response count 1 for replication factor 0
       at org.apache.cassandra.service.WriteResponseHandler.determineBlockFor(WriteResponseHandler.java:98)
       at org.apache.cassandra.service.WriteResponseHandler.<init>(WriteResponseHandler.java:48)
       at org.apache.cassandra.service.WriteResponseHandler.create(WriteResponseHandler.java:61)
       at org.apache.cassandra.locator.AbstractReplicationStrategy.getWriteResponseHandler(AbstractReplicationStrategy.java:125)
       at org.apache.cassandra.locator.NetworkTopologyStrategy.getWriteResponseHandler(NetworkTopologyStrategy.java:166)
       at org.apache.cassandra.service.StorageProxy.mutate(StorageProxy.java:114)
       at org.apache.cassandra.thrift.CassandraServer.doInsert(CassandraServer.java:446)
       at org.apache.cassandra.thrift.CassandraServer.batch_mutate(CassandraServer.java:419)
       at org.apache.cassandra.thrift.Cassandra$Processor$batch_mutate.process(Cassandra.java:3036)
       at org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2555)
       at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:167)
       at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
       at java.lang.Thread.run(Thread.java:662)
ERROR [pool-1-thread-3] 2010-12-07 11:18:50,474 Cassandra.java (line 2876) Internal error processing get_range_slices
java.lang.AssertionError
       at org.apache.cassandra.service.RangeSliceResponseResolver.<init>(RangeSliceResponseResolver.java:53)
       at org.apache.cassandra.service.StorageProxy.getRangeSlice(StorageProxy.java:450)
       at org.apache.cassandra.thrift.CassandraServer.get_range_slices(CassandraServer.java:507)
       at org.apache.cassandra.thrift.Cassandra$Processor$get_range_slices.process(Cassandra.java:2868)
       at org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2555)
       at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:167)
       at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
       at java.lang.Thread.run(Thread.java:662)
 INFO [MigrationStage:1] 2010-12-07 11:24:09,220

