Haoze Wu created CASSANDRA-16603:
------------------------------------

             Summary: Sporadic CQL operation timeout due to unconfigured table
                 Key: CASSANDRA-16603
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16603
             Project: Cassandra
          Issue Type: Bug
          Components: Local/Commit Log
            Reporter: Haoze Wu


    We were doing some systematic testing in Cassandra stable release 3.11.10 
and found the disk I/O delay may cause some issues.

    We start a Cassandra cluster of 3 nodes with the default configuration, in 
node 1, we open the cqlsh shell and type in:
{code:java}
create keyspace ycsb WITH REPLICATION = {'class' : 'SimpleStrategy', 
'replication_factor': 3 };
USE ycsb;
create table usertable (
    y_id varchar primary key,
    field0 varchar,
    field1 varchar,
    field2 varchar,
    field3 varchar,
    field4 varchar,
    field5 varchar,
    field6 varchar,
    field7 varchar,
    field8 varchar,
    field9 varchar);
{code}
    During this process, we inject a single disk I/O delay in node 2’s 
CommitLog#add method (e.g., in Mutation.serializer.serialize).
{code:java}
    /**
     * Add a Mutation to the commit log. If CDC is enabled, this can fail.
     *
     * @param mutation the Mutation to add to the log
     * @throws WriteTimeoutException
     */
    public CommitLogPosition add(Mutation mutation) throws WriteTimeoutException
    {
        assert mutation != null;

        try (DataOutputBuffer dob = DataOutputBuffer.scratchBuffer.get())
        {
            Mutation.serializer.serialize(mutation, dob, 
MessagingService.current_version);
            int size = dob.getLength();
            int totalSize = size + ENTRY_OVERHEAD_SIZE;
            if (totalSize > MAX_MUTATION_SIZE)
            {
                throw new IllegalArgumentException(String.format("Mutation of 
%s is too large for the maximum size of %s",
                                                                 
FBUtilities.prettyPrintMemory(totalSize),
                                                                 
FBUtilities.prettyPrintMemory(MAX_MUTATION_SIZE)));
            }
            // ...
        }
        catch (IOException e)
        {
            throw new FSWriteError(e, 
segmentManager.allocatingFrom().getPath());
        }
    }
{code}
    In the aforementioned cqlsh shell connected to node 1, when it runs the 
command of “create table usertable ...”, it costs more time than normal, and 
finally shows a warning message:
{code:java}
Warning: schema version mismatch detected; check the schema versions of your 
nodes in system.local and system.peers.
{code}
    This behavior seems reasonable because our injection affects node 2.

    When we open another cqlsh shell connecting to node 2, intent to run some 
commands on the created table:
{code:java}
USE ycsb;
{code}
    The cqlsh shell shows an error message:
{code:java}
InvalidRequest: Error from server: code=2200 [Invalid query] 
message="unconfigured table t"
{code}
    It seems reasonable, too, because our injection affects node 2. However, 
when we run a cqlsh shell connecting to node 1 again, and try to run some 
commands like:
{code:java}
USE ycsb;
SELECT * FROM usertable;
{code}
    Most of the time, the result is shown immediately. However, sometimes it 
gets stuck for a few seconds, and shows the error message:
{code:java}
ReadTimeout: Error from server: code=1200 [Coordinator node timed out waiting 
for replica nodes' responses] message="Operation timed out - received only 0 
responses." info={'received_responses': 0, 'required_responses': 1, 
'consistency': 'ONE'}
{code}
    A more reliable way to reproduce this failure is using our client based on 
datastax (see [https://github.com/functioner/datastax-cassandra-client]).

    We have dump the jstack for node 2 when the issue happens:
{code:java}
"MutationStage-1" #90 daemon prio=5 os_prio=0 tid=0x00007f858429c880 nid=0x6f23 
waiting on condition [0x00007f856889e000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
    at java.lang.Thread.sleep(Native Method)
    at org.apache.cassandra.db.commitlog.CommitLog.add(CommitLog.java:246)
    at org.apache.cassandra.db.Keyspace.applyInternal(Keyspace.java:593)
    at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:470)
    at org.apache.cassandra.db.Mutation.apply(Mutation.java:250)
    at org.apache.cassandra.db.Mutation.apply(Mutation.java:255)
    at org.apache.cassandra.db.Mutation.apply(Mutation.java:264)
    at 
org.apache.cassandra.cql3.statements.ModificationStatement.executeInternalWithoutCondition(ModificationStatement.java:597)
    at 
org.apache.cassandra.cql3.statements.ModificationStatement.executeInternal(ModificationStatement.java:577)
    at 
org.apache.cassandra.cql3.QueryProcessor.executeInternal(QueryProcessor.java:315)
    at 
org.apache.cassandra.db.SystemKeyspace.lambda$updatePeerInfo$1(SystemKeyspace.java:757)
    at org.apache.cassandra.db.SystemKeyspace$$Lambda$286/119781138.run(Unknown 
Source)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at 
org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:162)
    at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:113)
    at java.lang.Thread.run(Thread.java:748)

"MigrationStage:1" #69 daemon prio=5 os_prio=0 tid=0x00007f857c6d1b20 
nid=0x6ecd waiting on condition [0x00007f8568cef000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0x00000005e5e7abd8> (a 
com.google.common.util.concurrent.ListenableFutureTask)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
    at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:429)
    at java.util.concurrent.FutureTask.get(FutureTask.java:191)
    at org.apache.cassandra.utils.FBUtilities.waitOnFuture(FBUtilities.java:438)
    at 
org.apache.cassandra.schema.SchemaKeyspace.lambda$flush$1(SchemaKeyspace.java:310)
    at 
org.apache.cassandra.schema.SchemaKeyspace$$Lambda$263/1985363058.accept(Unknown
 Source)
    at java.lang.Iterable.forEach(Iterable.java:75)
    at org.apache.cassandra.schema.SchemaKeyspace.flush(SchemaKeyspace.java:310)
    at 
org.apache.cassandra.schema.SchemaKeyspace.mergeSchema(SchemaKeyspace.java:1398)
    - locked <0x0000000611bd39e0> (a java.lang.Class for 
org.apache.cassandra.schema.SchemaKeyspace)
    at 
org.apache.cassandra.schema.SchemaKeyspace.mergeSchemaAndAnnounceVersion(SchemaKeyspace.java:1380)
    - locked <0x0000000611bd39e0> (a java.lang.Class for 
org.apache.cassandra.schema.SchemaKeyspace)
    at 
org.apache.cassandra.db.DefinitionsUpdateVerbHandler$1.runMayThrow(DefinitionsUpdateVerbHandler.java:51)
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at 
org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:84)
    at 
org.apache.cassandra.concurrent.NamedThreadFactory$$Lambda$6/1239807799.run(Unknown
 Source)
    at java.lang.Thread.run(Thread.java:748)

"MemtablePostFlush:1" #22 daemon prio=5 os_prio=0 tid=0x00007f8598b018e0 
nid=0x6e45 waiting on condition [0x00007f856e1ad000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0x00000005e5e7abb8> (a 
java.util.concurrent.CountDownLatch$Sync)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
    at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
    at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
    at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
    at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
    at 
org.apache.cassandra.db.ColumnFamilyStore$PostFlush.call(ColumnFamilyStore.java:1019)
    at 
org.apache.cassandra.db.ColumnFamilyStore$PostFlush.call(ColumnFamilyStore.java:1002)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at 
org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:84)
    at 
org.apache.cassandra.concurrent.NamedThreadFactory$$Lambda$6/1239807799.run(Unknown
 Source)
    at java.lang.Thread.run(Thread.java:748)

"MemtableFlushWriter:2" #19 daemon prio=5 os_prio=0 tid=0x00007f8598af9da0 
nid=0x6e42 waiting on condition [0x00007f856e270000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
    at 
org.apache.cassandra.utils.concurrent.WaitQueue$AbstractSignal.awaitUninterruptibly(WaitQueue.java:280)
    at 
org.apache.cassandra.utils.concurrent.OpOrder$Group.await(OpOrder.java:264)
    at 
org.apache.cassandra.utils.concurrent.OpOrder$Barrier.await(OpOrder.java:431)
    at 
org.apache.cassandra.db.ColumnFamilyStore$Flush.run(ColumnFamilyStore.java:1114)
    at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at 
org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:84)
    at 
org.apache.cassandra.concurrent.NamedThreadFactory$$Lambda$6/1239807799.run(Unknown
 Source)
{code}
    We found that the delay in CommitLog#add make another thread 
(DefinitionsUpdateVerbHandler$1.runMayThrow) stuck. In particular, 
`DefinitionsUpdateVerbHandler` is stuck in waiting for `SchemaKeyspace.flush`, 
which in turn waits for the `ColumnFamilyStore.waitForFlushes`.

    We are not sure what is the best place to mitigate the issue. One potential 
fix is to perhaps introduce a timeout limit (default no limit) in the add 
method, e.g., execute the I/O operations in a Future task and call 
waitOnFutures with a timeout in the end. If the timeout is exceeded, a 
WriteTimeoutException would be thrown.

    Any comments and suggestions would be appreciated.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to