[ 
https://issues.apache.org/jira/browse/CASSANDRA-7144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14016525#comment-14016525
 ] 

Richard Low commented on CASSANDRA-7144:
----------------------------------------

I have a reproduction. Not sure if it was the cause of our issue but it repros 
an identical situation.

I have a 3 node cluster running on my box. I ran these cql commands:

{code}
cqlsh> create keyspace ks with replication = {'class':'SimpleStrategy', 
'replication_factor':2};
cqlsh> use ks;
cqlsh:ks> create table counter_test (a text primary key, b counter) with 
gc_grace_seconds = 0;
cqlsh:ks> consistency all;
Consistency level set to ALL.
cqlsh:ks> update counter_test set b = b + 1 where a = 'a';
cqlsh:ks> delete from counter_test where a = 'a';
cqlsh:ks> update counter_test set b = b + 1 where a = 'a';
Request did not complete within rpc_timeout.
{code}

And in the logs:

{code}
ERROR [WRITE-/127.0.0.2] 2014-06-03 14:37:24,186 CassandraDaemon.java (line 
196) Exception in thread Thread[WRITE-/127.0.0.2,5,main]
java.lang.AssertionError
        at 
org.apache.cassandra.db.RowMutation$RowMutationSerializer.serialize(RowMutation.java:271)
        at 
org.apache.cassandra.db.RowMutation$RowMutationSerializer.serialize(RowMutation.java:259)
        at org.apache.cassandra.net.MessageOut.serialize(MessageOut.java:120)
        at 
org.apache.cassandra.net.OutboundTcpConnection.writeInternal(OutboundTcpConnection.java:251)
        at 
org.apache.cassandra.net.OutboundTcpConnection.writeConnected(OutboundTcpConnection.java:203)
        at 
org.apache.cassandra.net.OutboundTcpConnection.run(OutboundTcpConnection.java:151)
ERROR [MutationStage:156] 2014-06-03 14:37:26,301 CassandraDaemon.java (line 
196) Exception in thread Thread[MutationStage:156,5,main]
java.lang.AssertionError
        at 
org.apache.cassandra.db.RowMutation$RowMutationSerializer.serialize(RowMutation.java:271)
        at 
org.apache.cassandra.db.RowMutation$RowMutationSerializer.serialize(RowMutation.java:259)
        at 
org.apache.cassandra.utils.FBUtilities.serialize(FBUtilities.java:654)
        at 
org.apache.cassandra.db.HintedHandOffManager.hintFor(HintedHandOffManager.java:137)
        at 
org.apache.cassandra.service.StorageProxy.writeHintForMutation(StorageProxy.java:897)
        at 
org.apache.cassandra.service.StorageProxy$6.runMayThrow(StorageProxy.java:870)
        at 
org.apache.cassandra.service.StorageProxy$HintRunnable.run(StorageProxy.java:1961)
        at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
{code}

The problem is that the row tombstone has microsecond timestamps but the 
counter has milliseconds (I just created CASSANDRA-7346 for this). When the 
local counter mutation is applied, it is immediately removed because the row 
tombstone has a later timestamp. When we read it back, once the row tombstone 
becomes GCable (that's why I set gc_grace to 0) we get null for the row, so we 
don't add a mutation to the RowMutation object (in 
CounterMutation.makeReplicaionMutation()). When this mutation gets sent it 
kills the connection thread, times out so we write a hint, which causes the 
second assertion failure. We now have to wait until the row tombstone becomes 
GCed to be able to inc anything in this row again.

Even if there wasn't CASSANDRA-7346, it is legitimate to insert a row tombstone 
with a later timestamp so this could still happen. We need to detect this case 
and handle it so the empty counter increment doesn't cause trouble. We can just 
say it's done because the counter is deleted and not bother to replicate.

I'll keep looking in case there are other code paths that trigger this.

> CassandraDaemon RowMutation exception
> -------------------------------------
>
>                 Key: CASSANDRA-7144
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7144
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: Ubuntu 12.04 w/ Oracle JVM, 5 nodes cluster. Nodes 2GB / 
> 2 Cores in DigitalOcean.
>            Reporter: Maxime Lamothe-Brassard
>            Assignee: Tyler Hobbs
>
> First time reporting a bug here, apologies if I'm not posting it in the right 
> space.
> At what seems like random interval, on random nodes in random situations I 
> will get the following exception. After this the hinted handoff start timing 
> out and the node stops participating in the cluster.
> I started seeing these after switching to the Cassandra Python-Driver from 
> the Python-CQL driver.
> {noformat}
> ERROR [WRITE-/10.128.180.108] 2014-05-03 13:45:12,843 CassandraDaemon.java 
> (line 198) Exception in thread Thread[WRITE-/10.128.180.108,5,main]
> java.lang.AssertionError
>       at 
> org.apache.cassandra.db.RowMutation$RowMutationSerializer.serialize(RowMutation.java:271)
>       at 
> org.apache.cassandra.db.RowMutation$RowMutationSerializer.serialize(RowMutation.java:259)
>       at org.apache.cassandra.net.MessageOut.serialize(MessageOut.java:120)
>       at 
> org.apache.cassandra.net.OutboundTcpConnection.writeInternal(OutboundTcpConnection.java:251)
>       at 
> org.apache.cassandra.net.OutboundTcpConnection.writeConnected(OutboundTcpConnection.java:203)
>       at 
> org.apache.cassandra.net.OutboundTcpConnection.run(OutboundTcpConnection.java:151)
> ERROR [WRITE-/10.128.194.70] 2014-05-03 13:45:12,843 CassandraDaemon.java 
> (line 198) Exception in thread Thread[WRITE-/10.128.194.70,5,main]
> java.lang.AssertionError
>       at 
> org.apache.cassandra.db.RowMutation$RowMutationSerializer.serialize(RowMutation.java:271)
>       at 
> org.apache.cassandra.db.RowMutation$RowMutationSerializer.serialize(RowMutation.java:259)
>       at org.apache.cassandra.net.MessageOut.serialize(MessageOut.java:120)
>       at 
> org.apache.cassandra.net.OutboundTcpConnection.writeInternal(OutboundTcpConnection.java:251)
>       at 
> org.apache.cassandra.net.OutboundTcpConnection.writeConnected(OutboundTcpConnection.java:203)
>       at 
> org.apache.cassandra.net.OutboundTcpConnection.run(OutboundTcpConnection.java:151)
> ERROR [MutationStage:118] 2014-05-03 13:45:15,048 CassandraDaemon.java (line 
> 198) Exception in thread Thread[MutationStage:118,5,main]
> java.lang.AssertionError
>       at 
> org.apache.cassandra.db.RowMutation$RowMutationSerializer.serialize(RowMutation.java:271)
>       at 
> org.apache.cassandra.db.RowMutation$RowMutationSerializer.serialize(RowMutation.java:259)
>       at 
> org.apache.cassandra.utils.FBUtilities.serialize(FBUtilities.java:654)
>       at 
> org.apache.cassandra.db.HintedHandOffManager.hintFor(HintedHandOffManager.java:137)
>       at 
> org.apache.cassandra.service.StorageProxy.writeHintForMutation(StorageProxy.java:908)
>       at 
> org.apache.cassandra.service.StorageProxy$6.runMayThrow(StorageProxy.java:881)
>       at 
> org.apache.cassandra.service.StorageProxy$HintRunnable.run(StorageProxy.java:1981)
>       at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>       at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>       at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>       at java.lang.Thread.run(Thread.java:744)
> ERROR [MutationStage:117] 2014-05-03 13:45:15,048 CassandraDaemon.java (line 
> 198) Exception in thread Thread[MutationStage:117,5,main]
> java.lang.AssertionError
>       at 
> org.apache.cassandra.db.RowMutation$RowMutationSerializer.serialize(RowMutation.java:271)
>       at 
> org.apache.cassandra.db.RowMutation$RowMutationSerializer.serialize(RowMutation.java:259)
>       at 
> org.apache.cassandra.utils.FBUtilities.serialize(FBUtilities.java:654)
>       at 
> org.apache.cassandra.db.HintedHandOffManager.hintFor(HintedHandOffManager.java:137)
>       at 
> org.apache.cassandra.service.StorageProxy.writeHintForMutation(StorageProxy.java:908)
>       at 
> org.apache.cassandra.service.StorageProxy$6.runMayThrow(StorageProxy.java:881)
>       at 
> org.apache.cassandra.service.StorageProxy$HintRunnable.run(StorageProxy.java:1981)
>       at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>       at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>       at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>       at java.lang.Thread.run(Thread.java:744)
> {noformat}
> The service must be restarted for the node to come back online. Let me know 
> any additional configuration details needed.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Reply via email to