[
https://issues.apache.org/jira/browse/CASSANDRA-7144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14016525#comment-14016525
]
Richard Low commented on CASSANDRA-7144:
----------------------------------------
I have a reproduction. Not sure if it was the cause of our issue but it repros
an identical situation.
I have a 3 node cluster running on my box. I ran these cql commands:
{code}
cqlsh> create keyspace ks with replication = {'class':'SimpleStrategy',
'replication_factor':2};
cqlsh> use ks;
cqlsh:ks> create table counter_test (a text primary key, b counter) with
gc_grace_seconds = 0;
cqlsh:ks> consistency all;
Consistency level set to ALL.
cqlsh:ks> update counter_test set b = b + 1 where a = 'a';
cqlsh:ks> delete from counter_test where a = 'a';
cqlsh:ks> update counter_test set b = b + 1 where a = 'a';
Request did not complete within rpc_timeout.
{code}
And in the logs:
{code}
ERROR [WRITE-/127.0.0.2] 2014-06-03 14:37:24,186 CassandraDaemon.java (line
196) Exception in thread Thread[WRITE-/127.0.0.2,5,main]
java.lang.AssertionError
at
org.apache.cassandra.db.RowMutation$RowMutationSerializer.serialize(RowMutation.java:271)
at
org.apache.cassandra.db.RowMutation$RowMutationSerializer.serialize(RowMutation.java:259)
at org.apache.cassandra.net.MessageOut.serialize(MessageOut.java:120)
at
org.apache.cassandra.net.OutboundTcpConnection.writeInternal(OutboundTcpConnection.java:251)
at
org.apache.cassandra.net.OutboundTcpConnection.writeConnected(OutboundTcpConnection.java:203)
at
org.apache.cassandra.net.OutboundTcpConnection.run(OutboundTcpConnection.java:151)
ERROR [MutationStage:156] 2014-06-03 14:37:26,301 CassandraDaemon.java (line
196) Exception in thread Thread[MutationStage:156,5,main]
java.lang.AssertionError
at
org.apache.cassandra.db.RowMutation$RowMutationSerializer.serialize(RowMutation.java:271)
at
org.apache.cassandra.db.RowMutation$RowMutationSerializer.serialize(RowMutation.java:259)
at
org.apache.cassandra.utils.FBUtilities.serialize(FBUtilities.java:654)
at
org.apache.cassandra.db.HintedHandOffManager.hintFor(HintedHandOffManager.java:137)
at
org.apache.cassandra.service.StorageProxy.writeHintForMutation(StorageProxy.java:897)
at
org.apache.cassandra.service.StorageProxy$6.runMayThrow(StorageProxy.java:870)
at
org.apache.cassandra.service.StorageProxy$HintRunnable.run(StorageProxy.java:1961)
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
{code}
The problem is that the row tombstone has microsecond timestamps but the
counter has milliseconds (I just created CASSANDRA-7346 for this). When the
local counter mutation is applied, it is immediately removed because the row
tombstone has a later timestamp. When we read it back, once the row tombstone
becomes GCable (that's why I set gc_grace to 0) we get null for the row, so we
don't add a mutation to the RowMutation object (in
CounterMutation.makeReplicaionMutation()). When this mutation gets sent it
kills the connection thread, times out so we write a hint, which causes the
second assertion failure. We now have to wait until the row tombstone becomes
GCed to be able to inc anything in this row again.
Even if there wasn't CASSANDRA-7346, it is legitimate to insert a row tombstone
with a later timestamp so this could still happen. We need to detect this case
and handle it so the empty counter increment doesn't cause trouble. We can just
say it's done because the counter is deleted and not bother to replicate.
I'll keep looking in case there are other code paths that trigger this.
> CassandraDaemon RowMutation exception
> -------------------------------------
>
> Key: CASSANDRA-7144
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7144
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Environment: Ubuntu 12.04 w/ Oracle JVM, 5 nodes cluster. Nodes 2GB /
> 2 Cores in DigitalOcean.
> Reporter: Maxime Lamothe-Brassard
> Assignee: Tyler Hobbs
>
> First time reporting a bug here, apologies if I'm not posting it in the right
> space.
> At what seems like random interval, on random nodes in random situations I
> will get the following exception. After this the hinted handoff start timing
> out and the node stops participating in the cluster.
> I started seeing these after switching to the Cassandra Python-Driver from
> the Python-CQL driver.
> {noformat}
> ERROR [WRITE-/10.128.180.108] 2014-05-03 13:45:12,843 CassandraDaemon.java
> (line 198) Exception in thread Thread[WRITE-/10.128.180.108,5,main]
> java.lang.AssertionError
> at
> org.apache.cassandra.db.RowMutation$RowMutationSerializer.serialize(RowMutation.java:271)
> at
> org.apache.cassandra.db.RowMutation$RowMutationSerializer.serialize(RowMutation.java:259)
> at org.apache.cassandra.net.MessageOut.serialize(MessageOut.java:120)
> at
> org.apache.cassandra.net.OutboundTcpConnection.writeInternal(OutboundTcpConnection.java:251)
> at
> org.apache.cassandra.net.OutboundTcpConnection.writeConnected(OutboundTcpConnection.java:203)
> at
> org.apache.cassandra.net.OutboundTcpConnection.run(OutboundTcpConnection.java:151)
> ERROR [WRITE-/10.128.194.70] 2014-05-03 13:45:12,843 CassandraDaemon.java
> (line 198) Exception in thread Thread[WRITE-/10.128.194.70,5,main]
> java.lang.AssertionError
> at
> org.apache.cassandra.db.RowMutation$RowMutationSerializer.serialize(RowMutation.java:271)
> at
> org.apache.cassandra.db.RowMutation$RowMutationSerializer.serialize(RowMutation.java:259)
> at org.apache.cassandra.net.MessageOut.serialize(MessageOut.java:120)
> at
> org.apache.cassandra.net.OutboundTcpConnection.writeInternal(OutboundTcpConnection.java:251)
> at
> org.apache.cassandra.net.OutboundTcpConnection.writeConnected(OutboundTcpConnection.java:203)
> at
> org.apache.cassandra.net.OutboundTcpConnection.run(OutboundTcpConnection.java:151)
> ERROR [MutationStage:118] 2014-05-03 13:45:15,048 CassandraDaemon.java (line
> 198) Exception in thread Thread[MutationStage:118,5,main]
> java.lang.AssertionError
> at
> org.apache.cassandra.db.RowMutation$RowMutationSerializer.serialize(RowMutation.java:271)
> at
> org.apache.cassandra.db.RowMutation$RowMutationSerializer.serialize(RowMutation.java:259)
> at
> org.apache.cassandra.utils.FBUtilities.serialize(FBUtilities.java:654)
> at
> org.apache.cassandra.db.HintedHandOffManager.hintFor(HintedHandOffManager.java:137)
> at
> org.apache.cassandra.service.StorageProxy.writeHintForMutation(StorageProxy.java:908)
> at
> org.apache.cassandra.service.StorageProxy$6.runMayThrow(StorageProxy.java:881)
> at
> org.apache.cassandra.service.StorageProxy$HintRunnable.run(StorageProxy.java:1981)
> at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:744)
> ERROR [MutationStage:117] 2014-05-03 13:45:15,048 CassandraDaemon.java (line
> 198) Exception in thread Thread[MutationStage:117,5,main]
> java.lang.AssertionError
> at
> org.apache.cassandra.db.RowMutation$RowMutationSerializer.serialize(RowMutation.java:271)
> at
> org.apache.cassandra.db.RowMutation$RowMutationSerializer.serialize(RowMutation.java:259)
> at
> org.apache.cassandra.utils.FBUtilities.serialize(FBUtilities.java:654)
> at
> org.apache.cassandra.db.HintedHandOffManager.hintFor(HintedHandOffManager.java:137)
> at
> org.apache.cassandra.service.StorageProxy.writeHintForMutation(StorageProxy.java:908)
> at
> org.apache.cassandra.service.StorageProxy$6.runMayThrow(StorageProxy.java:881)
> at
> org.apache.cassandra.service.StorageProxy$HintRunnable.run(StorageProxy.java:1981)
> at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:744)
> {noformat}
> The service must be restarted for the node to come back online. Let me know
> any additional configuration details needed.
--
This message was sent by Atlassian JIRA
(v6.2#6252)