[ https://issues.apache.org/jira/browse/CASSANDRA-3804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13196125#comment-13196125 ]
Sylvain Lebresne commented on CASSANDRA-3804:
---------------------------------------------
This is not counter related (but you have to use CL.ALL to reproduce without
counters as otherwise it's hidden by the fact that only the non-upgraded
coordinator acknowledges writes) and it is related to CASSANDRA-1391.
This is due to the inability to make schema changes in a mixed pre/post-1.1
cluster, if I trust the following log (from the upgraded node):
{noformat}
java.lang.RuntimeException: java.io.IOException: Can't accept schema migrations from Cassandra versions previous to 1.1, please update first.
    at org.apache.cassandra.utils.FBUtilities.unchecked(FBUtilities.java:544)
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: Can't accept schema migrations from Cassandra versions previous to 1.1, please update first.
    at org.apache.cassandra.service.MigrationManager.deserializeMigrationMessage(MigrationManager.java:233)
    at org.apache.cassandra.db.DefsTable.mergeRemoteSchema(DefsTable.java:231)
    at org.apache.cassandra.db.DefinitionsUpdateVerbHandler$1.runMayThrow(DefinitionsUpdateVerbHandler.java:48)
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
{noformat}
There are however two problems imho:
# Not supporting migrations during the upgrade process is one thing, but it
should not put the cluster in a broken state, which I'm not sure it doesn't do.
Ideally, new nodes would still accept old migrations from old nodes, but would
refuse to do schema changes themselves until they know all nodes are upgraded. We
could then throw an UnavailableException with a message.
# On top of the exception above, the logs during that test are filled with
errors that don't sound too reassuring. On every node (upgraded or not), there
are a handful of:
{noformat}
ERROR [MutationStage:34] 2012-01-30 14:35:39,041 AbstractCassandraDaemon.java (line 134) Fatal exception in thread Thread[MutationStage:34,5,main]
java.io.IOError: java.io.EOFException
    at org.apache.cassandra.db.TruncateVerbHandler.doVerb(TruncateVerbHandler.java:66)
    at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.EOFException
    at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:323)
    at java.io.DataInputStream.readUTF(DataInputStream.java:572)
    at java.io.DataInputStream.readUTF(DataInputStream.java:547)
    at org.apache.cassandra.db.TruncationSerializer.deserialize(Truncation.java:80)
    at org.apache.cassandra.db.TruncationSerializer.deserialize(Truncation.java:70)
    at org.apache.cassandra.db.TruncateVerbHandler.doVerb(TruncateVerbHandler.java:44)
    ... 4 more
{noformat}
On the upgraded node, there are a few:
{noformat}
ERROR [MutationStage:38] 2012-01-30 14:35:50,772 RowMutationVerbHandler.java (line 61) Error in row mutation
org.apache.cassandra.db.UnserializableColumnFamilyException: Couldn't find cfId=1000
    at org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamilySerializer.java:129)
    at org.apache.cassandra.db.RowMutation$RowMutationSerializer.deserialize(RowMutation.java:401)
    at org.apache.cassandra.db.RowMutation$RowMutationSerializer.deserialize(RowMutation.java:409)
    at org.apache.cassandra.db.RowMutation.fromBytes(RowMutation.java:357)
    at org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:42)
    at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
{noformat}
And on the non-upgraded ones, there are a few:
{noformat}
ERROR [GossipStage:1] 2012-01-30 14:35:13,363 AbstractCassandraDaemon.java (line 139) Fatal exception in thread Thread[GossipStage:1,5,main]
java.lang.UnsupportedOperationException: Not a time-based UUID
    at java.util.UUID.timestamp(UUID.java:308)
    at org.apache.cassandra.service.MigrationManager.updateHighestKnown(MigrationManager.java:121)
    at org.apache.cassandra.service.MigrationManager.rectify(MigrationManager.java:99)
    at org.apache.cassandra.service.MigrationManager.onAlive(MigrationManager.java:83)
    at org.apache.cassandra.gms.Gossiper.markAlive(Gossiper.java:806)
    at org.apache.cassandra.gms.Gossiper.handleMajorStateChange(Gossiper.java:849)
    at org.apache.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:908)
    at org.apache.cassandra.gms.GossipDigestAckVerbHandler.doVerb(GossipDigestAckVerbHandler.java:68)
    at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
{noformat}
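To make the first point more concrete, here is a rough sketch of the kind of guard I have in mind. This is not actual Cassandra code: the class name, the version constants, and the nested UnavailableException are all hypothetical stand-ins, and how a node would really learn the versions of its peers is left out. It only shows the idea of refusing local schema changes until every known node reports a 1.1 version.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: refuse local schema changes while any known node is pre-1.1.
public class SchemaChangeGuard {

    // Stand-in for the real exception type; defined here only to keep the sketch self-contained.
    public static class UnavailableException extends Exception {
        public UnavailableException(String message) {
            super(message);
        }
    }

    // Illustrative version numbers, not the real messaging versions.
    public static final int VERSION_10 = 10;
    public static final int VERSION_11 = 11;

    // Last known version per endpoint (assumed to be fed from gossip or similar).
    private final Map<String, Integer> versions = new ConcurrentHashMap<String, Integer>();

    public void recordVersion(String endpoint, int version) {
        versions.put(endpoint, version);
    }

    // True only if every endpoint we know about is at least on 1.1.
    public boolean allNodesUpgraded() {
        for (int version : versions.values()) {
            if (version < VERSION_11) {
                return false;
            }
        }
        return true;
    }

    // To be called before a node initiates a schema change itself.
    public void checkCanChangeSchema() throws UnavailableException {
        if (!allNodesUpgraded()) {
            throw new UnavailableException(
                "Schema changes are refused until all nodes are upgraded to 1.1");
        }
    }
}
```

With something like this, an upgraded node would keep applying migrations it receives from old nodes, and the check would only sit in front of schema changes that the upgraded node initiates.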
> upgrade problems from 1.0 to trunk
> ----------------------------------
>
> Key: CASSANDRA-3804
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3804
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Affects Versions: 1.1
> Environment: ubuntu, cluster set up with ccm.
> Reporter: Tyler Patterson
> Assignee: Sylvain Lebresne
> Fix For: 1.1
>
>
> A 3-node cluster is on version 0.8.9, 1.0.6, or 1.0.7 and then one and only
> one node is taken down, upgraded to trunk, and started again. An rpc timeout
> exception happens if counter-add operations are done. It usually takes
> between 1 and 500 add operations before the failure occurs. The failure seems
> to happen sooner if the coordinator node is NOT the one that was upgraded.
> Here is the error:
> {code}
> ======================================================================
> ERROR: counter_upgrade_test.TestCounterUpgrade.counter_upgrade_test
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "/usr/lib/pymodules/python2.7/nose/case.py", line 187, in runTest
>     self.test(*self.arg)
>   File "/home/tahooie/cassandra-dtest/counter_upgrade_test.py", line 50, in counter_upgrade_test
>     cursor.execute("UPDATE counters SET row = row+1 where key='a'")
>   File "/usr/local/lib/python2.7/dist-packages/cql/cursor.py", line 96, in execute
>     raise cql.OperationalError("Request did not complete within rpc_timeout.")
> OperationalError: Request did not complete within rpc_timeout.
> {code}
> A script has been added to cassandra-dtest (counter_upgrade_test.py) to
> demonstrate the failure. The newest version of CCM is required to run the
> test. It is available here if it hasn't yet been pulled:
> [email protected]:tpatterson/ccm.git