[jira] [Commented] (CASSANDRA-3804) upgrade problems from 1.0 to trunk

Sylvain Lebresne (Commented) (JIRA) Mon, 30 Jan 2012 06:02:41 -0800

    [ https://issues.apache.org/jira/browse/CASSANDRA-3804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13196125#comment-13196125 ]


Sylvain Lebresne commented on CASSANDRA-3804:
---------------------------------------------

This is not counter-related (though without counters you have to use CL.ALL to 
reproduce it; otherwise it is hidden by the fact that only the non-upgraded 
coordinator acknowledges writes), and it is related to CASSANDRA-1391.
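To illustrate why CL.ALL is needed to surface the problem without counters, here is a minimal sketch of how a coordinator counts replica acknowledgements. It assumes RF=3 on a 3-node cluster (so every node is a replica) and models the upgraded replica as failing to apply the write; `AckDemo`, `blockFor` and `writeSucceeds` are hypothetical names, not Cassandra code.

```java
import java.util.List;

// Hypothetical model, not Cassandra code: RF = 3 on a 3-node cluster, so every
// node is a replica. Each replica either acknowledges the write (true) or fails
// to apply it (false); the coordinator waits for blockFor() acknowledgements.
class AckDemo {
    enum ConsistencyLevel { ONE, QUORUM, ALL }

    // Number of replica acknowledgements the coordinator waits for.
    static int blockFor(ConsistencyLevel cl, int replicationFactor) {
        switch (cl) {
            case ONE:    return 1;
            case QUORUM: return replicationFactor / 2 + 1;
            case ALL:    return replicationFactor;
            default:     throw new IllegalArgumentException(cl.toString());
        }
    }

    // True if enough replicas acknowledged; false models an rpc_timeout.
    static boolean writeSucceeds(ConsistencyLevel cl, List<Boolean> replicaAcks) {
        long acks = replicaAcks.stream().filter(Boolean::booleanValue).count();
        return acks >= blockFor(cl, replicaAcks.size());
    }

    public static void main(String[] args) {
        // Two non-upgraded replicas ack; the upgraded one fails to apply the write.
        List<Boolean> acks = List.of(true, true, false);
        for (ConsistencyLevel cl : ConsistencyLevel.values()) {
            System.out.println(cl + " -> " + (writeSucceeds(cl, acks) ? "ok" : "timeout"));
        }
    }
}
```

With two of three replicas acknowledging, ONE and QUORUM still succeed and only ALL times out, which is consistent with the failure staying hidden at lower consistency levels.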

This is due to the inability to perform schema changes in a mixed pre/post-1.1 
cluster, if I trust the following log (from the upgraded node):
{noformat}
java.lang.RuntimeException: java.io.IOException: Can't accept schema migrations from Cassandra versions previous to 1.1, please update first.
        at org.apache.cassandra.utils.FBUtilities.unchecked(FBUtilities.java:544)
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: Can't accept schema migrations from Cassandra versions previous to 1.1, please update first.
        at org.apache.cassandra.service.MigrationManager.deserializeMigrationMessage(MigrationManager.java:233)
        at org.apache.cassandra.db.DefsTable.mergeRemoteSchema(DefsTable.java:231)
        at org.apache.cassandra.db.DefinitionsUpdateVerbHandler$1.runMayThrow(DefinitionsUpdateVerbHandler.java:48)
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
{noformat}
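The log above shows the upgraded node rejecting a migration outright. A gentler policy would be for a node to keep accepting migrations but refuse to originate a schema change while any peer is still pre-1.1, failing with an explicit error. The following is a hypothetical sketch of such a check: `SchemaChangeGate`, its `Version` class and the peer-version map (assumed to be learned through gossip) are made up for illustration, and `UnavailableException` here is a local stand-in, not Cassandra's class.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not Cassandra code. Versions are simplified to major.minor.
class SchemaChangeGate {
    // Local stand-in for Cassandra's UnavailableException.
    static class UnavailableException extends Exception {
        UnavailableException(String message) { super(message); }
    }

    static final class Version {
        final int major;
        final int minor;
        Version(int major, int minor) { this.major = major; this.minor = minor; }
        boolean atLeast(int maj, int min) {
            return major > maj || (major == maj && minor >= min);
        }
        @Override public String toString() { return major + "." + minor; }
    }

    // Release version of each known peer (assumed to come from gossip).
    private final Map<String, Version> peerVersions;

    SchemaChangeGate(Map<String, Version> peerVersions) {
        this.peerVersions = peerVersions;
    }

    // Refuse to originate a schema change while any peer is still pre-1.1.
    // Accepting incoming old-format migrations would need a separate
    // conversion path, which is not shown here.
    void checkCanOriginateSchemaChange() throws UnavailableException {
        for (Map.Entry<String, Version> peer : peerVersions.entrySet()) {
            if (!peer.getValue().atLeast(1, 1)) {
                throw new UnavailableException("Schema changes are disabled until all nodes run 1.1; "
                        + peer.getKey() + " is on " + peer.getValue());
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Version> peers = new LinkedHashMap<>();
        peers.put("node1", new Version(1, 1)); // the upgraded node
        peers.put("node2", new Version(1, 0));
        peers.put("node3", new Version(1, 0));
        try {
            new SchemaChangeGate(peers).checkCanOriginateSchemaChange();
            System.out.println("schema change allowed");
        } catch (UnavailableException e) {
            System.out.println("refused: " + e.getMessage());
        }
    }
}
```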

There are, however, two problems IMHO:
# Not supporting migrations during the upgrade process is one thing, but it 
should not put the cluster in a broken state, and I'm not sure it doesn't. 
Ideally, new nodes would still accept old migrations from old nodes, but would 
refuse to perform schema changes themselves until they know all nodes are 
upgraded. We could then throw an UnavailableException with a message.
# On top of the exception above, the logs during that test are filled with 
errors that don't sound too reassuring. On every node (upgraded or not), there 
is a handful of:
{noformat}
ERROR [MutationStage:34] 2012-01-30 14:35:39,041 AbstractCassandraDaemon.java (line 134) Fatal exception in thread Thread[MutationStage:34,5,main]
java.io.IOError: java.io.EOFException
        at org.apache.cassandra.db.TruncateVerbHandler.doVerb(TruncateVerbHandler.java:66)
        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.EOFException
        at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:323)
        at java.io.DataInputStream.readUTF(DataInputStream.java:572)
        at java.io.DataInputStream.readUTF(DataInputStream.java:547)
        at org.apache.cassandra.db.TruncationSerializer.deserialize(Truncation.java:80)
        at org.apache.cassandra.db.TruncationSerializer.deserialize(Truncation.java:70)
        at org.apache.cassandra.db.TruncateVerbHandler.doVerb(TruncateVerbHandler.java:44)
        ... 4 more
{noformat}
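The EOFException comes from `readUnsignedShort` inside `readUTF`, i.e. the deserializer expected another length-prefixed string but the message had already ended. A minimal sketch of that failure mode, using hypothetical one-string and two-string layouts that are not the actual Truncation wire format:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;

// Hypothetical layouts, not Cassandra's: the sender writes one UTF string,
// while the receiver's deserializer expects two.
class FormatMismatchDemo {
    static byte[] serializeOneString(String keyspace) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(keyspace); // 2-byte length prefix + modified UTF-8 bytes
        }
        return bytes.toByteArray();
    }

    // Reader expecting (keyspace, columnFamily).
    static String[] deserializeTwoStrings(byte[] message) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(message));
        String keyspace = in.readUTF();
        // No bytes left for the length prefix: readUnsignedShort throws EOFException.
        String columnFamily = in.readUTF();
        return new String[] { keyspace, columnFamily };
    }

    public static void main(String[] args) throws IOException {
        byte[] message = serializeOneString("ks");
        try {
            deserializeTwoStrings(message);
            System.out.println("decoded");
        } catch (EOFException e) {
            System.out.println("EOFException while reading the second string");
        }
    }
}
```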
On the upgraded node, there are a few:
{noformat}
ERROR [MutationStage:38] 2012-01-30 14:35:50,772 RowMutationVerbHandler.java (line 61) Error in row mutation
org.apache.cassandra.db.UnserializableColumnFamilyException: Couldn't find cfId=1000
        at org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamilySerializer.java:129)
        at org.apache.cassandra.db.RowMutation$RowMutationSerializer.deserialize(RowMutation.java:401)
        at org.apache.cassandra.db.RowMutation$RowMutationSerializer.deserialize(RowMutation.java:409)
        at org.apache.cassandra.db.RowMutation.fromBytes(RowMutation.java:357)
        at org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:42)
        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
{noformat}
And on the non-upgraded ones, there are a few:
{noformat}
ERROR [GossipStage:1] 2012-01-30 14:35:13,363 AbstractCassandraDaemon.java (line 139) Fatal exception in thread Thread[GossipStage:1,5,main]
java.lang.UnsupportedOperationException: Not a time-based UUID
        at java.util.UUID.timestamp(UUID.java:308)
        at org.apache.cassandra.service.MigrationManager.updateHighestKnown(MigrationManager.java:121)
        at org.apache.cassandra.service.MigrationManager.rectify(MigrationManager.java:99)
        at org.apache.cassandra.service.MigrationManager.onAlive(MigrationManager.java:83)
        at org.apache.cassandra.gms.Gossiper.markAlive(Gossiper.java:806)
        at org.apache.cassandra.gms.Gossiper.handleMajorStateChange(Gossiper.java:849)
        at org.apache.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:908)
        at org.apache.cassandra.gms.GossipDigestAckVerbHandler.doVerb(GossipDigestAckVerbHandler.java:68)
        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
{noformat}
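`java.util.UUID.timestamp()` only works on version 1 (time-based) UUIDs and throws `UnsupportedOperationException("Not a time-based UUID")` for any other version. Assuming the upgraded node now advertises a name-based (version 3) schema version UUID while the 1.0 `MigrationManager.updateHighestKnown` still calls `timestamp()` on whatever version it receives, the following sketch reproduces the exception; the digest input and the sample time-based UUID are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

// Illustrates why UUID.timestamp() fails on a non-time-based UUID.
class UuidTimestampDemo {
    // Returns the UUID's 60-bit timestamp, or -1 if it is not a version 1 UUID.
    static long timestampOrMinusOne(UUID id) {
        try {
            return id.timestamp();
        } catch (UnsupportedOperationException e) {
            return -1;
        }
    }

    public static void main(String[] args) {
        // Assumption: a content-derived schema version built from some digest bytes.
        UUID nameBased = UUID.nameUUIDFromBytes("schema digest".getBytes(StandardCharsets.UTF_8));
        // An arbitrary version 1 (time-based) UUID for comparison.
        UUID timeBased = UUID.fromString("c232ab00-9414-11ec-b3c8-9f6bdeced846");

        System.out.println(nameBased.version());                 // 3
        System.out.println(timeBased.version());                 // 1
        System.out.println(timestampOrMinusOne(timeBased) > 0);  // true
        System.out.println(timestampOrMinusOne(nameBased));      // -1
    }
}
```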
                
> upgrade problems from 1.0 to trunk
> ----------------------------------
>
>                 Key: CASSANDRA-3804
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3804
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.1
>         Environment: ubuntu, cluster set up with ccm.
>            Reporter: Tyler Patterson
>            Assignee: Sylvain Lebresne
>             Fix For: 1.1
>
>
> A 3-node cluster is on version 0.8.9, 1.0.6, or 1.0.7 and then one and only
> one node is taken down, upgraded to trunk, and started again. An RPC timeout
> exception occurs when counter add operations are performed. It usually takes
> between 1 and 500 add operations before the failure occurs. The failure seems
> to happen sooner if the coordinator node is NOT the one that was upgraded.
> Here is the error:
> {code}
> ======================================================================
> ERROR: counter_upgrade_test.TestCounterUpgrade.counter_upgrade_test
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "/usr/lib/pymodules/python2.7/nose/case.py", line 187, in runTest
>     self.test(*self.arg)
>   File "/home/tahooie/cassandra-dtest/counter_upgrade_test.py", line 50, in counter_upgrade_test
>     cursor.execute("UPDATE counters SET row = row+1 where key='a'")
>   File "/usr/local/lib/python2.7/dist-packages/cql/cursor.py", line 96, in execute
>     raise cql.OperationalError("Request did not complete within rpc_timeout.")
> OperationalError: Request did not complete within rpc_timeout.
> {code}
> A script has been added to cassandra-dtest (counter_upgrade_test.py) to 
> demonstrate the failure. The newest version of CCM is required to run the 
> test. It is available here if it hasn't yet been pulled: 
> [email protected]:tpatterson/ccm.git

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
