Alexander Yasnogor created CASSANDRA-12857:
----------------------------------------------

             Summary: Upgrade procedure between 2.1.x and 3.0.x is broken
                 Key: CASSANDRA-12857
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12857
             Project: Cassandra
          Issue Type: Bug
            Reporter: Alexander Yasnogor
            Priority: Critical


It is not possible to safely perform an in-place Cassandra upgrade from 2.1.14
to 3.0.9.

Distribution: deb packages from the DataStax community repo.

The upgrade was performed according to the procedure in this document:
https://docs.datastax.com/en/upgrade/doc/upgrade/cassandra/upgrdCassandraDetails.html

Potential reason: the upgrade procedure creates a corrupted system_schema
keyspace, which then gets propagated through the cluster and kills it.

We started with one datacenter containing 19 nodes divided into two racks.
The first rack was upgraded successfully, and nodetool describecluster reported
two schema versions: one for upgraded nodes, another for non-upgraded nodes.

When starting the new version on the first node of the second rack:
{code:java}
INFO  [main] 2016-10-25 13:06:12,103 LegacySchemaMigrator.java:87 - Moving 11 keyspaces from legacy schema tables to the new schema keyspace (system_schema)
INFO  [main] 2016-10-25 13:06:12,104 LegacySchemaMigrator.java:148 - Migrating keyspace org.apache.cassandra.schema.LegacySchemaMigrator$Keyspace@7505e6ac
INFO  [main] 2016-10-25 13:06:12,200 LegacySchemaMigrator.java:148 - Migrating keyspace org.apache.cassandra.schema.LegacySchemaMigrator$Keyspace@64414574
INFO  [main] 2016-10-25 13:06:12,204 LegacySchemaMigrator.java:148 - Migrating keyspace org.apache.cassandra.schema.LegacySchemaMigrator$Keyspace@3f2c5f45
INFO  [main] 2016-10-25 13:06:12,207 LegacySchemaMigrator.java:148 - Migrating keyspace org.apache.cassandra.schema.LegacySchemaMigrator$Keyspace@2bc2d64d
INFO  [main] 2016-10-25 13:06:12,301 LegacySchemaMigrator.java:148 - Migrating keyspace org.apache.cassandra.schema.LegacySchemaMigrator$Keyspace@77343846
INFO  [main] 2016-10-25 13:06:12,305 LegacySchemaMigrator.java:148 - Migrating keyspace org.apache.cassandra.schema.LegacySchemaMigrator$Keyspace@19b0b931
INFO  [main] 2016-10-25 13:06:12,308 LegacySchemaMigrator.java:148 - Migrating keyspace org.apache.cassandra.schema.LegacySchemaMigrator$Keyspace@44bb0b35
INFO  [main] 2016-10-25 13:06:12,311 LegacySchemaMigrator.java:148 - Migrating keyspace org.apache.cassandra.schema.LegacySchemaMigrator$Keyspace@79f6cd51
INFO  [main] 2016-10-25 13:06:12,319 LegacySchemaMigrator.java:148 - Migrating keyspace org.apache.cassandra.schema.LegacySchemaMigrator$Keyspace@2fcd363b
INFO  [main] 2016-10-25 13:06:12,356 LegacySchemaMigrator.java:148 - Migrating keyspace org.apache.cassandra.schema.LegacySchemaMigrator$Keyspace@609eead6
INFO  [main] 2016-10-25 13:06:12,358 LegacySchemaMigrator.java:148 - Migrating keyspace org.apache.cassandra.schema.LegacySchemaMigrator$Keyspace@7eb7f5d0
INFO  [main] 2016-10-25 13:06:13,958 LegacySchemaMigrator.java:97 - Truncating legacy schema tables
INFO  [main] 2016-10-25 13:06:26,474 LegacySchemaMigrator.java:103 - Completed migration of legacy schema tables
INFO  [main] 2016-10-25 13:06:26,474 StorageService.java:521 - Populating token metadata from system tables
INFO  [main] 2016-10-25 13:06:26,796 StorageService.java:528 - Token metadata: Normal Tokens: [HUGE LIST of tokens]
INFO  [main] 2016-10-25 13:06:29,066 ColumnFamilyStore.java:389 - Initializing ...
INFO  [main] 2016-10-25 13:06:29,066 ColumnFamilyStore.java:389 - Initializing ...
INFO  [main] 2016-10-25 13:06:45,894 AutoSavingCache.java:165 - Completed loading (2 ms; 460 keys) KeyCache cache
INFO  [main] 2016-10-25 13:06:46,982 StorageService.java:521 - Populating token metadata from system tables
INFO  [main] 2016-10-25 13:06:47,394 StorageService.java:528 - Token metadata: Normal Tokens:[HUGE LIST of tokens]
INFO  [main] 2016-10-25 13:06:47,420 LegacyHintsMigrator.java:88 - Migrating legacy hints to new storage
INFO  [main] 2016-10-25 13:06:47,420 LegacyHintsMigrator.java:91 - Forcing a major compaction of system.hints table
INFO  [main] 2016-10-25 13:06:50,587 LegacyHintsMigrator.java:95 - Writing legacy hints to the new storage
INFO  [main] 2016-10-25 13:06:53,927 LegacyHintsMigrator.java:99 - Truncating system.hints table
....
INFO  [main] 2016-10-25 13:06:56,572 MigrationManager.java:342 - Create new table: org.apache.cassandra.config.CFMetaData@242e5306[cfId=c5e99f16-8677-3914-b17e-960613512345,ksName=system_traces,cfName=sessions,flags=[COMPOUND],params=TableParams{comment=tracing sessions, read_repair_chance=0.0, dclocal_read_repair_chance=0.0, bloom_filter_fp_chance=0.01, crc_check_chance=1.0, gc_grace_seconds=0, default_time_to_live=0, memtable_flush_period_in_ms=3600000, min_index_interval=128, max_index_interval=2048, speculative_retry=99PERCENTILE, caching={'keys' : 'ALL', 'rows_per_partition' : 'NONE'}, compaction=CompactionParams{class=org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy, options={min_threshold=4, max_threshold=32}}, compression=org.apache.cassandra.schema.CompressionParams@3fa913a4, extensions={}},comparator=comparator(),partitionColumns=[[] | [client command coordinator duration request started_at parameters]],partitionKeyColumns=[ColumnDefinition{name=session_id, type=org.apache.cassandra.db.marshal.UUIDType, kind=PARTITION_KEY, position=0}],clusteringColumns=[],keyValidator=org.apache.cassandra.db.marshal.UUIDType,columnMetadata=[ColumnDefinition{name=client, type=org.apache.cassandra.db.marshal.InetAddressType, kind=REGULAR, position=-1}, ColumnDefinition{name=command, type=org.apache.cassandra.db.marshal.UTF8Type, kind=REGULAR, position=-1}, ColumnDefinition{name=session_id, type=org.apache.cassandra.db.marshal.UUIDType, kind=PARTITION_KEY, position=0}, ColumnDefinition{name=coordinator, type=org.apache.cassandra.db.marshal.InetAddressType, kind=REGULAR, position=-1}, ColumnDefinition{name=request, type=org.apache.cassandra.db.marshal.UTF8Type, kind=REGULAR, position=-1}, ColumnDefinition{name=started_at, type=org.apache.cassandra.db.marshal.TimestampType, kind=REGULAR, position=-1}, ColumnDefinition{name=duration, type=org.apache.cassandra.db.marshal.Int32Type, kind=REGULAR, position=-1}, ColumnDefinition{name=parameters, type=org.apache.cassandra.db.marshal.MapType(org.apache.cassandra.db.marshal.UTF8Type,org.apache.cassandra.db.marshal.UTF8Type), kind=REGULAR, position=-1}],droppedColumns={},triggers=[],indexes=[]]
INFO  [GossipStage:1] 2016-10-25 13:06:57,121 StorageService.java:1969 - Node /10.41.100.31 state jump to NORMAL
INFO  [GossipStage:1] 2016-10-25 13:06:57,127 TokenMetadata.java:479 - Updating topology for /10.41.100.31
INFO  [GossipStage:1] 2016-10-25 13:06:57,127 TokenMetadata.java:479 - Updating topology for /10.41.100.31
INFO  [HANDSHAKE-/10.11.100.19] 2016-10-25 13:06:57,128 OutboundTcpConnection.java:515 - Handshaking version with /10.11.100.19
.....
INFO  [main] 2016-10-25 13:07:02,773 MigrationManager.java:342 - Create new table: ……………
INFO  [main] 2016-10-25 13:07:04,136 MigrationManager.java:302 - Create new Keyspace: KeyspaceMetadata
{code}

But then all upgraded nodes repeatedly reported the same error:
{code:java}
ERROR [InternalResponseStage:12] 2016-10-25 13:07:26,891 MigrationTask.java:96 - Configuration exception merging remote schema
org.apache.cassandra.exceptions.ConfigurationException: Column family comparators do not match or are not compatible (found comparator(org.apache.cassandra.db.marshal.UTF8Type, org.apac......
        at org.apache.cassandra.config.CFMetaData.validateCompatibility(CFMetaData.java:787) ~[apache-cassandra-3.0.9.jar:3.0.9]
        at org.apache.cassandra.config.CFMetaData.apply(CFMetaData.java:740) ~[apache-cassandra-3.0.9.jar:3.0.9]
        at org.apache.cassandra.config.Schema.updateTable(Schema.java:661) ~[apache-cassandra-3.0.9.jar:3.0.9]
        at org.apache.cassandra.schema.SchemaKeyspace.updateKeyspace(SchemaKeyspace.java:1346) ~[apache-cassandra-3.0.9.jar:3.0.9]
        at org.apache.cassandra.schema.SchemaKeyspace.mergeSchema(SchemaKeyspace.java:1302) ~[apache-cassandra-3.0.9.jar:3.0.9]
        at org.apache.cassandra.schema.SchemaKeyspace.mergeSchemaAndAnnounceVersion(SchemaKeyspace.java:1252) ~[apache-cassandra-3.0.9.jar:3.0.9]
        at org.apache.cassandra.service.MigrationTask$1.response(MigrationTask.java:92) ~[apache-cassandra-3.0.9.jar:3.0.9]
        at org.apache.cassandra.net.ResponseVerbHandler.doVerb(ResponseVerbHandler.java:53) [apache-cassandra-3.0.9.jar:3.0.9]
        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:67) [apache-cassandra-3.0.9.jar:3.0.9]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_101]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_101]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_101]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_101]
        at java.lang.Thread.run(Thread.java:745) [na:1.8.0_101]
{code}

nodetool describecluster then reported 4 different schema versions:
        1. All nodes on old version
        2. 7 migrated nodes from the first rack
        3. 2 migrated nodes from the first rack
        4. 1 node from the second rack
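
A check like the one above can be scripted before moving on to the next rack. The following is a minimal sketch, not part of the original report: it assumes the usual layout of `nodetool describecluster` output (a "Schema versions:" section with one `<uuid>: [addresses]` line per version), and the sample text, version UUIDs, node addresses, and helper names are all invented for illustration.

```python
import re

# Hypothetical sample shaped like `nodetool describecluster` output.
# Version UUIDs and node addresses are invented for illustration.
SAMPLE = """\
Cluster Information:
    Name: Example Cluster
    Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch
    Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
    Schema versions:
        aaaaaaaa-0000-0000-0000-000000000001: [10.0.0.1, 10.0.0.2]
        aaaaaaaa-0000-0000-0000-000000000002: [10.0.0.3]
        aaaaaaaa-0000-0000-0000-000000000003: [10.0.0.4, 10.0.0.5]
        aaaaaaaa-0000-0000-0000-000000000004: [10.0.0.6]
"""

# One schema-version line: a 36-character UUID (or UNREACHABLE), a colon,
# then a bracketed, comma-separated list of node addresses.
VERSION_LINE = re.compile(r"^\s*([0-9a-fA-F-]{36}|UNREACHABLE):\s*\[(.*)\]\s*$")


def schema_versions(text):
    """Map each schema version to the list of nodes reporting it."""
    versions = {}
    for line in text.splitlines():
        match = VERSION_LINE.match(line)
        if match:
            nodes = [n.strip() for n in match.group(2).split(",") if n.strip()]
            versions[match.group(1)] = nodes
    return versions


def in_agreement(versions, expected=1):
    """True when no more than `expected` reachable schema versions exist."""
    reachable = [v for v in versions if v != "UNREACHABLE"]
    return len(reachable) <= expected


if __name__ == "__main__":
    found = schema_versions(SAMPLE)
    for version, nodes in found.items():
        print(version, len(nodes))
    # During a rolling upgrade two versions (old and new) are expected.
    print("agreement:", in_agreement(found, expected=2))
```

With `expected=2`, the two-version state seen after the first rack would pass, while the four-version state described here would not.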
        
Meanwhile the cluster remained fully responsive for reads and writes.
Nevertheless, the migration was stopped at this point. Further investigation
showed corrupted records in system_schema.tables, and system_schema.columns
contained duplicated, broken records with \x00 bytes in place of letters.

{code:java}
dc1_tenant_ssd | \x00\x00\x00\x00\x00\x00 | 0.001 | {'keys': 'ALL', 'rows_per_partition': 'ALL'} | UP (on SSD) | {'class': 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'} | {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'} | 1 | 0 | 0 | {} | {'compound'} | 864000 | 0ae08450-80b9-11e6-8bf1-0df6cc57511a | 2048 | 0 | 128 | 0 | 99PERCENTILE
dc1_tenant_ssd | \x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00 | 0.001 | {'keys': 'ALL', 'rows_per_partition': 'ALL'} | UP (old CF format, on SSD) | {'class': 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'} | {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'} | 1 | 0 | 0 | {} | {'dense'} | 864000 | 16c420b0-78fd-11e6-ae98-ff8f609f3a2d | 2048 | 0 | 128 | 0 | 99PERCENTILE
dc1_tenant_ssd | \x00\x00\x00\x00\x00\x00 | 0.001 | {'keys': 'ALL', 'rows_per_partition': 'ALL'} | UT (on SSD) | {'class': 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'} | {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'} | 1 | 0 | 0 | {} | {'dense'} | 864000 | c38bce70-78fc-11e6-ae98-ff8f609f3a2d | 2048 | 0 | 128 | 0 | 99PERCENTILE
dc1_tenant_ssd | user_p | 0.001 | {'keys': 'ALL', 'rows_per_partition': 'ALL'} | UP (on SSD) | {'class': 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'} | {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'} | 1 | 0 | 0 | {} | {'compound'} | 864000 | 0ae08450-80b9-11e6-8bf1-0df6cc57511a | 2048 | 0 | 128 | 0 | 99PERCENTILE
dc1_tenant_ssd | user_p_oldcf | 0.001 | {'keys': 'ALL', 'rows_per_partition': 'ALL'} | UP (old CF format, on SSD) | {'class': 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'} | {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'} | 1 | 0 | 0 | {} | {'dense'} | 864000 | 16c420b0-78fd-11e6-ae98-ff8f609f3a2d | 2048 | 0 | 128 | 0 | 99PERCENTILE
dc1_tenant_ssd | user_t | 0.001 | {'keys': 'ALL', 'rows_per_partition': 'ALL'} | UT (on SSD) | {'class': 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'} | {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'} | 1 | 0 | 0 | {} | {'dense'} | 864000 | c38bce70-78fc-11e6-ae98-ff8f609f3a2d | 2048 | 0 | 128 | 0 | 99PERCENTILE
{code}
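
The duplication in the rows above can be spotted mechanically. The sketch below is illustrative only: it assumes the second field of each row is the table name and the UUID field is the table id (the dump has no header row), and the helper names are hypothetical. The name and id values are copied from the rows above.

```python
from collections import defaultdict

# (table name, table id) pairs copied from the dump above.
# Assumption: field 2 is the table name and the UUID field is the table id.
ROWS = [
    ("\x00" * 6, "0ae08450-80b9-11e6-8bf1-0df6cc57511a"),
    ("\x00" * 12, "16c420b0-78fd-11e6-ae98-ff8f609f3a2d"),
    ("\x00" * 6, "c38bce70-78fc-11e6-ae98-ff8f609f3a2d"),
    ("user_p", "0ae08450-80b9-11e6-8bf1-0df6cc57511a"),
    ("user_p_oldcf", "16c420b0-78fd-11e6-ae98-ff8f609f3a2d"),
    ("user_t", "c38bce70-78fc-11e6-ae98-ff8f609f3a2d"),
]


def is_nul_name(name):
    """True when a non-empty name consists only of \\x00 characters."""
    return bool(name) and set(name) == {"\x00"}


def names_by_id(rows):
    """Group table names by table id."""
    grouped = defaultdict(list)
    for name, table_id in rows:
        grouped[table_id].append(name)
    return dict(grouped)


def duplicated_ids(rows):
    """Return only the table ids that appear under more than one row."""
    return {tid: names for tid, names in names_by_id(rows).items() if len(names) > 1}


if __name__ == "__main__":
    for table_id, names in duplicated_ids(ROWS).items():
        broken = [n for n in names if is_nul_name(n)]
        intact = [n for n in names if not is_nul_name(n)]
        # Print the lengths so broken and intact names can be compared.
        print(table_id, [len(n) for n in broken], intact)
```

In this data every table id appears twice, once under its real name and once under a name made of \x00 bytes of the same length (6 for user_p and user_t, 12 for user_p_oldcf). That is an observation about the dump, not a diagnosis of the cause.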

Based on sstabledump output, it is clear that system_schema was corrupted on
every node.
The strange thing is that a single node had been upgraded one day before the
rest of the cluster, and its system_schema was fine before the upgrade was
rolled out to the other nodes. This was specifically checked.

Later, the upgraded nodes refused to restart because of the duplicates in
system_schema.tables, failing with this exception:
{code:java}
java.lang.IllegalStateException: One row required, 2 found
        at org.apache.cassandra.cql3.UntypedResultSet$FromResultSet.one(UntypedResultSet.java:84) ~[apache-cassandra-3.0.9.jar:3.0.9]
        at org.apache.cassandra.schema.SchemaKeyspace.fetchTable(SchemaKeyspace.java:938) ~[apache-cassandra-3.0.9.jar:3.0.9]
        at org.apache.cassandra.schema.SchemaKeyspace.fetchTables(SchemaKeyspace.java:928) ~[apache-cassandra-3.0.9.jar:3.0.9]
        at org.apache.cassandra.schema.SchemaKeyspace.fetchKeyspace(SchemaKeyspace.java:891) ~[apache-cassandra-3.0.9.jar:3.0.9]
        at org.apache.cassandra.schema.SchemaKeyspace.fetchKeyspacesWithout(SchemaKeyspace.java:868) ~[apache-cassandra-3.0.9.jar:3.0.9]
        at org.apache.cassandra.schema.SchemaKeyspace.fetchNonSystemKeyspaces(SchemaKeyspace.java:856) ~[apache-cassandra-3.0.9.jar:3.0.9]
        at org.apache.cassandra.config.Schema.loadFromDisk(Schema.java:136) ~[apache-cassandra-3.0.9.jar:3.0.9]
        at org.apache.cassandra.config.Schema.loadFromDisk(Schema.java:126) ~[apache-cassandra-3.0.9.jar:3.0.9]
        at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:239) [apache-cassandra-3.0.9.jar:3.0.9]
        at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:568) [apache-cassandra-3.0.9.jar:3.0.9]
        at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:696) [apache-cassandra-3.0.9.jar:3.0.9]
{code}

I am quite confident that this is not a hardware problem: the upgrade has been
attempted twice so far, with the same result both times.
This unfortunate migration scenario affected more than one node and brought the
cluster to an unusable state with no way back: decommission did not work
between different versions, and scrub removed all data from the tables in
system_schema.
We ended up exporting the data and removing the upgraded nodes, along with
their data, from the cluster.




