[ 
https://issues.apache.org/jira/browse/CASSANDRA-12857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16167125#comment-16167125
 ] 

Blake Eggleston commented on CASSANDRA-12857:
---------------------------------------------

The schema provided upgrades cleanly to both 3.0.15 and 3.0.9. However, the 
line throwing the exception here was removed as part of CASSANDRA-12443, so 
I'm inclined to think the reporter was hitting some ALTER TABLE related 
problems that have since been fixed.

> Upgrade procedure between 2.1.x and 3.0.x is broken
> ---------------------------------------------------
>
>                 Key: CASSANDRA-12857
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12857
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Alexander Yasnogor
>            Priority: Critical
>         Attachments: cassandra.schema
>
>
> It is not possible to safely do an in-place Cassandra upgrade from 2.1.14 to 
> 3.0.9.
> Distribution: deb packages from the DataStax community repo.
> The upgrade was performed according to the procedure in this document: 
> https://docs.datastax.com/en/upgrade/doc/upgrade/cassandra/upgrdCassandraDetails.html
> Potential reason: the upgrade procedure creates a corrupted system_schema 
> keyspace, which then gets propagated across the cluster and kills it.
> We started with one datacenter, which contains 19 nodes divided into two racks.
> The first rack was upgraded successfully, and nodetool describecluster reported 
> two schema versions: one for upgraded nodes, another for non-upgraded nodes.
> On starting the new version on the first node from the second rack:
> {code:java}
> INFO  [main] 2016-10-25 13:06:12,103 LegacySchemaMigrator.java:87 - Moving 11 
> keyspaces from legacy schema tables to the new schema keyspace (system_schema)
> INFO  [main] 2016-10-25 13:06:12,104 LegacySchemaMigrator.java:148 - 
> Migrating keyspace 
> org.apache.cassandra.schema.LegacySchemaMigrator$Keyspace@7505e6ac
> INFO  [main] 2016-10-25 13:06:12,200 LegacySchemaMigrator.java:148 - 
> Migrating keyspace 
> org.apache.cassandra.schema.LegacySchemaMigrator$Keyspace@64414574
> INFO  [main] 2016-10-25 13:06:12,204 LegacySchemaMigrator.java:148 - 
> Migrating keyspace 
> org.apache.cassandra.schema.LegacySchemaMigrator$Keyspace@3f2c5f45
> INFO  [main] 2016-10-25 13:06:12,207 LegacySchemaMigrator.java:148 - 
> Migrating keyspace 
> org.apache.cassandra.schema.LegacySchemaMigrator$Keyspace@2bc2d64d
> INFO  [main] 2016-10-25 13:06:12,301 LegacySchemaMigrator.java:148 - 
> Migrating keyspace 
> org.apache.cassandra.schema.LegacySchemaMigrator$Keyspace@77343846
> INFO  [main] 2016-10-25 13:06:12,305 LegacySchemaMigrator.java:148 - 
> Migrating keyspace 
> org.apache.cassandra.schema.LegacySchemaMigrator$Keyspace@19b0b931
> INFO  [main] 2016-10-25 13:06:12,308 LegacySchemaMigrator.java:148 - 
> Migrating keyspace 
> org.apache.cassandra.schema.LegacySchemaMigrator$Keyspace@44bb0b35
> INFO  [main] 2016-10-25 13:06:12,311 LegacySchemaMigrator.java:148 - 
> Migrating keyspace 
> org.apache.cassandra.schema.LegacySchemaMigrator$Keyspace@79f6cd51
> INFO  [main] 2016-10-25 13:06:12,319 LegacySchemaMigrator.java:148 - 
> Migrating keyspace 
> org.apache.cassandra.schema.LegacySchemaMigrator$Keyspace@2fcd363b
> INFO  [main] 2016-10-25 13:06:12,356 LegacySchemaMigrator.java:148 - 
> Migrating keyspace 
> org.apache.cassandra.schema.LegacySchemaMigrator$Keyspace@609eead6
> INFO  [main] 2016-10-25 13:06:12,358 LegacySchemaMigrator.java:148 - 
> Migrating keyspace 
> org.apache.cassandra.schema.LegacySchemaMigrator$Keyspace@7eb7f5d0
> INFO  [main] 2016-10-25 13:06:13,958 LegacySchemaMigrator.java:97 - 
> Truncating legacy schema tables
> INFO  [main] 2016-10-25 13:06:26,474 LegacySchemaMigrator.java:103 - 
> Completed migration of legacy schema tables
> INFO  [main] 2016-10-25 13:06:26,474 StorageService.java:521 - Populating 
> token metadata from system tables
> INFO  [main] 2016-10-25 13:06:26,796 StorageService.java:528 - Token 
> metadata: Normal Tokens: [HUGE LIST of tokens]
> INFO  [main] 2016-10-25 13:06:29,066 ColumnFamilyStore.java:389 - 
> Initializing ...
> INFO  [main] 2016-10-25 13:06:29,066 ColumnFamilyStore.java:389 - 
> Initializing ...
> INFO  [main] 2016-10-25 13:06:45,894 AutoSavingCache.java:165 - Completed 
> loading (2 ms; 460 keys) KeyCache cache
> INFO  [main] 2016-10-25 13:06:46,982 StorageService.java:521 - Populating 
> token metadata from system tables
> INFO  [main] 2016-10-25 13:06:47,394 StorageService.java:528 - Token 
> metadata: Normal Tokens:[HUGE LIST of tokens]
> INFO  [main] 2016-10-25 13:06:47,420 LegacyHintsMigrator.java:88 - Migrating 
> legacy hints to new storage
> INFO  [main] 2016-10-25 13:06:47,420 LegacyHintsMigrator.java:91 - Forcing a 
> major compaction of system.hints table
> INFO  [main] 2016-10-25 13:06:50,587 LegacyHintsMigrator.java:95 - Writing 
> legacy hints to the new storage
> INFO  [main] 2016-10-25 13:06:53,927 LegacyHintsMigrator.java:99 - Truncating 
> system.hints table
> ....
> INFO  [main] 2016-10-25 13:06:56,572 MigrationManager.java:342 - Create new 
> table: 
> org.apache.cassandra.config.CFMetaData@242e5306[cfId=c5e99f16-8677-3914-b17e-960613512345,ksName=system_traces,cfName=sessions,flags=[COMPOUND],params=TableParams{comment=tracing
>  sessions, read_repair_chance=0.0, dclocal_read_repair_chance=0.0, 
> bloom_filter_fp_chance=0.01, crc_check_chance=1.0, gc_grace_seconds=0, 
> default_time_to_live=0, memtable_flush_period_in_ms=3600000, 
> min_index_interval=128, max_index_interval=2048, 
> speculative_retry=99PERCENTILE, caching={'keys' : 'ALL', 'rows_per_partition' 
> : 'NONE'}, 
> compaction=CompactionParams{class=org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy,
>  options={min_threshold=4, max_threshold=32}}, 
> compression=org.apache.cassandra.schema.CompressionParams@3fa913a4, 
> extensions={}},comparator=comparator(),partitionColumns=[[] | [client command 
> coordinator duration request started_at 
> parameters]],partitionKeyColumns=[ColumnDefinition{name=session_id, 
> type=org.apache.cassandra.db.marshal.UUIDType, kind=PARTITION_KEY, 
> position=0}],clusteringColumns=[],keyValidator=org.apache.cassandra.db.marshal.UUIDType,columnMetadata=[ColumnDefinition{name=client,
>  type=org.apache.cassandra.db.marshal.InetAddressType, kind=REGULAR, 
> position=-1}, ColumnDefinition{name=command, 
> type=org.apache.cassandra.db.marshal.UTF8Type, kind=REGULAR, position=-1}, 
> ColumnDefinition{name=session_id, 
> type=org.apache.cassandra.db.marshal.UUIDType, kind=PARTITION_KEY, 
> position=0}, ColumnDefinition{name=coordinator, 
> type=org.apache.cassandra.db.marshal.InetAddressType, kind=REGULAR, 
> position=-1}, ColumnDefinition{name=request, 
> type=org.apache.cassandra.db.marshal.UTF8Type, kind=REGULAR, position=-1}, 
> ColumnDefinition{name=started_at, 
> type=org.apache.cassandra.db.marshal.TimestampType, kind=REGULAR, 
> position=-1}, ColumnDefinition{name=duration, 
> type=org.apache.cassandra.db.marshal.Int32Type, kind=REGULAR, position=-1}, 
> ColumnDefinition{name=parameters, 
> type=org.apache.cassandra.db.marshal.MapType(org.apache.cassandra.db.marshal.UTF8Type,org.apache.cassandra.db.marshal.UTF8Type),
>  kind=REGULAR, position=-1}],droppedColumns={},triggers=[],indexes=[]]
> INFO  [GossipStage:1] 2016-10-25 13:06:57,121 StorageService.java:1969 - Node 
> /10.41.100.31 state jump to NORMAL
> INFO  [GossipStage:1] 2016-10-25 13:06:57,127 TokenMetadata.java:479 - 
> Updating topology for /10.41.100.31
> INFO  [GossipStage:1] 2016-10-25 13:06:57,127 TokenMetadata.java:479 - 
> Updating topology for /10.41.100.31
> INFO  [HANDSHAKE-/10.11.100.19] 2016-10-25 13:06:57,128 
> OutboundTcpConnection.java:515 - Handshaking version with /10.11.100.19
> .....
> INFO  [main] 2016-10-25 13:07:02,773 MigrationManager.java:342 - Create new 
> table: ……………
> INFO  [main] 2016-10-25 13:07:04,136 MigrationManager.java:302 - Create new 
> Keyspace: KeyspaceMetadata
> {code}
> But then all upgraded nodes repeatedly reported the same error:
> {code:java}
> ERROR [InternalResponseStage:12] 2016-10-25 13:07:26,891 
> MigrationTask.java:96 - Configuration exception merging remote schema 
> org.apache.cassandra.exceptions.ConfigurationException: Column family 
> comparators do not match or are not compatible (found 
> comparator(org.apache.cassandra.db.marshal.UTF8Type, org.apac......
>         at 
> org.apache.cassandra.config.CFMetaData.validateCompatibility(CFMetaData.java:787)
>  ~[apache-cassandra-3.0.9.jar:3.0.9]
>         at org.apache.cassandra.config.CFMetaData.apply(CFMetaData.java:740) 
> ~[apache-cassandra-3.0.9.jar:3.0.9]
>         at org.apache.cassandra.config.Schema.updateTable(Schema.java:661) 
> ~[apache-cassandra-3.0.9.jar:3.0.9]
>         at 
> org.apache.cassandra.schema.SchemaKeyspace.updateKeyspace(SchemaKeyspace.java:1346)
>  ~[apache-cassandra-3.0.9.jar:3.0.9]
>         at 
> org.apache.cassandra.schema.SchemaKeyspace.mergeSchema(SchemaKeyspace.java:1302)
>  ~[apache-cassandra-3.0.9.jar:3.0.9]
>         at 
> org.apache.cassandra.schema.SchemaKeyspace.mergeSchemaAndAnnounceVersion(SchemaKeyspace.java:1252)
>  ~[apache-cassandra-3.0.9.jar:3.0.9]
>         at 
> org.apache.cassandra.service.MigrationTask$1.response(MigrationTask.java:92) 
> ~[apache-cassandra-3.0.9.jar:3.0.9]
>         at 
> org.apache.cassandra.net.ResponseVerbHandler.doVerb(ResponseVerbHandler.java:53)
>  [apache-cassandra-3.0.9.jar:3.0.9]
>         at 
> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:67) 
> [apache-cassandra-3.0.9.jar:3.0.9]
>         at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> [na:1.8.0_101]
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
> [na:1.8.0_101]
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  [na:1.8.0_101]
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  [na:1.8.0_101]
>         at java.lang.Thread.run(Thread.java:745) [na:1.8.0_101]
> {code}
> nodetool describecluster reported 4 different schema versions:
>       1. All nodes on old version
>       2. 7 migrated nodes from the first rack
>       3. 2 migrated nodes from the first rack
>       4. 1 node from the second rack
>       
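
The schema-version check described above can be scripted. The following is a minimal sketch, assuming the usual layout of `nodetool describecluster` output, where each schema version appears as a `<UUID>: [node, node, ...]` line. The function names and the sample text are hypothetical and are not taken from this report.

```python
import re
from typing import Dict, List

# One "<schema version UUID>: [node, node, ...]" line per schema version.
_VERSION_LINE = re.compile(r"^\s*([0-9a-fA-F-]{36}):\s*\[(.*)\]\s*$")


def schema_versions(describecluster_output: str) -> Dict[str, List[str]]:
    """Map each schema version UUID to the list of nodes reporting it."""
    versions: Dict[str, List[str]] = {}
    for line in describecluster_output.splitlines():
        match = _VERSION_LINE.match(line)
        if match:
            version, nodes = match.groups()
            versions[version] = [n.strip() for n in nodes.split(",") if n.strip()]
    return versions


def schema_agrees(versions: Dict[str, List[str]]) -> bool:
    """True only when exactly one schema version is reported."""
    return len(versions) == 1


# Hypothetical output for illustration; the UUIDs and addresses are made up.
SAMPLE_OUTPUT = """\
Cluster Information:
    Name: Example Cluster
    Schema versions:
        11111111-1111-1111-1111-111111111111: [10.0.0.1, 10.0.0.2]
        22222222-2222-2222-2222-222222222222: [10.0.0.3]
"""

if __name__ == "__main__":
    found = schema_versions(SAMPLE_OUTPUT)
    for version, nodes in found.items():
        print(version, len(nodes), "node(s)")
    print("schema agreement:", schema_agrees(found))
```

During a rolling upgrade two versions are expected (upgraded and non-upgraded nodes), so a check like this is most useful for spotting a count that keeps growing, as it did here.
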
> Meanwhile the cluster remained fully responsive to reads and writes.
> The migration was nevertheless stopped at this point, and further 
> investigation showed corrupted records in system_schema.tables, while 
> system_schema.columns contained duplicated, broken records with \x00 bytes in 
> place of letters.
> {code:java}
> dc1_tenant_ssd |                         \x00\x00\x00\x00\x00\x00 |           
>        0.001 | {'keys': 'ALL', 'rows_per_partition': 'ALL'} |                
> UP (on SSD) | {'class': 
> 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'} | 
> {'chunk_length_in_kb': '64', 'class': 
> 'org.apache.cassandra.io.compress.LZ4Compressor'} |                1 |        
>                   0 |                    0 |           {} | {'compound'} |    
>        864000 | 0ae08450-80b9-11e6-8bf1-0df6cc57511a |               2048 |   
>                         0 |                128 |                  0 |      
> 99PERCENTILE
> dc1_tenant_ssd | \x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00 |           
>        0.001 | {'keys': 'ALL', 'rows_per_partition': 'ALL'} | UP (old CF 
> format, on SSD) | {'class': 
> 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'} | 
> {'chunk_length_in_kb': '64', 'class': 
> 'org.apache.cassandra.io.compress.LZ4Compressor'} |                1 |        
>                   0 |                    0 |           {} |    {'dense'} |    
>        864000 | 16c420b0-78fd-11e6-ae98-ff8f609f3a2d |               2048 |   
>                         0 |                128 |                  0 |      
> 99PERCENTILE
> dc1_tenant_ssd |                         \x00\x00\x00\x00\x00\x00 |           
>        0.001 | {'keys': 'ALL', 'rows_per_partition': 'ALL'} |                
> UT (on SSD) | {'class': 
> 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'} | 
> {'chunk_length_in_kb': '64', 'class': 
> 'org.apache.cassandra.io.compress.LZ4Compressor'} |                1 |        
>                   0 |                    0 |           {} |    {'dense'} |    
>        864000 | c38bce70-78fc-11e6-ae98-ff8f609f3a2d |               2048 |   
>                         0 |                128 |                  0 |      
> 99PERCENTILE
> dc1_tenant_ssd |                                           user_p |           
>        0.001 | {'keys': 'ALL', 'rows_per_partition': 'ALL'} |                
> UP (on SSD) | {'class': 
> 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'} | 
> {'chunk_length_in_kb': '64', 'class': 
> 'org.apache.cassandra.io.compress.LZ4Compressor'} |                1 |        
>                   0 |                    0 |           {} | {'compound'} |    
>        864000 | 0ae08450-80b9-11e6-8bf1-0df6cc57511a |               2048 |   
>                         0 |                128 |                  0 |      
> 99PERCENTILE
> dc1_tenant_ssd |                                     user_p_oldcf |           
>        0.001 | {'keys': 'ALL', 'rows_per_partition': 'ALL'} | UP (old CF 
> format, on SSD) | {'class': 
> 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'} | 
> {'chunk_length_in_kb': '64', 'class': 
> 'org.apache.cassandra.io.compress.LZ4Compressor'} |                1 |        
>                   0 |                    0 |           {} |    {'dense'} |    
>        864000 | 16c420b0-78fd-11e6-ae98-ff8f609f3a2d |               2048 |   
>                         0 |                128 |                  0 |      
> 99PERCENTILE
> dc1_tenant_ssd |                                           user_t |           
>        0.001 | {'keys': 'ALL', 'rows_per_partition': 'ALL'} |                
> UT (on SSD) | {'class': 
> 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'} | 
> {'chunk_length_in_kb': '64', 'class': 
> 'org.apache.cassandra.io.compress.LZ4Compressor'} |                1 |        
>                   0 |                    0 |           {} |    {'dense'} |    
>        864000 | c38bce70-78fc-11e6-ae98-ff8f609f3a2d |               2048 |   
>                         0 |                128 |                  0 |      
> 99PERCENTILE
> {code}
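
The two symptoms visible in the rows above (table names made up entirely of \x00 bytes, and one table id shared by more than one row) can be checked mechanically. This is a minimal sketch, assuming the rows have already been reduced to (keyspace, table name, id) tuples by whatever means. The helper names are hypothetical; the sample values are copied from the listing above.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Row = Tuple[str, str, str]  # (keyspace_name, table_name, table id)


def is_nul_name(name: str) -> bool:
    """True when a non-empty name consists only of NUL (\\x00) characters."""
    return len(name) > 0 and all(ch == "\x00" for ch in name)


def suspicious_rows(rows: Iterable[Row]) -> Tuple[List[Row], Dict[str, List[str]]]:
    """Return (rows with NUL-only names, table ids used by more than one row)."""
    rows = list(rows)  # allow any iterable, since the rows are scanned twice
    names_by_id: Dict[str, List[str]] = defaultdict(list)
    for _keyspace, table, table_id in rows:
        names_by_id[table_id].append(table)
    nul_named = [row for row in rows if is_nul_name(row[1])]
    shared_ids = {i: names for i, names in names_by_id.items() if len(names) > 1}
    return nul_named, shared_ids


# Keyspace, table name and id columns as they appear in the listing above.
REPORTED_ROWS: List[Row] = [
    ("dc1_tenant_ssd", "\x00" * 6, "0ae08450-80b9-11e6-8bf1-0df6cc57511a"),
    ("dc1_tenant_ssd", "\x00" * 12, "16c420b0-78fd-11e6-ae98-ff8f609f3a2d"),
    ("dc1_tenant_ssd", "\x00" * 6, "c38bce70-78fc-11e6-ae98-ff8f609f3a2d"),
    ("dc1_tenant_ssd", "user_p", "0ae08450-80b9-11e6-8bf1-0df6cc57511a"),
    ("dc1_tenant_ssd", "user_p_oldcf", "16c420b0-78fd-11e6-ae98-ff8f609f3a2d"),
    ("dc1_tenant_ssd", "user_t", "c38bce70-78fc-11e6-ae98-ff8f609f3a2d"),
]

if __name__ == "__main__":
    nul_named, shared_ids = suspicious_rows(REPORTED_ROWS)
    print(len(nul_named), "row(s) with NUL-only table names")
    for table_id, names in shared_ids.items():
        print(table_id, "is shared by", len(names), "rows")
```

Grouping by table id rather than by name is a design choice for this sketch: in the listing, each broken row carries the id of a legitimately named table, so the id is what ties the pair together.
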
> It is clear from the sstabledump output that system_schema was corrupted on 
> every node.
> The strange thing is that a single node had been upgraded one day before the 
> rest of the cluster, and its system_schema was fine before the upgrade was 
> rolled out to the other nodes. This was specifically checked.
> Later, the upgraded nodes refused to restart because of the duplicates in 
> system_schema.tables, failing with this exception:
> {code:java}
> java.lang.IllegalStateException: One row required, 2 found
>         at 
> org.apache.cassandra.cql3.UntypedResultSet$FromResultSet.one(UntypedResultSet.java:84)
>  ~[apache-cassandra-3.0.9.jar:3.0.9]
>         at 
> org.apache.cassandra.schema.SchemaKeyspace.fetchTable(SchemaKeyspace.java:938)
>  ~[apache-cassandra-3.0.9.jar:3.0.9]
>         at 
> org.apache.cassandra.schema.SchemaKeyspace.fetchTables(SchemaKeyspace.java:928)
>  ~[apache-cassandra-3.0.9.jar:3.0.9]
>         at 
> org.apache.cassandra.schema.SchemaKeyspace.fetchKeyspace(SchemaKeyspace.java:891)
>  ~[apache-cassandra-3.0.9.jar:3.0.9]
>         at 
> org.apache.cassandra.schema.SchemaKeyspace.fetchKeyspacesWithout(SchemaKeyspace.java:868)
>  ~[apache-cassandra-3.0.9.jar:3.0.9]
>         at 
> org.apache.cassandra.schema.SchemaKeyspace.fetchNonSystemKeyspaces(SchemaKeyspace.java:856)
>  ~[apache-cassandra-3.0.9.jar:3.0.9]
>         at org.apache.cassandra.config.Schema.loadFromDisk(Schema.java:136) 
> ~[apache-cassandra-3.0.9.jar:3.0.9]
>         at org.apache.cassandra.config.Schema.loadFromDisk(Schema.java:126) 
> ~[apache-cassandra-3.0.9.jar:3.0.9]
>         at 
> org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:239) 
> [apache-cassandra-3.0.9.jar:3.0.9]
>         at 
> org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:568)
>  [apache-cassandra-3.0.9.jar:3.0.9]
>         at 
> org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:696) 
> [apache-cassandra-3.0.9.jar:3.0.9]
> {code}
> I am quite confident that this is not a hardware problem; so far I have tried 
> to perform the upgrade twice, with the same results.
> This very unfortunate migration scenario affected more than one node and 
> brought the cluster to an unusable state with no way back: decommission did 
> not work between different versions, and scrub removed all data from the 
> tables in system_schema.
> We ended up exporting the data and removing the upgraded nodes, along with 
> their data, from the cluster.


