[ 
https://issues.apache.org/jira/browse/CASSANDRA-12582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15516024#comment-15516024
 ] 

Sylvain Lebresne commented on CASSANDRA-12582:
----------------------------------------------

bq. If we add a field to SystemKeyspace.DroppedColumns, and we are careful to 
read it as an optional field that may be missing, given that the table has 
local replication

Don't be fooled by the keyspace replication strategy: schema tables *are* 
replicated, just by their own manual mechanism (see {{MigrationManager}} for 
the gory details). Adding a column to a schema table exposes us to the same 
problem as in CASSANDRA-12236 (and its follow-up, CASSANDRA-12697). And 
while for the {{cdc}} column we have been able to use {{cdc_enabled}} to 
"solve" this problem (basically saying, "keep cdc_enabled set to false until 
fully upgraded"), we don't have such a trick here.
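
To make the mixed-version concern concrete, here is a toy model in Java. It is 
not Cassandra code: {{SchemaNode}}, {{canApply}}, {{readKind}}, {{buildRow}} 
and the {{kind}} field are hypothetical names, the column list is only 
illustrative, and the rule that a node rejects a schema row with a column it 
does not know is an assumption standing in for whatever an older node really 
does. The sketch only shows why reading the new field as optional on the 
upgraded side does not protect nodes that have not been upgraded yet, and what 
a {{cdc_enabled}}-style gate would buy.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Toy model only: none of these names exist in Cassandra.
public class MixedVersionSchemaSketch {

    // Columns a not-yet-upgraded node knows for the dropped-columns schema table
    // (illustrative list, not taken from the ticket).
    static final Set<String> OLD_COLUMNS = new HashSet<>(Arrays.asList(
            "keyspace_name", "table_name", "column_name", "dropped_time", "type"));

    // Hypothetical name for the field being added.
    static final String NEW_COLUMN = "kind";

    static final class SchemaNode {
        private final Set<String> knownColumns;

        SchemaNode(Set<String> knownColumns) {
            this.knownColumns = knownColumns;
        }

        // Assumption: a node cannot apply a schema row carrying a column it does not know.
        boolean canApply(Map<String, String> row) {
            return knownColumns.containsAll(row.keySet());
        }
    }

    // Reader side on an upgraded node: the new field is optional and defaults when absent.
    static String readKind(Map<String, String> row) {
        return row.getOrDefault(NEW_COLUMN, "regular");
    }

    // Writer side: only emit the new field once the whole cluster is upgraded,
    // in the spirit of "keep cdc_enabled set to false until fully upgraded".
    static Map<String, String> buildRow(boolean clusterFullyUpgraded) {
        Map<String, String> row = new LinkedHashMap<>();
        row.put("keyspace_name", "ks");
        row.put("table_name", "t");
        row.put("column_name", "c");
        row.put("dropped_time", "0");
        row.put("type", "int");
        if (clusterFullyUpgraded) {
            row.put(NEW_COLUMN, "static");
        }
        return row;
    }

    public static void main(String[] args) {
        SchemaNode oldNode = new SchemaNode(OLD_COLUMNS);
        System.out.println("old node applies ungated row: " + oldNode.canApply(buildRow(true)));
        System.out.println("old node applies gated row:   " + oldNode.canApply(buildRow(false)));
        System.out.println("kind read from gated row:     " + readKind(buildRow(false)));
    }
}
```

In this model the optional read in {{readKind}} is the easy half. The hard 
half is the {{clusterFullyUpgraded}} input to {{buildRow}}: for {{cdc}} an 
operator-controlled setting played that role, and nothing equivalent exists 
for dropped columns.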

Don't get me wrong, we could probably come up with something along the lines 
of CASSANDRA-12236 if we really wanted to, but there is a fair chance it won't 
be very user friendly, and it's a tad involved anyway. So let's say that it's 
a lot easier to make the change in 4.0, and that even if we think it's 
important enough to try it before that, we might still want to defer that 
change to a follow-up ticket so we can commit your work-around in the meantime.

> Removing static column results in ReadFailure due to CorruptSSTableException
> ----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-12582
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12582
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Local Write-Read Paths
>         Environment: Cassandra 3.0.8
>            Reporter: Evan Prothro
>            Assignee: Stefania
>            Priority: Critical
>              Labels: compaction, corruption, drop, read, static
>             Fix For: 3.0.x, 3.x
>
>         Attachments: 12582.cdl, 12582_reproduce.sh
>
>
> We ran into an issue in production where reads began to fail for certain 
> queries, depending on the range within the relation for those queries. 
> The Cassandra system log showed an unhandled {{CorruptSSTableException}}.
> CQL read failure:
> {code}
> ReadFailure: code=1300 [Replica(s) failed to execute read] message="Operation failed - received 0 responses and 1 failures" info={'failures': 1, 'received_responses': 0, 'required_responses': 1, 'consistency': 'ONE'}
> {code}
> Cassandra exception:
> {code}
> WARN  [SharedPool-Worker-2] 2016-08-31 12:49:27,979 AbstractLocalAwareExecutorService.java:169 - Uncaught exception on thread Thread[SharedPool-Worker-2,5,main]: {}
> java.lang.RuntimeException: org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: /usr/local/apache-cassandra-3.0.8/data/data/issue309/apples_by_tree-006748a06fa311e6a7f8ef8b642e977b/mb-1-big-Data.db
>   at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2453) ~[apache-cassandra-3.0.8.jar:3.0.8]
>   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_72]
>   at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:164) ~[apache-cassandra-3.0.8.jar:3.0.8]
>   at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:136) [apache-cassandra-3.0.8.jar:3.0.8]
>   at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) [apache-cassandra-3.0.8.jar:3.0.8]
>   at java.lang.Thread.run(Thread.java:745) [na:1.8.0_72]
> Caused by: org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: /usr/local/apache-cassandra-3.0.8/data/data/issue309/apples_by_tree-006748a06fa311e6a7f8ef8b642e977b/mb-1-big-Data.db
>   at org.apache.cassandra.io.sstable.format.big.BigTableScanner$KeyScanningIterator$1.initializeIterator(BigTableScanner.java:343) ~[apache-cassandra-3.0.8.jar:3.0.8]
>   at org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.maybeInit(LazilyInitializedUnfilteredRowIterator.java:48) ~[apache-cassandra-3.0.8.jar:3.0.8]
>   at org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.isReverseOrder(LazilyInitializedUnfilteredRowIterator.java:65) ~[apache-cassandra-3.0.8.jar:3.0.8]
>   at org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.isReverseOrder(LazilyInitializedUnfilteredRowIterator.java:66) ~[apache-cassandra-3.0.8.jar:3.0.8]
>   at org.apache.cassandra.db.partitions.PurgeFunction.applyToPartition(PurgeFunction.java:62) ~[apache-cassandra-3.0.8.jar:3.0.8]
>   at org.apache.cassandra.db.partitions.PurgeFunction.applyToPartition(PurgeFunction.java:24) ~[apache-cassandra-3.0.8.jar:3.0.8]
>   at org.apache.cassandra.db.transform.BasePartitions.hasNext(BasePartitions.java:96) ~[apache-cassandra-3.0.8.jar:3.0.8]
>   at org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$Serializer.serialize(UnfilteredPartitionIterators.java:295) ~[apache-cassandra-3.0.8.jar:3.0.8]
>   at org.apache.cassandra.db.ReadResponse$LocalDataResponse.build(ReadResponse.java:134) ~[apache-cassandra-3.0.8.jar:3.0.8]
>   at org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:127) ~[apache-cassandra-3.0.8.jar:3.0.8]
>   at org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:123) ~[apache-cassandra-3.0.8.jar:3.0.8]
>   at org.apache.cassandra.db.ReadResponse.createDataResponse(ReadResponse.java:65) ~[apache-cassandra-3.0.8.jar:3.0.8]
>   at org.apache.cassandra.db.ReadCommand.createResponse(ReadCommand.java:289) ~[apache-cassandra-3.0.8.jar:3.0.8]
>   at org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:1796) ~[apache-cassandra-3.0.8.jar:3.0.8]
>   at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2449) ~[apache-cassandra-3.0.8.jar:3.0.8]
>   ... 5 common frames omitted
> Caused by: org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: /usr/local/apache-cassandra-3.0.8/data/data/issue309/apples_by_tree-006748a06fa311e6a7f8ef8b642e977b/mb-1-big-Data.db
>   at org.apache.cassandra.db.columniterator.AbstractSSTableIterator.<init>(AbstractSSTableIterator.java:130) ~[apache-cassandra-3.0.8.jar:3.0.8]
>   at org.apache.cassandra.db.columniterator.SSTableIterator.<init>(SSTableIterator.java:46) ~[apache-cassandra-3.0.8.jar:3.0.8]
>   at org.apache.cassandra.io.sstable.format.big.BigTableReader.iterator(BigTableReader.java:69) ~[apache-cassandra-3.0.8.jar:3.0.8]
>   at org.apache.cassandra.io.sstable.format.big.BigTableScanner$KeyScanningIterator$1.initializeIterator(BigTableScanner.java:338) ~[apache-cassandra-3.0.8.jar:3.0.8]
>   ... 19 common frames omitted
> Caused by: java.io.IOException: Corrupt (negative) value length encountered
>   at org.apache.cassandra.db.marshal.AbstractType.readValue(AbstractType.java:399) ~[apache-cassandra-3.0.8.jar:3.0.8]
>   at org.apache.cassandra.db.rows.BufferCell$Serializer.deserialize(BufferCell.java:302) ~[apache-cassandra-3.0.8.jar:3.0.8]
>   at org.apache.cassandra.db.rows.UnfilteredSerializer.readSimpleColumn(UnfilteredSerializer.java:462) ~[apache-cassandra-3.0.8.jar:3.0.8]
>   at org.apache.cassandra.db.rows.UnfilteredSerializer.deserializeRowBody(UnfilteredSerializer.java:440) ~[apache-cassandra-3.0.8.jar:3.0.8]
>   at org.apache.cassandra.db.rows.UnfilteredSerializer.deserializeStaticRow(UnfilteredSerializer.java:381) ~[apache-cassandra-3.0.8.jar:3.0.8]
>   at org.apache.cassandra.db.columniterator.AbstractSSTableIterator.readStaticRow(AbstractSSTableIterator.java:179) ~[apache-cassandra-3.0.8.jar:3.0.8]
>   at org.apache.cassandra.db.columniterator.AbstractSSTableIterator.<init>(AbstractSSTableIterator.java:103) ~[apache-cassandra-3.0.8.jar:3.0.8]
>   ... 22 common frames omitted
> {code}
> After debugging, it appears that a static column dropped weeks earlier was 
> the instigator of the issue. As a workaround we added the column back, 
> restarted all Cassandra processes within the cluster, and the read error and 
> corruption exception went away.
> Attached is a script to reproduce with a simple schema.
> Also noteworthy (and shown in the script) is that when in this state, 
> compaction silently failed (exit 0) to remove the dropped static columns from 
> the "corrupted" sstable.


