[ https://issues.apache.org/jira/browse/CASSANDRA-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Samuel Klock updated CASSANDRA-13109:
-------------------------------------
    Attachment: 13109-3.0.txt

Attaching the patch.

> Lightweight transactions temporarily fail after upgrade from 2.1 to 3.0
> -----------------------------------------------------------------------
>
>                 Key: CASSANDRA-13109
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13109
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Samuel Klock
>         Attachments: 13109-3.0.txt
>
>
> We've observed this upgrading from 2.1.15 to 3.0.8 and from 2.1.16 to 
> 3.0.10: some lightweight transactions executed on upgraded nodes fail with 
> a read failure.  The following conditions seem relevant to triggering the 
> failure:
> * The transaction must be conditioned on the current value of at least one 
> column; {{IF NOT EXISTS}} transactions, for example, don't seem to be 
> affected.
> * There should be a collection column (in our case, a map) defined on the 
> table on which the transaction is executed.
> * The transaction should be executed before sstables on the node are 
> upgraded.  The failure does not occur after the sstables have been upgraded 
> (whether via {{nodetool upgradesstables}} or effectively via compaction).
> * Upgraded nodes seem to be able to participate in lightweight transactions 
> as long as they're not the coordinator.
> * The values in the row being manipulated by the transaction must have been 
> consistently manipulated by lightweight transactions (perhaps the existence 
> of Paxos state for the partition is somehow relevant?).
> * In 3.0.10, it _seems_ to be necessary to have the partition split across 
> multiple legacy sstables.  This was not necessary to reproduce the bug in 
> 3.0.8 or 3.0.9.
> For applications affected by this bug, a possible workaround is to prevent 
> nodes being upgraded from coordinating requests until sstables have been 
> upgraded.
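> As one illustration of that workaround (an assumption on our part: it presumes 
> clients use the DataStax Java driver 3.x, which this report does not state, 
> and the node addresses below are hypothetical), a whitelist load-balancing 
> policy can keep clients from using a freshly upgraded node as a coordinator 
> until {{nodetool upgradesstables}} has completed on it:
> {code:java}
> import java.net.InetSocketAddress;
> import java.util.Arrays;
> import java.util.List;
> 
> import com.datastax.driver.core.Cluster;
> import com.datastax.driver.core.Session;
> import com.datastax.driver.core.policies.RoundRobinPolicy;
> import com.datastax.driver.core.policies.WhiteListPolicy;
> 
> public class AvoidUpgradingCoordinators {
>     public static void main(String[] args) {
>         // Hypothetical addresses: 10.0.0.1 and 10.0.0.2 already have upgraded
>         // sstables; 10.0.0.3 was just upgraded to 3.0 and must not coordinate
>         // until `nodetool upgradesstables` finishes there.
>         List<InetSocketAddress> safeCoordinators = Arrays.asList(
>                 new InetSocketAddress("10.0.0.1", 9042),
>                 new InetSocketAddress("10.0.0.2", 9042));
> 
>         Cluster cluster = Cluster.builder()
>                 .addContactPoint("10.0.0.1")
>                 .withLoadBalancingPolicy(
>                         new WhiteListPolicy(new RoundRobinPolicy(), safeCoordinators))
>                 .build();
>         Session session = cluster.connect("test");
> 
>         // LWTs issued through this session are coordinated only by whitelisted
>         // nodes, so they avoid the failing read path on the upgrading node.
>         session.execute("UPDATE test.test SET cas_target = 'newvalue' "
>                 + "WHERE key = 'key' IF cas_target = 'value'");
>         cluster.close();
>     }
> }
> {code}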
> We're able to reproduce this when upgrading from 2.1.16 to 3.0.10 with the 
> following steps on a single-node cluster using a mostly pristine 
> {{cassandra.yaml}} from the source distribution.
> # Start Cassandra-2.1.16 on the node.
> # Create a table with a collection column and insert some data into it.
> {code:sql}
> CREATE KEYSPACE test WITH REPLICATION = {'class': 'SimpleStrategy', 'replication_factor': 1};
> CREATE TABLE test.test (key TEXT PRIMARY KEY, cas_target TEXT, some_collection MAP<TEXT, TEXT>);
> INSERT INTO test.test (key, cas_target, some_collection) VALUES ('key', 'value', {}) IF NOT EXISTS;
> {code}
> # Flush the row to an sstable: {{nodetool flush}}.
> # Update the row:
> {code:sql}
> UPDATE test.test SET cas_target = 'newvalue', some_collection = {} WHERE key = 'key' IF cas_target = 'value';
> {code}
> # Drain the node: {{nodetool drain}}
> # Stop the node, upgrade to 3.0.10, and start the node.
> # Attempt to update the row again:
> {code:sql}
> UPDATE test.test SET cas_target = 'lastvalue' WHERE key = 'key' IF cas_target = 'newvalue';
> {code}
> If the error is reproduced, {{cqlsh}} returns output like the following:
> {code:sql}
> $ ./cqlsh <<< "UPDATE test.test SET cas_target = 'newvalue', some_collection = {} WHERE key = 'key' IF cas_target = 'value';"
> (start: 2016-12-22 10:14:27 EST)
> <stdin>:2:ReadFailure: Error from server: code=1300 [Replica(s) failed to execute read] message="Operation failed - received 0 responses and 1 failures" info={'failures': 1, 'received_responses': 0, 'required_responses': 1, 'consistency': 'QUORUM'}
> {code}
> and the following stack trace will be present in the system log:
> {noformat}
> WARN  15:14:28 Uncaught exception on thread Thread[SharedPool-Worker-10,10,main]: {}
> java.lang.RuntimeException: java.lang.NullPointerException
>       at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2476) ~[main/:na]
>       at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_101]
>       at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:164) ~[main/:na]
>       at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:136) [main/:na]
>       at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) [main/:na]
>       at java.lang.Thread.run(Thread.java:745) [na:1.8.0_101]
> Caused by: java.lang.NullPointerException: null
>       at org.apache.cassandra.db.rows.Row$Merger$ColumnDataReducer.getReduced(Row.java:617) ~[main/:na]
>       at org.apache.cassandra.db.rows.Row$Merger$ColumnDataReducer.getReduced(Row.java:569) ~[main/:na]
>       at org.apache.cassandra.utils.MergeIterator$ManyToOne.consume(MergeIterator.java:220) ~[main/:na]
>       at org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:159) ~[main/:na]
>       at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) ~[main/:na]
>       at org.apache.cassandra.db.rows.Row$Merger.merge(Row.java:546) ~[main/:na]
>       at org.apache.cassandra.db.rows.UnfilteredRowIterators$UnfilteredRowMergeIterator$MergeReducer.getReduced(UnfilteredRowIterators.java:563) ~[main/:na]
>       at org.apache.cassandra.db.rows.UnfilteredRowIterators$UnfilteredRowMergeIterator$MergeReducer.getReduced(UnfilteredRowIterators.java:527) ~[main/:na]
>       at org.apache.cassandra.utils.MergeIterator$ManyToOne.consume(MergeIterator.java:220) ~[main/:na]
>       at org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:159) ~[main/:na]
>       at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) ~[main/:na]
>       at org.apache.cassandra.db.rows.UnfilteredRowIterators$UnfilteredRowMergeIterator.computeNext(UnfilteredRowIterators.java:509) ~[main/:na]
>       at org.apache.cassandra.db.rows.UnfilteredRowIterators$UnfilteredRowMergeIterator.computeNext(UnfilteredRowIterators.java:369) ~[main/:na]
>       at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) ~[main/:na]
>       at org.apache.cassandra.db.partitions.AbstractBTreePartition.build(AbstractBTreePartition.java:334) ~[main/:na]
>       at org.apache.cassandra.db.partitions.ImmutableBTreePartition.create(ImmutableBTreePartition.java:111) ~[main/:na]
>       at org.apache.cassandra.db.partitions.ImmutableBTreePartition.create(ImmutableBTreePartition.java:94) ~[main/:na]
>       at org.apache.cassandra.db.SinglePartitionReadCommand.add(SinglePartitionReadCommand.java:810) ~[main/:na]
>       at org.apache.cassandra.db.SinglePartitionReadCommand.queryMemtableAndSSTablesInTimestampOrder(SinglePartitionReadCommand.java:760) ~[main/:na]
>       at org.apache.cassandra.db.SinglePartitionReadCommand.queryMemtableAndDiskInternal(SinglePartitionReadCommand.java:519) ~[main/:na]
>       at org.apache.cassandra.db.SinglePartitionReadCommand.queryMemtableAndDisk(SinglePartitionReadCommand.java:496) ~[main/:na]
>       at org.apache.cassandra.db.SinglePartitionReadCommand.queryStorage(SinglePartitionReadCommand.java:358) ~[main/:na]
>       at org.apache.cassandra.db.ReadCommand.executeLocally(ReadCommand.java:394) ~[main/:na]
>       at org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:1794) ~[main/:na]
>       at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2472) ~[main/:na]
>       ... 5 common frames omitted
> {noformat}
> Under both 3.0.8 and 3.0.9, the {{nodetool flush}} and additional {{UPDATE}} 
> statement before upgrading to 3.0 are not necessary to reproduce this.  In 
> that case (presumably because Cassandra only has to read the data from one 
> sstable), a different stack trace appears in the log.  Here's a sample from 
> 3.0.8:
> {noformat}
>  WARN [SharedPool-Worker-3] 2016-12-13 15:19:48,863 AbstractLocalAwareExecutorService.java (line 169) Uncaught exception on thread Thread[SharedPool-Worker-3,5,main]: {}
> java.lang.RuntimeException: java.lang.IllegalStateException: [ColumnDefinition{name=REDACTED, type=org.apache.cassandra.db.marshal.UTF8Type, kind=REGULAR, position=-1}, ColumnDefinition{name=REDACTED2, type=org.apache.cassandra.db.marshal.MapType(org.apache.cassandra.db.marshal.UTF8Type,org.apache.cassandra.db.marshal.UTF8Type), kind=REGULAR, position=-1}] is not a subset of [REDACTED]
>         at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2453) ~[main/:na]
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_101]
>         at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:164) ~[main/:na]
>         at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:136) [main/:na]
>         at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) [main/:na]
>         at java.lang.Thread.run(Thread.java:745) [na:1.8.0_101]
> Caused by: java.lang.IllegalStateException: [ColumnDefinition{name=REDACTED, type=org.apache.cassandra.db.marshal.UTF8Type, kind=REGULAR, position=-1}, ColumnDefinition{name=REDACTED2, type=org.apache.cassandra.db.marshal.MapType(org.apache.cassandra.db.marshal.UTF8Type,org.apache.cassandra.db.marshal.UTF8Type), kind=REGULAR, position=-1}] is not a subset of [REDACTED]
>         at org.apache.cassandra.db.Columns$Serializer.encodeBitmap(Columns.java:531) ~[main/:na]
>         at org.apache.cassandra.db.Columns$Serializer.serializeSubset(Columns.java:465) ~[main/:na]
>         at org.apache.cassandra.db.rows.UnfilteredSerializer.serialize(UnfilteredSerializer.java:178) ~[main/:na]
>         at org.apache.cassandra.db.rows.UnfilteredSerializer.serialize(UnfilteredSerializer.java:108) ~[main/:na]
>         at org.apache.cassandra.db.rows.UnfilteredSerializer.serialize(UnfilteredSerializer.java:96) ~[main/:na]
>         at org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:132) ~[main/:na]
>         at org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:87) ~[main/:na]
>         at org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:77) ~[main/:na]
>         at org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$Serializer.serialize(UnfilteredPartitionIterators.java:300) ~[main/:na]
>         at org.apache.cassandra.db.ReadResponse$LocalDataResponse.build(ReadResponse.java:134) ~[main/:na]
>         at org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:127) ~[main/:na]
>         at org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:123) ~[main/:na]
>         at org.apache.cassandra.db.ReadResponse.createDataResponse(ReadResponse.java:65) ~[main/:na]
>         at org.apache.cassandra.db.ReadCommand.createResponse(ReadCommand.java:289) ~[main/:na]
>         at org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:1796) ~[main/:na]
>         at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2449) ~[main/:na]
>         ... 5 common frames omitted
>  WARN [SharedPool-Worker-1] 2016-12-13 15:19:48,943 AbstractLocalAwareExecutorService.java (line 169) Uncaught exception on thread Thread[SharedPool-Worker-1,5,main]: {}
> java.lang.IllegalStateException: [ColumnDefinition{name=REDACTED, type=org.apache.cassandra.db.marshal.UTF8Type, kind=REGULAR, position=-1}, ColumnDefinition{name=REDACTED2, type=org.apache.cassandra.db.marshal.MapType(org.apache.cassandra.db.marshal.UTF8Type,org.apache.cassandra.db.marshal.UTF8Type), kind=REGULAR, position=-1}] is not a subset of [REDACTED]
>         at org.apache.cassandra.db.Columns$Serializer.encodeBitmap(Columns.java:531) ~[main/:na]
>         at org.apache.cassandra.db.Columns$Serializer.serializeSubset(Columns.java:465) ~[main/:na]
>         at org.apache.cassandra.db.rows.UnfilteredSerializer.serialize(UnfilteredSerializer.java:178) ~[main/:na]
>         at org.apache.cassandra.db.rows.UnfilteredSerializer.serialize(UnfilteredSerializer.java:108) ~[main/:na]
>         at org.apache.cassandra.db.rows.UnfilteredSerializer.serialize(UnfilteredSerializer.java:96) ~[main/:na]
>         at org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:132) ~[main/:na]
>         at org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:87) ~[main/:na]
>         at org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:77) ~[main/:na]
>         at org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$Serializer.serialize(UnfilteredPartitionIterators.java:300) ~[main/:na]
>         at org.apache.cassandra.db.ReadResponse$LocalDataResponse.build(ReadResponse.java:134) ~[main/:na]
>         at org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:127) ~[main/:na]
>         at org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:123) ~[main/:na]
>         at org.apache.cassandra.db.ReadResponse.createDataResponse(ReadResponse.java:65) ~[main/:na]
>         at org.apache.cassandra.db.ReadCommand.createResponse(ReadCommand.java:289) ~[main/:na]
>         at org.apache.cassandra.db.ReadCommandVerbHandler.doVerb(ReadCommandVerbHandler.java:47) ~[main/:na]
>         at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:67) ~[main/:na]
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_101]
>         at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:164) ~[main/:na]
>         at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:136) [main/:na]
>         at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) [main/:na]
>         at java.lang.Thread.run(Thread.java:745) [na:1.8.0_101]
> {noformat}
> It's not clear to us what changed in 3.0.10 to make this behavior somewhat 
> more difficult to reproduce.
> We spent some time trying to track down the cause in 3.0.8, and we've 
> identified a very small patch (which I will attach to this issue) that 
> _seems_ to fix it.  The problem appears to be that the logic that reads data 
> from legacy sstables can pull range tombstones covering collection columns 
> that weren't requested, which then breaks downstream logic that doesn't 
> expect those tombstones to be present in the data.  The patch attempts to 
> include those tombstones only if they're explicitly requested.  However, 
> there's enough going on in that logic that it's not clear to us whether the 
> change is safe, so it definitely needs review from someone knowledgeable 
> about what that area of the code is intended to do.
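> To make the intent of the change concrete, here is a rough, self-contained 
> sketch of the idea.  The class and method names below are simplified 
> stand-ins of our own, not the actual attached patch (which touches the 
> legacy-sstable read path inside Cassandra):
> {code:java}
> import java.util.Set;
> 
> /** Illustration only: simplified stand-ins for Cassandra 3.0 internals. */
> final class LegacyCollectionTombstoneFilter {
> 
>     /**
>      * A legacy (pre-3.0) range tombstone covering a single collection column,
>      * e.g. the deletion written when a map column is overwritten with {}.
>      */
>     static final class CollectionTombstone {
>         final String columnName;
> 
>         CollectionTombstone(String columnName) {
>             this.columnName = columnName;
>         }
>     }
> 
>     /**
>      * Include a collection tombstone read from a legacy sstable only if its
>      * column was explicitly requested by the read.  Tombstones for columns
>      * outside the queried set are dropped here instead of being handed to the
>      * merge/serialization logic, which does not expect data for columns it
>      * never asked for (the source of the NPE and IllegalStateException above).
>      */
>     static boolean shouldInclude(CollectionTombstone tombstone, Set<String> queriedColumns) {
>         return queriedColumns.contains(tombstone.columnName);
>     }
> }
> {code}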


