[
https://issues.apache.org/jira/browse/CASSANDRA-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Samuel Klock updated CASSANDRA-13109:
-------------------------------------
Attachment: 13109-3.0.txt
Attaching the patch.
> Lightweight transactions temporarily fail after upgrade from 2.1 to 3.0
> -----------------------------------------------------------------------
>
> Key: CASSANDRA-13109
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13109
> Project: Cassandra
> Issue Type: Bug
> Reporter: Samuel Klock
> Attachments: 13109-3.0.txt
>
>
> We've observed this when upgrading from 2.1.15 to 3.0.8 and from 2.1.16 to
> 3.0.10: some lightweight transactions executed on upgraded nodes fail with
> a read failure. The following conditions seem relevant to triggering the
> failure:
> * The transaction must be conditioned on the current value of at least one
> column; for example, {{IF NOT EXISTS}} transactions don't seem to be
> affected.
> * There should be a collection column (in our case, a map) defined on the
> table on which the transaction is executed.
> * The transaction should be executed before sstables on the node are
> upgraded. The failure does not occur after the sstables have been upgraded
> (whether via {{nodetool upgradesstables}} or effectively via compaction).
> * Upgraded nodes seem to be able to participate in lightweight transactions
> as long as they're not the coordinator.
> * The values in the row being manipulated by the transaction must have
> consistently been manipulated via lightweight transactions (perhaps the
> existence of Paxos state for the partition is somehow relevant?).
> * In 3.0.10, it _seems_ to be necessary for the partition to be split across
> multiple legacy sstables. This was not necessary to reproduce the bug in
> 3.0.8 or 3.0.9.
> For applications affected by this bug, a possible workaround is to prevent
> nodes that are being upgraded from coordinating requests until their
> sstables have been upgraded.
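> One way to apply that workaround client-side, as a minimal sketch: assuming
> the application uses the DataStax Java driver 3.x (the driver choice,
> addresses, and port below are illustrative assumptions, not details from
> this report), a {{WhiteListPolicy}} can keep nodes that are mid-upgrade out
> of the coordinator rotation:
> {code:java}
> import java.net.InetSocketAddress;
> import java.util.Arrays;
> import java.util.List;
>
> import com.datastax.driver.core.Cluster;
> import com.datastax.driver.core.Session;
> import com.datastax.driver.core.policies.DCAwareRoundRobinPolicy;
> import com.datastax.driver.core.policies.WhiteListPolicy;
>
> public class AvoidUpgradingCoordinators
> {
>     public static void main(String[] args)
>     {
>         // Hypothetical addresses: list only nodes that are safe to act as
>         // coordinators (i.e., nodes whose sstables have been upgraded, or
>         // nodes still running 2.1).
>         List<InetSocketAddress> safeCoordinators = Arrays.asList(
>                 new InetSocketAddress("10.0.0.1", 9042),
>                 new InetSocketAddress("10.0.0.2", 9042));
>
>         // WhiteListPolicy restricts the driver to the listed hosts, so a
>         // node mid-upgrade never coordinates this application's requests.
>         Cluster cluster = Cluster.builder()
>                 .addContactPoint("10.0.0.1")
>                 .withLoadBalancingPolicy(new WhiteListPolicy(
>                         DCAwareRoundRobinPolicy.builder().build(),
>                         safeCoordinators))
>                 .build();
>         Session session = cluster.connect();
>         // ... execute lightweight transactions as usual ...
>         cluster.close();
>     }
> }
> {code}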
> We're able to reproduce this when upgrading from 2.1.16 to 3.0.10 with the
> following steps on a single-node cluster using a mostly pristine
> {{cassandra.yaml}} from the source distribution.
> # Start Cassandra 2.1.16 on the node.
> # Create a table with a collection column and insert some data into it.
> {code:sql}
> CREATE KEYSPACE test WITH REPLICATION =
>     {'class': 'SimpleStrategy', 'replication_factor': 1};
> CREATE TABLE test.test (
>     key TEXT PRIMARY KEY,
>     cas_target TEXT,
>     some_collection MAP<TEXT, TEXT>
> );
> INSERT INTO test.test (key, cas_target, some_collection)
> VALUES ('key', 'value', {}) IF NOT EXISTS;
> {code}
> # Flush the row to an sstable: {{nodetool flush}}.
> # Update the row:
> {code:sql}
> UPDATE test.test SET cas_target = 'newvalue', some_collection = {}
> WHERE key = 'key' IF cas_target = 'value';
> {code}
> # Drain the node: {{nodetool drain}}.
> # Stop the node, upgrade to 3.0.10, and start the node.
> # Attempt to update the row again:
> {code:sql}
> UPDATE test.test SET cas_target = 'lastvalue'
> WHERE key = 'key' IF cas_target = 'newvalue';
> {code}
> If the error is reproduced, {{cqlsh}} will return the following output:
> {noformat}
> $ ./cqlsh <<< "UPDATE test.test SET cas_target = 'newvalue', some_collection = {} WHERE key = 'key' IF cas_target = 'value';"
> (start: 2016-12-22 10:14:27 EST)
> <stdin>:2:ReadFailure: Error from server: code=1300 [Replica(s) failed to execute read] message="Operation failed - received 0 responses and 1 failures" info={'failures': 1, 'received_responses': 0, 'required_responses': 1, 'consistency': 'QUORUM'}
> {noformat}
> and the following stack trace will be present in the system log:
> {noformat}
> WARN 15:14:28 Uncaught exception on thread Thread[SharedPool-Worker-10,10,main]: {}
> java.lang.RuntimeException: java.lang.NullPointerException
>     at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2476) ~[main/:na]
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_101]
>     at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:164) ~[main/:na]
>     at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:136) [main/:na]
>     at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) [main/:na]
>     at java.lang.Thread.run(Thread.java:745) [na:1.8.0_101]
> Caused by: java.lang.NullPointerException: null
>     at org.apache.cassandra.db.rows.Row$Merger$ColumnDataReducer.getReduced(Row.java:617) ~[main/:na]
>     at org.apache.cassandra.db.rows.Row$Merger$ColumnDataReducer.getReduced(Row.java:569) ~[main/:na]
>     at org.apache.cassandra.utils.MergeIterator$ManyToOne.consume(MergeIterator.java:220) ~[main/:na]
>     at org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:159) ~[main/:na]
>     at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) ~[main/:na]
>     at org.apache.cassandra.db.rows.Row$Merger.merge(Row.java:546) ~[main/:na]
>     at org.apache.cassandra.db.rows.UnfilteredRowIterators$UnfilteredRowMergeIterator$MergeReducer.getReduced(UnfilteredRowIterators.java:563) ~[main/:na]
>     at org.apache.cassandra.db.rows.UnfilteredRowIterators$UnfilteredRowMergeIterator$MergeReducer.getReduced(UnfilteredRowIterators.java:527) ~[main/:na]
>     at org.apache.cassandra.utils.MergeIterator$ManyToOne.consume(MergeIterator.java:220) ~[main/:na]
>     at org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:159) ~[main/:na]
>     at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) ~[main/:na]
>     at org.apache.cassandra.db.rows.UnfilteredRowIterators$UnfilteredRowMergeIterator.computeNext(UnfilteredRowIterators.java:509) ~[main/:na]
>     at org.apache.cassandra.db.rows.UnfilteredRowIterators$UnfilteredRowMergeIterator.computeNext(UnfilteredRowIterators.java:369) ~[main/:na]
>     at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) ~[main/:na]
>     at org.apache.cassandra.db.partitions.AbstractBTreePartition.build(AbstractBTreePartition.java:334) ~[main/:na]
>     at org.apache.cassandra.db.partitions.ImmutableBTreePartition.create(ImmutableBTreePartition.java:111) ~[main/:na]
>     at org.apache.cassandra.db.partitions.ImmutableBTreePartition.create(ImmutableBTreePartition.java:94) ~[main/:na]
>     at org.apache.cassandra.db.SinglePartitionReadCommand.add(SinglePartitionReadCommand.java:810) ~[main/:na]
>     at org.apache.cassandra.db.SinglePartitionReadCommand.queryMemtableAndSSTablesInTimestampOrder(SinglePartitionReadCommand.java:760) ~[main/:na]
>     at org.apache.cassandra.db.SinglePartitionReadCommand.queryMemtableAndDiskInternal(SinglePartitionReadCommand.java:519) ~[main/:na]
>     at org.apache.cassandra.db.SinglePartitionReadCommand.queryMemtableAndDisk(SinglePartitionReadCommand.java:496) ~[main/:na]
>     at org.apache.cassandra.db.SinglePartitionReadCommand.queryStorage(SinglePartitionReadCommand.java:358) ~[main/:na]
>     at org.apache.cassandra.db.ReadCommand.executeLocally(ReadCommand.java:394) ~[main/:na]
>     at org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:1794) ~[main/:na]
>     at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2472) ~[main/:na]
>     ... 5 common frames omitted
> {noformat}
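> For completeness: at the client, this surfaces as a driver-level read
> failure rather than a timeout. A minimal sketch of what an affected
> application might observe, assuming the DataStax Java driver 3.x (the
> contact point and error handling here are illustrative assumptions, not
> details from this report):
> {code:java}
> import com.datastax.driver.core.Cluster;
> import com.datastax.driver.core.Session;
> import com.datastax.driver.core.exceptions.ReadFailureException;
>
> public class CasReadFailureDemo
> {
>     public static void main(String[] args)
>     {
>         Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
>         try (Session session = cluster.connect())
>         {
>             // The conditioned UPDATE from the reproduction steps above.
>             session.execute("UPDATE test.test SET cas_target = 'lastvalue' "
>                             + "WHERE key = 'key' IF cas_target = 'newvalue'");
>         }
>         catch (ReadFailureException e)
>         {
>             // Matches the code=1300 ReadFailure shown above: the read phase
>             // of the CAS operation failed on a replica.
>             System.err.println("CAS read failed: " + e.getMessage());
>         }
>         finally
>         {
>             cluster.close();
>         }
>     }
> }
> {code}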
> Under both 3.0.8 and 3.0.9, the {{nodetool flush}} and the additional
> {{UPDATE}} statement before upgrading to 3.0 are not necessary to reproduce
> this. In that case (when Cassandra only has to read the data from one
> sstable?), a different stack trace appears in the log. Here's a sample from
> 3.0.8:
> {noformat}
> WARN [SharedPool-Worker-3] 2016-12-13 15:19:48,863 AbstractLocalAwareExecutorService.java (line 169) Uncaught exception on thread Thread[SharedPool-Worker-3,5,main]: {}
> java.lang.RuntimeException: java.lang.IllegalStateException: [ColumnDefinition{name=REDACTED, type=org.apache.cassandra.db.marshal.UTF8Type, kind=REGULAR, position=-1}, ColumnDefinition{name=REDACTED2, type=org.apache.cassandra.db.marshal.MapType(org.apache.cassandra.db.marshal.UTF8Type,org.apache.cassandra.db.marshal.UTF8Type), kind=REGULAR, position=-1}] is not a subset of [REDACTED]
>     at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2453) ~[main/:na]
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_101]
>     at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:164) ~[main/:na]
>     at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:136) [main/:na]
>     at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) [main/:na]
>     at java.lang.Thread.run(Thread.java:745) [na:1.8.0_101]
> Caused by: java.lang.IllegalStateException: [ColumnDefinition{name=REDACTED, type=org.apache.cassandra.db.marshal.UTF8Type, kind=REGULAR, position=-1}, ColumnDefinition{name=REDACTED2, type=org.apache.cassandra.db.marshal.MapType(org.apache.cassandra.db.marshal.UTF8Type,org.apache.cassandra.db.marshal.UTF8Type), kind=REGULAR, position=-1}] is not a subset of [REDACTED]
>     at org.apache.cassandra.db.Columns$Serializer.encodeBitmap(Columns.java:531) ~[main/:na]
>     at org.apache.cassandra.db.Columns$Serializer.serializeSubset(Columns.java:465) ~[main/:na]
>     at org.apache.cassandra.db.rows.UnfilteredSerializer.serialize(UnfilteredSerializer.java:178) ~[main/:na]
>     at org.apache.cassandra.db.rows.UnfilteredSerializer.serialize(UnfilteredSerializer.java:108) ~[main/:na]
>     at org.apache.cassandra.db.rows.UnfilteredSerializer.serialize(UnfilteredSerializer.java:96) ~[main/:na]
>     at org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:132) ~[main/:na]
>     at org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:87) ~[main/:na]
>     at org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:77) ~[main/:na]
>     at org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$Serializer.serialize(UnfilteredPartitionIterators.java:300) ~[main/:na]
>     at org.apache.cassandra.db.ReadResponse$LocalDataResponse.build(ReadResponse.java:134) ~[main/:na]
>     at org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:127) ~[main/:na]
>     at org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:123) ~[main/:na]
>     at org.apache.cassandra.db.ReadResponse.createDataResponse(ReadResponse.java:65) ~[main/:na]
>     at org.apache.cassandra.db.ReadCommand.createResponse(ReadCommand.java:289) ~[main/:na]
>     at org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:1796) ~[main/:na]
>     at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2449) ~[main/:na]
>     ... 5 common frames omitted
> WARN [SharedPool-Worker-1] 2016-12-13 15:19:48,943 AbstractLocalAwareExecutorService.java (line 169) Uncaught exception on thread Thread[SharedPool-Worker-1,5,main]: {}
> java.lang.IllegalStateException: [ColumnDefinition{name=REDACTED, type=org.apache.cassandra.db.marshal.UTF8Type, kind=REGULAR, position=-1}, ColumnDefinition{name=REDACTED2, type=org.apache.cassandra.db.marshal.MapType(org.apache.cassandra.db.marshal.UTF8Type,org.apache.cassandra.db.marshal.UTF8Type), kind=REGULAR, position=-1}] is not a subset of [REDACTED]
>     at org.apache.cassandra.db.Columns$Serializer.encodeBitmap(Columns.java:531) ~[main/:na]
>     at org.apache.cassandra.db.Columns$Serializer.serializeSubset(Columns.java:465) ~[main/:na]
>     at org.apache.cassandra.db.rows.UnfilteredSerializer.serialize(UnfilteredSerializer.java:178) ~[main/:na]
>     at org.apache.cassandra.db.rows.UnfilteredSerializer.serialize(UnfilteredSerializer.java:108) ~[main/:na]
>     at org.apache.cassandra.db.rows.UnfilteredSerializer.serialize(UnfilteredSerializer.java:96) ~[main/:na]
>     at org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:132) ~[main/:na]
>     at org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:87) ~[main/:na]
>     at org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:77) ~[main/:na]
>     at org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$Serializer.serialize(UnfilteredPartitionIterators.java:300) ~[main/:na]
>     at org.apache.cassandra.db.ReadResponse$LocalDataResponse.build(ReadResponse.java:134) ~[main/:na]
>     at org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:127) ~[main/:na]
>     at org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:123) ~[main/:na]
>     at org.apache.cassandra.db.ReadResponse.createDataResponse(ReadResponse.java:65) ~[main/:na]
>     at org.apache.cassandra.db.ReadCommand.createResponse(ReadCommand.java:289) ~[main/:na]
>     at org.apache.cassandra.db.ReadCommandVerbHandler.doVerb(ReadCommandVerbHandler.java:47) ~[main/:na]
>     at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:67) ~[main/:na]
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_101]
>     at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:164) ~[main/:na]
>     at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:136) [main/:na]
>     at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) [main/:na]
>     at java.lang.Thread.run(Thread.java:745) [na:1.8.0_101]
> {noformat}
> It's not clear to us what changed in 3.0.10 that makes this behavior
> somewhat more difficult to reproduce.
> We spent some time trying to track down the cause in 3.0.8, and we've
> identified a very small patch (attached to this issue) that _seems_ to fix
> it. The problem appears to be that the logic that reads data from legacy
> sstables can pull in range tombstones covering collection columns that
> weren't requested, which then breaks downstream logic that doesn't expect
> those tombstones to be present in the data. The patch attempts to include
> those tombstones only if they're explicitly requested; the sketch below
> illustrates the idea. However, there's enough going on in that logic that
> it's not clear to us whether the change is safe, so it definitely needs
> review from someone knowledgeable about what that area of the code is
> intended to do.
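> To make the intended behavior concrete, here's a standalone sketch of the
> filtering idea (not the attached patch itself; the types and names below
> are simplified, hypothetical stand-ins for Cassandra's internal legacy-read
> machinery):
> {code:java}
> import java.util.ArrayList;
> import java.util.Arrays;
> import java.util.HashSet;
> import java.util.List;
> import java.util.Set;
>
> public class CollectionTombstoneFilterSketch
> {
>     // Stand-in for a legacy range tombstone covering one collection column.
>     static final class CollectionTombstone
>     {
>         final String column;
>         final long markedForDeleteAt;
>
>         CollectionTombstone(String column, long markedForDeleteAt)
>         {
>             this.column = column;
>             this.markedForDeleteAt = markedForDeleteAt;
>         }
>
>         @Override
>         public String toString()
>         {
>             return column + "@" + markedForDeleteAt;
>         }
>     }
>
>     // Keep a tombstone only if the collection column it covers was actually
>     // requested by the read; otherwise downstream merge logic encounters
>     // data for a column it never asked for and fails.
>     static List<CollectionTombstone> filterToRequested(
>             List<CollectionTombstone> fromLegacySSTable, Set<String> requested)
>     {
>         List<CollectionTombstone> kept = new ArrayList<>();
>         for (CollectionTombstone t : fromLegacySSTable)
>             if (requested.contains(t.column))
>                 kept.add(t);
>         return kept;
>     }
>
>     public static void main(String[] args)
>     {
>         // A CAS read conditioned only on cas_target doesn't select the map
>         // column, so the map's tombstone should be dropped, not merged.
>         List<CollectionTombstone> legacy = Arrays.asList(
>                 new CollectionTombstone("some_collection", 1482418467000L));
>         Set<String> requested = new HashSet<>(Arrays.asList("cas_target"));
>         System.out.println(filterToRequested(legacy, requested)); // prints []
>     }
> }
> {code}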
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)