[ https://issues.apache.org/jira/browse/CASSANDRA-17601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17532500#comment-17532500 ]
Jon Meredith commented on CASSANDRA-17601:
------------------------------------------
I think we can avoid introducing any more version-dependent behavior if we
remove {{ALL_REGULARS_AND_QUERIED_STATICS_COLUMNS}}, or make enabling it an
advanced option, and wait until upgrades from pre-4.0.x versions are no longer
supported. Given how central {{ColumnFilter}} is, and how long instances can be
kept around by {{ColumnFilterFactory.PrecomputedColumnFilter}}, I think any
non-deterministic dependency on the lowest upgrade version must be removed.
If the patched coordinator never builds
{{ALL_REGULARS_AND_QUERIED_STATICS_COLUMNS}} filters:
- 3.x nodes will treat the {{ColumnFilter}} the same as they do now, as the equivalent of a wildcard.
- 4.0.x/4.1.x nodes will deserialize the {{ColumnFilter}} as the correct {{SelectionColumnFilter}}.
If the coordinator is running an unpatched 4.0.x/4.1.x, the lowest upgrade
version it sees will be either null or the local version.
* Option 1 - A patched node deserializing the filter could choose to ignore the new {{FETCH_ALL_STATICS}} bit and deserialize {{ALL_REGULARS_AND_QUERIED_STATICS_COLUMNS}} as {{ALL_COLUMNS}}:
** {{org.apache.cassandra.db.rows.BTreeRow#filter}} will not filter out any columns, so it will not add a transform.
** {{org.apache.cassandra.db.rows.UnfilteredSerializer#readSimpleColumn}}/{{readSimpleColumn}} will always read the statics, which should work fine as all statics will be in the fetched list.
** {{org.apache.cassandra.index.internal.keys.KeysSearcher#queryDataFromIndex}} will only use the initial filter rather than creating an extended one.
* Option 2 - We could keep it around, but only enable building it behind a configuration entry until 3.0.x is no longer supported, at which point it can be enabled by default (or the entry removed).
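Option 1 amounts to a deserializer that never yields the new variant. The sketch below is only an illustration, assuming a header byte in which one bit means "fetch all" and another means "fetch all statics"; the bit values, the class and the method names are hypothetical and are not the real {{ColumnFilter.Serializer}} layout.

```java
// Hypothetical sketch of Option 1. The bit values are assumptions, not the
// real ColumnFilter.Serializer header layout.
class OptionOneDecoder {
    enum FetchingStrategy {
        ALL_COLUMNS,
        ALL_REGULARS_AND_QUERIED_STATICS_COLUMNS,
        ONLY_QUERIED_COLUMNS
    }

    static final int FETCH_ALL = 0x01;         // assumed: fetch all regular columns
    static final int FETCH_ALL_STATICS = 0x08; // assumed: also fetch all static columns

    // Unpatched decoding: honours the FETCH_ALL_STATICS bit.
    static FetchingStrategy decodeUnpatched(int flags) {
        if ((flags & FETCH_ALL) == 0)
            return FetchingStrategy.ONLY_QUERIED_COLUMNS;
        return (flags & FETCH_ALL_STATICS) != 0
               ? FetchingStrategy.ALL_COLUMNS
               : FetchingStrategy.ALL_REGULARS_AND_QUERIED_STATICS_COLUMNS;
    }

    // Option 1: ignore FETCH_ALL_STATICS, so every fetch-all filter becomes
    // ALL_COLUMNS and all statics end up in the fetched set.
    static FetchingStrategy decodePatched(int flags) {
        return (flags & FETCH_ALL) != 0
               ? FetchingStrategy.ALL_COLUMNS
               : FetchingStrategy.ONLY_QUERIED_COLUMNS;
    }

    public static void main(String[] args) {
        int header = FETCH_ALL; // fetch-all without the statics bit
        System.out.println(decodeUnpatched(header)); // ALL_REGULARS_AND_QUERIED_STATICS_COLUMNS
        System.out.println(decodePatched(header));   // ALL_COLUMNS
    }
}
```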
If the coordinator is running 3.0.x/3.11.x:
- A patched node can deserialize the filter as a wildcard by detecting a messaging protocol version < {{VERSION_40}}.
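The 3.0.x/3.11.x case can be decided purely from the sender's messaging version. A minimal sketch follows; the constant names mirror {{MessagingService}}, but their values here, the class, and the assumption that the rule only applies to filters with the fetch-all bit set are all illustrative.

```java
// Hypothetical sketch: decide how to interpret an incoming fetch-all filter
// from the sender's messaging version. Constant values are assumptions.
class LegacySenderCheck {
    static final int VERSION_30 = 10;
    static final int VERSION_3014 = 11;
    static final int VERSION_40 = 12;

    enum Interpretation { WILDCARD, AS_SERIALIZED }

    static Interpretation interpret(int senderVersion, boolean fetchAll) {
        // A 3.0.x/3.11.x coordinator cannot have built a 4.x-only variant, so a
        // fetch-all filter from it is treated as a wildcard.
        if (fetchAll && senderVersion < VERSION_40)
            return Interpretation.WILDCARD;
        return Interpretation.AS_SERIALIZED;
    }

    public static void main(String[] args) {
        System.out.println(interpret(VERSION_3014, true)); // WILDCARD
        System.out.println(interpret(VERSION_40, true));   // AS_SERIALIZED
    }
}
```

Unlike {{upgradeFromVersionMemoized}}, this depends only on the message being read, so every replica receiving the same message reaches the same decision.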
If the coordinator is running a 4.0 release earlier than 4.0-rc2, I think it will
regress and return incorrect results while the cluster is in mixed mode. Since we
are over a year past our release date, it seems reasonable to expect users to have
upgraded, or to tolerate the issues during the upgrade. It will resolve once the
upgrade completes.
I've pushed up a branch
[exp-refactor-out-upgraded-from-in-column-filter|https://github.com/jonmeredith/cassandra/tree/exp-refactor-out-upgraded-from-in-column-filter]
for option 2 with a minimal fix that makes creating
{{ALL_REGULARS_AND_QUERIED_STATICS_COLUMNS}} filters configurable.
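To illustrate why gating on configuration removes the non-determinism, here is a sketch of a coordinator-side builder for option 2. The {{allowQueriedStaticsOnly}} setting and the class are hypothetical and are not taken from the branch; the point is only that the choice depends on local configuration rather than on the gossip view.

```java
// Hypothetical sketch of option 2: the coordinator only builds the
// ALL_REGULARS_AND_QUERIED_STATICS_COLUMNS variant when an assumed
// configuration entry is enabled (off by default).
class OptionTwoBuilder {
    enum FetchingStrategy {
        ALL_COLUMNS,
        ALL_REGULARS_AND_QUERIED_STATICS_COLUMNS,
        ONLY_QUERIED_COLUMNS
    }

    private final boolean allowQueriedStaticsOnly;

    OptionTwoBuilder(boolean allowQueriedStaticsOnly) {
        this.allowQueriedStaticsOnly = allowQueriedStaticsOnly;
    }

    // fetchAll: the query needs all regular columns fetched.
    // queriesAllStatics: the query selects every static column anyway.
    FetchingStrategy choose(boolean fetchAll, boolean queriesAllStatics) {
        if (!fetchAll)
            return FetchingStrategy.ONLY_QUERIED_COLUMNS;
        if (queriesAllStatics || !allowQueriedStaticsOnly)
            return FetchingStrategy.ALL_COLUMNS; // safe default while 3.0.x is supported
        return FetchingStrategy.ALL_REGULARS_AND_QUERIED_STATICS_COLUMNS;
    }

    public static void main(String[] args) {
        // SELECT c5, s3 ...: fetch-all, but only one of two statics queried
        System.out.println(new OptionTwoBuilder(false).choose(true, false)); // ALL_COLUMNS
        System.out.println(new OptionTwoBuilder(true).choose(true, false));  // ALL_REGULARS_AND_QUERIED_STATICS_COLUMNS
    }
}
```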
I also have a small reproducer that triggers the issue, but am working on a
better test that covers more of {{ColumnFilter}}'s functionality. I'll post that as
soon as it's worth sharing.
> IllegalStateException with prepared queries selecting static columns in mixed
> 3.0.x/4.x clusters
> ------------------------------------------------------------------------------------------------
>
> Key: CASSANDRA-17601
> URL: https://issues.apache.org/jira/browse/CASSANDRA-17601
> Project: Cassandra
> Issue Type: Bug
> Components: Cluster/Gossip, Consistency/Coordination
> Reporter: Jon Meredith
> Assignee: Jon Meredith
> Priority: Normal
> Fix For: 4.0.x, 4.1.x, 4.x
>
>
> Clusters with prepared statements that partially select static columns,
> prepared before the upgrade, will fail to execute those statements when they
> are coordinated by 4.x nodes until the upgrade completes.
> h2. Reproduction
> Setup (before upgrade)
> {code:java}
> CREATE KEYSPACE ks1 WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};
> CREATE TABLE ks1.tbl1 (pk1 int,
> ck2 int,
> s3 int static,
> s4 int static,
> c5 int,
> PRIMARY KEY (pk1, ck2));
> INSERT INTO ks1.tbl1 (pk1, ck2, s3, s4, c5) VALUES (1, 2, 3, 4, 5);
> {code}
> Prepared Statement (prepare before upgrade)
> {code:java}
> SELECT c5, s3 FROM ks1.tbl1 WHERE pk1 = ? AND ck2 = ?;
> {code}
> Exception on 4.0.x nodes (when executing the prepared statement after the upgrade)
> {code:java}
> java.lang.IllegalStateException: [s3, s4] is not a subset of [s3]
> at org.apache.cassandra.db.Columns$Serializer.encodeBitmap(Columns.java:566)
> at org.apache.cassandra.db.Columns$Serializer.serializeSubset(Columns.java:498)
> at org.apache.cassandra.db.rows.UnfilteredSerializer.serializeRowBody(UnfilteredSerializer.java:235)
> at org.apache.cassandra.db.rows.UnfilteredSerializer.serialize(UnfilteredSerializer.java:209)
> at org.apache.cassandra.db.rows.UnfilteredSerializer.serialize(UnfilteredSerializer.java:141)
> at org.apache.cassandra.db.rows.UnfilteredSerializer.serialize(UnfilteredSerializer.java:129)
> at org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:140)
> at org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:95)
> at org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:80)
> at org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$Serializer.serialize(UnfilteredPartitionIterators.java:308)
> at org.apache.cassandra.db.ReadResponse$LocalDataResponse.build(ReadResponse.java:191)
> at org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:181)
> at org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:177)
> at org.apache.cassandra.db.ReadResponse.createDataResponse(ReadResponse.java:48)
> at org.apache.cassandra.db.ReadCommand.createResponse(ReadCommand.java:335)
> at org.apache.cassandra.db.ReadCommandVerbHandler.doVerb(ReadCommandVerbHandler.java:91)
> at org.apache.cassandra.net.InboundSink.lambda$new$0(InboundSink.java:77)
> at org.apache.cassandra.net.InboundSink.accept(InboundSink.java:93)
> at org.apache.cassandra.net.InboundSink.accept(InboundSink.java:44)
> at org.apache.cassandra.net.InboundMessageHandler$ProcessMessage.run(InboundMessageHandler.java:433)
> at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
> at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:162)
> at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:134)
> at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:119)
> at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
> at java.base/java.lang.Thread.run(Thread.java:834)
> {code}
> Exception on 3.0.x nodes (when executing the prepared statement after the upgrade)
> {code:java}
> java.lang.IllegalStateException: [ColumnDefinition{name=s3, type=org.apache.cassandra.db.marshal.IntType, kind=STATIC, position=-1}, ColumnDefinition{name=s4, type=org.apache.cassandra.db.marshal.IntType, kind=STATIC, position=-1}] is not a subset of [s3]
> at org.apache.cassandra.db.Columns$Serializer.encodeBitmap(Columns.java:555)
> at org.apache.cassandra.db.Columns$Serializer.serializeSubset(Columns.java:487)
> at org.apache.cassandra.db.rows.UnfilteredSerializer.serializeRowBody(UnfilteredSerializer.java:216)
> at org.apache.cassandra.db.rows.UnfilteredSerializer.serialize(UnfilteredSerializer.java:190)
> at org.apache.cassandra.db.rows.UnfilteredSerializer.serialize(UnfilteredSerializer.java:121)
> at org.apache.cassandra.db.rows.UnfilteredSerializer.serialize(UnfilteredSerializer.java:109)
> at org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:140)
> at org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:94)
> at org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:79)
> at org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$Serializer.serialize(UnfilteredPartitionIterators.java:326)
> at org.apache.cassandra.db.ReadResponse$LocalDataResponse.build(ReadResponse.java:186)
> at org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:179)
> at org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:175)
> at org.apache.cassandra.db.ReadResponse.createDataResponse(ReadResponse.java:75)
> at org.apache.cassandra.db.ReadCommand.createResponse(ReadCommand.java:499)
> at org.apache.cassandra.db.ReadCommandVerbHandler.doVerb(ReadCommandVerbHandler.java:91)
> at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:67)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.runUnsafe(AbstractLocalAwareExecutorService.java:194)
> at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.runUnsafe(AbstractLocalAwareExecutorService.java:137)
> at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:167)
> at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:122)
> at java.lang.Thread.run(Thread.java:748)
> {code}
> The root cause is that CASSANDRA-16686 changed {{ColumnFilter}}s to be built
> and deserialized based on which versions the coordinating node thinks are
> running in the cluster. That knowledge is always incorrect when statements are
> re-prepared on startup, and may be incorrect around the time all nodes reach
> their final version.
> h2. Sequence of events:
> Prepared statements are persisted in {{system.prepared_statements}} so they can
> be re-prepared on subsequent startups.
> When the 4.x node starts up after the upgrade,
> {{org.apache.cassandra.service.CassandraDaemon#setup}} calls
> {{QueryProcessor.instance.preloadPreparedStatements}} *before* the
> {{Gossiper}} is started by the call to {{StorageService.instance.initServer()}}
> later in {{setup}}.
> As part of preparing statements, when possible a {{ColumnFilterFactory}} is
> created that returns a {{ColumnFilter}} built at the time the query is
> prepared.
> After the changes from CASSANDRA-16686, the {{ColumnFilter}} builder
> constructs different column filter variants depending on the lowest version
> reported in gossip, which it checks via
> {{org.apache.cassandra.gms.Gossiper#upgradeFromVersionMemoized}}. If this
> runs before the Gossiper is enabled, the lowest version is taken to be
> {{SystemKeyspace.CURRENT_VERSION}}, causing the builder to create a column
> filter as if the cluster were fully upgraded.
> For the query above, the builder creates an
> {{ALL_REGULARS_AND_QUERIED_STATICS_COLUMNS}} filter.
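The combination of prepare-time precomputation and the pre-gossip fallback can be modelled in a few lines. This is a simplified, hypothetical model: versions are reduced to major numbers, the variant built in mixed mode is assumed to be {{ALL_COLUMNS}}, and the class and method names are illustrative rather than Cassandra's.

```java
import java.util.OptionalInt;
import java.util.function.Supplier;

// Hypothetical model of the startup sequence described above.
class PrepareBeforeGossip {
    enum FetchingStrategy { ALL_COLUMNS, ALL_REGULARS_AND_QUERIED_STATICS_COLUMNS }

    static final int CURRENT_MAJOR = 4; // assumed local major version

    // Stand-in for the gossip view: empty until gossip has started.
    OptionalInt lowestPeerMajor = OptionalInt.empty();

    // With no gossip information the local version is used,
    // so the cluster looks fully upgraded.
    int lowestKnownMajor() {
        return lowestPeerMajor.isPresent()
               ? Math.min(lowestPeerMajor.getAsInt(), CURRENT_MAJOR)
               : CURRENT_MAJOR;
    }

    FetchingStrategy buildFilter() {
        return lowestKnownMajor() >= 4
               ? FetchingStrategy.ALL_REGULARS_AND_QUERIED_STATICS_COLUMNS
               : FetchingStrategy.ALL_COLUMNS;
    }

    // Like a precomputed factory: the filter is built once, at prepare time,
    // and reused for every later execution.
    Supplier<FetchingStrategy> prepare() {
        FetchingStrategy precomputed = buildFilter();
        return () -> precomputed;
    }

    public static void main(String[] args) {
        PrepareBeforeGossip node = new PrepareBeforeGossip();
        Supplier<FetchingStrategy> prepared = node.prepare(); // preload: gossip not started yet
        node.lowestPeerMajor = OptionalInt.of(3);             // gossip now reports 3.0.x peers
        System.out.println(prepared.get());     // ALL_REGULARS_AND_QUERIED_STATICS_COLUMNS (stale)
        System.out.println(node.buildFilter()); // ALL_COLUMNS
    }
}
```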
> The participating 3.0.x nodes do not understand the new flag and create a
> {{ColumnFilter}} equivalent to a {{WildcardColumnFilter}}. The participating
> 4.x nodes do understand the new flag; however, because they know about other
> 3.0 nodes, the deserializer takes the pre-3.4 path and also creates a
> {{WildcardColumnFilter}}.
> The {{fetchedColumns}} sent with the {{ALL_REGULARS_AND_QUERIED_STATICS_COLUMNS}}
> filter contain only the queried static columns; however, the pre-3.4 sstable
> iterator returns all regular and static columns, causing an
> {{IllegalStateException}} when the response is serialized to be sent back.
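The failing check reduces to a subset test between the columns of the row being serialized and the columns of the serialization header, which, per the description above, reflect the fetched columns. This is a simplified stand-in for the check in {{Columns$Serializer.encodeBitmap}}, using column names instead of column metadata; the class and method names are hypothetical.

```java
import java.util.List;

// Simplified, hypothetical stand-in for the subset check that fails in
// Columns$Serializer.encodeBitmap (names instead of column metadata).
class SubsetCheck {
    static void checkSubset(List<String> rowColumns, List<String> headerColumns) {
        if (!headerColumns.containsAll(rowColumns))
            throw new IllegalStateException(rowColumns + " is not a subset of " + headerColumns);
    }

    public static void main(String[] args) {
        List<String> header = List.of("s3");          // queried statics only
        List<String> staticRow = List.of("s3", "s4"); // wildcard read returned every static
        try {
            checkSubset(staticRow, header);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // [s3, s4] is not a subset of [s3]
        }
    }
}
```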
> The ISE clears once all nodes in the cluster think they are upgraded to the
> current version and behave as the originally prepared query intended.
> h2. Related Problems
> _Non-deterministic behavior of 4.0.x/4.1.x nodes_
> If the prepared statements are cleared and/or freshly prepared while the
> cluster is in mixed 3.0/4.0 mode, the pre-built {{ColumnFilter}} will remain the
> mixed-mode variant until the statement is re-prepared after a restart or a
> cache clear/eviction.
> Because the memoized value of {{upgradeFromVersionMemoized}} expires and is
> recalculated after the upgrade reaches a single version, individual nodes will
> make local decisions about building and deserializing column filters.
> Coordinators that refresh {{upgradeFromVersionMemoized}} early may cause the
> same ISE on replicas that still hold the previous value when they respond to
> the read command.
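The divergence comes from each node caching the lowest version independently. A minimal sketch of a value memoized with an expiry, using an injectable clock so the timing is explicit; the class, the TTL and the version strings are hypothetical and do not reproduce the real {{Gossiper}} implementation.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical memoizer: recomputes the delegate once ttl has elapsed
// since the last computation, according to the supplied clock.
class ExpiringMemo<T> implements Supplier<T> {
    private final Supplier<T> delegate;
    private final long ttl;
    private final LongSupplier clock;
    private boolean computed;
    private long expiresAt;
    private T value;

    ExpiringMemo(Supplier<T> delegate, long ttl, LongSupplier clock) {
        this.delegate = delegate;
        this.ttl = ttl;
        this.clock = clock;
    }

    @Override
    public synchronized T get() {
        long now = clock.getAsLong();
        if (!computed || now - expiresAt >= 0) {
            value = delegate.get();
            expiresAt = now + ttl;
            computed = true;
        }
        return value;
    }

    public static void main(String[] args) {
        long[] now = {0};
        String[] lowestInGossip = {"3.0"};
        ExpiringMemo<String> nodeA = new ExpiringMemo<>(() -> lowestInGossip[0], 10, () -> now[0]);
        ExpiringMemo<String> nodeB = new ExpiringMemo<>(() -> lowestInGossip[0], 10, () -> now[0]);

        nodeA.get();               // t=0: A caches "3.0"
        now[0] = 5;
        nodeB.get();               // t=5: B caches "3.0"
        lowestInGossip[0] = "4.0"; // the last node finishes upgrading
        now[0] = 10;
        System.out.println(nodeA.get() + " vs " + nodeB.get()); // 4.0 vs 3.0
    }
}
```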
> _Digest Mismatches_
> If {{ALL_REGULARS_AND_QUERIED_STATICS_COLUMNS}} {{ColumnFilter}}s are
> incorrectly sent to 3.0.x nodes, those nodes will ignore the list of included
> columns and compute a different digest than the one computed locally on a
> 4.0.x coordinator.
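A toy illustration of why ignoring the column list changes the digest: hash the (column, value) pairs each side returns for the example row. The digest format below (MD5 over sorted name/value pairs) is an assumption chosen for the example and is not Cassandra's actual digest encoding.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

// Toy digest over sorted (column, value) pairs; not Cassandra's real format.
class DigestMismatch {
    static byte[] digest(Map<String, Integer> row) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            for (Map.Entry<String, Integer> e : new TreeMap<>(row).entrySet()) {
                md.update(e.getKey().getBytes(StandardCharsets.UTF_8));
                md.update((byte) 0); // separator between name and value
                md.update(Integer.toString(e.getValue()).getBytes(StandardCharsets.UTF_8));
                md.update((byte) 0); // separator between pairs
            }
            return md.digest();
        } catch (NoSuchAlgorithmException ex) {
            throw new IllegalStateException(ex); // MD5 is required on every Java platform
        }
    }

    public static void main(String[] args) {
        // 4.0.x coordinator: only the queried static s3 is included.
        Map<String, Integer> local = Map.of("c5", 5, "s3", 3);
        // 3.0.x replica: the column list is ignored, so s4 is included too.
        Map<String, Integer> remote = Map.of("c5", 5, "s3", 3, "s4", 4);
        System.out.println(Arrays.equals(digest(local), digest(remote))); // false
    }
}
```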
> h1. Proposed fix
> In discussion with [~ifesdjeen], he suggested that one way to resolve this is
> to deprecate (or just remove) the {{ALL_REGULARS_AND_QUERIED_STATICS_COLUMNS}}
> filter so that it is no longer built, and to always select all static
> columns.
> This would leave only {{WildcardColumnFilter}}, and {{SelectionColumnFilter}}
> with {{ALL_COLUMNS}} or {{ONLY_QUERIED_COLUMNS}}.
> This is a potential performance regression for unusual schemas with very
> large numbers of static columns, but such schemas seem unlikely in practice.
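Under the proposed fix, a fetch-all filter always fetches every static column, so the fetched set covers whatever a wildcard read returns. A hypothetical sketch of that fetched-set computation, using the example table's columns; the class, method and signature are illustrative only.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: with ALL_REGULARS_AND_QUERIED_STATICS_COLUMNS gone,
// only two fetching strategies remain.
class ProposedFixSketch {
    enum FetchingStrategy { ALL_COLUMNS, ONLY_QUERIED_COLUMNS }

    static Set<String> fetched(FetchingStrategy strategy, List<String> regulars,
                               List<String> statics, List<String> queried) {
        if (strategy == FetchingStrategy.ONLY_QUERIED_COLUMNS)
            return new LinkedHashSet<>(queried);
        Set<String> all = new LinkedHashSet<>(regulars);
        all.addAll(statics); // every static column, not just the queried ones
        return all;
    }

    public static void main(String[] args) {
        // ks1.tbl1 from the reproduction, queried with SELECT c5, s3
        Set<String> f = fetched(FetchingStrategy.ALL_COLUMNS,
                                List.of("c5"), List.of("s3", "s4"), List.of("c5", "s3"));
        System.out.println(f); // [c5, s3, s4]
    }
}
```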
> /cc: [~blerer]
--
This message was sent by Atlassian Jira
(v8.20.7#820007)