[
https://issues.apache.org/jira/browse/CASSANDRA-12236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384645#comment-15384645
]
Joshua McKenzie commented on CASSANDRA-12236:
---------------------------------------------
Two ways to approach this have come up in offline discussions. The first, less
invasive approach would be to suppress sending schema information about the
cdc param on versions >= 3.8 unless cdc_enabled:true is set in cassandra.yaml.
Specifically, we would remove the addition of the cdc param
[here|https://github.com/apache/cassandra/blob/cassandra-3.8/src/java/org/apache/cassandra/schema/SchemaKeyspace.java#L514]
and instead add it conditionally
[here|https://github.com/apache/cassandra/blob/cassandra-3.8/src/java/org/apache/cassandra/schema/SchemaKeyspace.java#L476].

The upgrade path for users who want to enable CDC would be: don't enable CDC
until your entire cluster is upgraded to a version >= 3.8, then enable it and
bounce your cluster. Since schema information is hard-coded in
SchemaKeyspace.java and we don't actually care about the value of that param if
CDC is not enabled on the cluster, this seems a reasonable workaround until we
get to versioned sub-systems in Cassandra.
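
As a rough illustration of the first approach, here is a minimal sketch of
conditionally including the cdc param when building a table's schema params.
The class, method, and parameter names (SchemaParamsSketch, tableParams,
cdcEnabledInYaml) are hypothetical stand-ins and not the actual
SchemaKeyspace.java code; a plain map stands in for the schema mutation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for building the params of a schema mutation. */
public class SchemaParamsSketch {

    /**
     * Returns the table params that would be sent to other nodes.
     * cdcEnabledInYaml is assumed to mirror cdc_enabled in cassandra.yaml.
     */
    static Map<String, Object> tableParams(String comment, boolean cdc,
                                           boolean cdcEnabledInYaml) {
        Map<String, Object> params = new LinkedHashMap<>();
        params.put("comment", comment);
        // Only include the cdc param when the operator has opted in, so
        // nodes on versions < 3.8 never receive a column they do not know.
        if (cdcEnabledInYaml) {
            params.put("cdc", cdc);
        }
        return params;
    }

    public static void main(String[] args) {
        // CDC disabled in the yaml: the cdc param is not sent at all.
        System.out.println(tableParams("users", false, false).keySet()); // prints [comment]
        // CDC enabled in the yaml: the cdc param is sent with its value.
        System.out.println(tableParams("users", true, true).keySet());   // prints [comment, cdc]
    }
}
```

With this shape, a mixed-version cluster only sees the new column after an
operator has explicitly enabled CDC, which matches the upgrade path described
above.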

The second, more invasive approach (at least in terms of the number of
versions it touches and its potential side effects) would be to allow null
columns during deserialization in
[Columns.java|https://github.com/apache/cassandra/blob/cassandra-3.8/src/java/org/apache/cassandra/db/Columns.java#L433]
if the mutation is for a schema table. This would apply to 3.0.x and 3.8+.
It would get us back to behavior somewhat similar to pre-CASSANDRA-8099, in
that mutations for schema tables on different versions would no longer
interrupt inter-node communication during an upgrade by throwing a
RuntimeException (RTE).
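
A minimal sketch of the second approach, assuming column lookup by name during
deserialization: unknown columns are tolerated (kept as null) only when the
mutation targets the schema keyspace, and still raise the RuntimeException
everywhere else. TolerantColumnsSketch and resolveColumns are hypothetical
names, and strings stand in for column definitions; this is not the actual
Columns.java logic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for column resolution during deserialization. */
public class TolerantColumnsSketch {

    // Keyspace holding the schema tables in Cassandra 3.0+.
    static final String SCHEMA_KEYSPACE = "system_schema";

    /**
     * Resolves incoming column names against the columns this node knows.
     * Unknown columns become null for schema tables and throw otherwise.
     */
    static List<String> resolveColumns(String keyspace,
                                       Set<String> knownColumns,
                                       List<String> incoming) {
        List<String> resolved = new ArrayList<>();
        for (String name : incoming) {
            if (knownColumns.contains(name)) {
                resolved.add(name);
            } else if (SCHEMA_KEYSPACE.equals(keyspace)) {
                // Assumption: callers skip null entries for schema tables.
                resolved.add(null);
            } else {
                throw new RuntimeException(
                    "Unknown column " + name + " during deserialization");
            }
        }
        return resolved;
    }
}
```

The trade-off is that every caller handling schema-table mutations then has to
cope with null entries, which is part of why this option touches more code and
more versions.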

I'm by no means an expert on schema dissemination. [~iamaleksey] /
[~slebresne]: do either of you have feedback on the two approaches above, or
other, better ideas on this front?

> RTE from new CDC column breaks in flight queries.
> -------------------------------------------------
>
> Key: CASSANDRA-12236
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12236
> Project: Cassandra
> Issue Type: Bug
> Reporter: Jeremiah Jordan
> Priority: Blocker
> Fix For: 3.8
>
>
> This RTE is not harmless. It will cause the internode connection to break
> which will cause all in flight requests between these nodes to die/timeout.
> {noformat}
>    - Due to changes in schema migration handling and the storage format after 3.0, you will
>      see error messages such as:
>         "java.lang.RuntimeException: Unknown column cdc during deserialization"
>      in your system logs on a mixed-version cluster during upgrades. This error message
>      is harmless and due to the 3.8 nodes having cdc added to their schema tables while
>      the <3.8 nodes do not. This message should cease once all nodes are upgraded to 3.8.
>      As always, refrain from schema changes during cluster upgrades.
> {noformat}