[
https://issues.apache.org/jira/browse/CASSANDRA-12098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sam Tunnicliffe updated CASSANDRA-12098:
----------------------------------------
Reviewer: Aleksey Yeschenko
Fix Version/s: 3.0.x
Status: Patch Available (was: In Progress)
What appears to have happened here is another race during schema update. From
{{node1_debug.log}} we can see that the schema update to add {{b_index}} is
being processed directly before the attempted deserialization of the mutation
which causes the test failure ({{2016-06-26 08:11:32,179}}). This involves a
call to {{apply(CFMetaData)}} on the existing {{CFMetaData}} for the table
obtained from {{Schema.instance}}. The problem lies in
{{CFMetaData::rebuild()}} called during {{apply}}, as therein exists a window
where the {{columnMetadata}} map does not contain all of the {{name ->
ColumnDefinition}} mappings for the table.
It appears that the deserialization of the mutation message occurs during this
window. The {{cfId}} is first read from the wire, then used to fetch the
{{CFMetaData}} from {{Schema.instance}} (this will be the same instance as is
being mutated by the migration thread). Down the line, this is then used for
lookups in {{Columns.Serializer}} when deserializing the
{{SerializationHeader}}. This lookup seems to have occurred precisely between
{{columnMetadata}} being cleared and repopulated, resulting in the unknown
column error.
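To make the window concrete, here is a minimal sketch of a clear-then-repopulate rebuild. It is not the actual {{CFMetaData}} code: {{InPlaceRebuildSketch}}, the {{betweenClearAndRepopulate}} hook and the {{String}} values standing in for {{ColumnDefinition}} are all hypothetical, and the use of a concurrent map is an assumption. The hook simulates, deterministically, a reader thread that happens to run inside the window.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a table's metadata; not the real CFMetaData.
final class InPlaceRebuildSketch
{
    // Shared map mutated in place (assumed to be a concurrent map here).
    private final Map<String, String> columnMetadata = new ConcurrentHashMap<>();

    InPlaceRebuildSketch(List<String> columnNames)
    {
        rebuild(columnNames, () -> {});
    }

    // betweenClearAndRepopulate simulates work done by another thread
    // while the map is temporarily empty.
    void rebuild(List<String> columnNames, Runnable betweenClearAndRepopulate)
    {
        columnMetadata.clear();
        betweenClearAndRepopulate.run(); // <-- the window
        for (String name : columnNames)
            columnMetadata.put(name, "definition-of-" + name);
    }

    // Lookup as a deserializing reader would perform it; null means "unknown column".
    String getColumnDefinition(String name)
    {
        return columnMetadata.get(name);
    }
}
```

A lookup of an existing column made from the hook returns {{null}} even though the column is defined both before and after the rebuild, which is the shape of the failure described above.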
The real solution is CASSANDRA-9425, but in the meantime I believe it is safe
to make {{columnMetadata}} non-final & volatile and swap in a fresh map during
rebuild. The only other places where the map is actually mutated are in schema
altering statements, where the new {{CFMetaData}} is being constructed.
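As a rough illustration of the swap-in approach (a sketch under assumptions, not the patch itself): {{SwapOnRebuildSketch}} and the {{String}} values standing in for {{ColumnDefinition}} are hypothetical, and a single thread performing rebuilds is assumed.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a table's metadata; not the real CFMetaData.
final class SwapOnRebuildSketch
{
    // Non-final and volatile: a reader sees either the old map or the new,
    // fully populated one, never an empty or half-built map.
    private volatile Map<String, String> columnMetadata = new HashMap<>();

    SwapOnRebuildSketch(List<String> columnNames)
    {
        rebuild(columnNames);
    }

    // Assumed to be called only by the schema migration thread.
    void rebuild(List<String> columnNames)
    {
        Map<String, String> fresh = new HashMap<>();
        for (String name : columnNames)
            fresh.put(name, "definition-of-" + name);
        // A single volatile write publishes the complete map; the map is
        // never modified after this point.
        columnMetadata = fresh;
    }

    // Lookup as a deserializing reader would perform it.
    String getColumnDefinition(String name)
    {
        return columnMetadata.get(name);
    }
}
```

Because the published map is never mutated after the assignment, a plain {{HashMap}} is enough in this sketch; the volatile write gives readers a consistent view of its contents.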
I've pushed branches with that change & CI is pending:
||branch||testall||dtest||
|[12098-3.0|https://github.com/beobal/cassandra/tree/12098-3.0]|[testall|http://cassci.datastax.com/view/Dev/view/beobal/job/beobal-12098-3.0-testall]|[dtest|http://cassci.datastax.com/view/Dev/view/beobal/job/beobal-12098-3.0-dtest]|
|[12098-3.9|https://github.com/beobal/cassandra/tree/12098-3.9]|[testall|http://cassci.datastax.com/view/Dev/view/beobal/job/beobal-12098-3.9-testall]|[dtest|http://cassci.datastax.com/view/Dev/view/beobal/job/beobal-12098-3.9-dtest]|
|[12098-trunk|https://github.com/beobal/cassandra/tree/12098-trunk]|[testall|http://cassci.datastax.com/view/Dev/view/beobal/job/beobal-12098-trunk-testall]|[dtest|http://cassci.datastax.com/view/Dev/view/beobal/job/beobal-12098-trunk-dtest]|
> dtest failure in
> secondary_indexes_test.TestSecondaryIndexes.test_only_coordinator_chooses_index_for_query
> ----------------------------------------------------------------------------------------------------------
>
> Key: CASSANDRA-12098
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12098
> Project: Cassandra
> Issue Type: Bug
> Reporter: Sean McCarthy
> Assignee: Sam Tunnicliffe
> Labels: dtest
> Fix For: 3.0.x, 3.x
>
> Attachments: node1.log, node1_debug.log, node1_gc.log, node2.log,
> node2_debug.log, node2_gc.log, node3.log, node3_debug.log, node3_gc.log
>
>
> example failure:
> http://cassci.datastax.com/job/trunk_offheap_dtest/273/testReport/secondary_indexes_test/TestSecondaryIndexes/test_only_coordinator_chooses_index_for_query
> Failed on CassCI build trunk_offheap_dtest #273
> {code}
> Standard Output
> Unexpected error in node1 log, error:
> ERROR [MessagingService-Incoming-/127.0.0.3] 2016-06-26 08:11:32,185 CassandraDaemon.java:219 - Exception in thread Thread[MessagingService-Incoming-/127.0.0.3,5,main]
> java.lang.RuntimeException: Unknown column b during deserialization
>     at org.apache.cassandra.db.Columns$Serializer.deserialize(Columns.java:433) ~[main/:na]
>     at org.apache.cassandra.db.SerializationHeader$Serializer.deserializeForMessaging(SerializationHeader.java:407) ~[main/:na]
>     at org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.deserializeHeader(UnfilteredRowIteratorSerializer.java:192) ~[main/:na]
>     at org.apache.cassandra.db.partitions.PartitionUpdate$PartitionUpdateSerializer.deserialize30(PartitionUpdate.java:668) ~[main/:na]
>     at org.apache.cassandra.db.partitions.PartitionUpdate$PartitionUpdateSerializer.deserialize(PartitionUpdate.java:642) ~[main/:na]
>     at org.apache.cassandra.db.Mutation$MutationSerializer.deserialize(Mutation.java:349) ~[main/:na]
>     at org.apache.cassandra.db.Mutation$MutationSerializer.deserialize(Mutation.java:368) ~[main/:na]
>     at org.apache.cassandra.db.Mutation$MutationSerializer.deserialize(Mutation.java:305) ~[main/:na]
>     at org.apache.cassandra.net.MessageIn.read(MessageIn.java:114) ~[main/:na]
>     at org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:190) ~[main/:na]
>     at org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:178) ~[main/:na]
>     at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:92) ~[main/:na]
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)