[ 
https://issues.apache.org/jira/browse/CASSANDRA-12098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-12098:
----------------------------------------
         Reviewer: Aleksey Yeschenko
    Fix Version/s: 3.0.x
           Status: Patch Available  (was: In Progress)

What appears to have happened here is another race during schema update. From 
{{node1_debug.log}} we can see that the schema update adding {{b_index}} is 
processed immediately before the attempted deserialization of the mutation 
that causes the test failure ({{2016-06-26 08:11:32,179}}). This involves a 
call to {{apply(CFMetaData)}} on the existing {{CFMetaData}} for the table, 
obtained from {{Schema.instance}}. The problem lies in 
{{CFMetaData::rebuild()}}, called during {{apply}}: it contains a window in 
which the {{columnMetadata}} map does not hold all of the 
{{name -> ColumnDefinition}} mappings for the table.
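To make that window concrete, here is a hypothetical, heavily simplified model of a rebuild that clears and repopulates a final map in place. It is not the real {{CFMetaData}} code: the class name, the String-typed column map and the {{insideWindow}} hook are all illustrative assumptions. The hook runs between the two steps and stands in for a reader thread that happens to be scheduled at exactly the wrong moment.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the pre-fix metadata class (NOT Cassandra's real
// CFMetaData). Column definitions are reduced to a name -> type string.
class InPlaceRebuildMetadata {
    // Final map that is mutated in place, mirroring the pre-fix field.
    final Map<String, String> columnMetadata = new ConcurrentHashMap<>();

    // Clear-then-repopulate. The hypothetical insideWindow hook runs after the
    // clear and before the repopulate, i.e. exactly inside the race window.
    void rebuild(Map<String, String> columns, Runnable insideWindow) {
        columnMetadata.clear();
        insideWindow.run();           // a concurrent reader could observe this state
        columnMetadata.putAll(columns);
    }

    String getColumnDefinition(String name) {
        return columnMetadata.get(name);
    }
}
```

Note that using a thread-safe map such as {{ConcurrentHashMap}} does not close the window by itself: {{clear()}} and {{putAll()}} are still two separate operations, and the map is empty between them.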

It appears that the deserialization of the mutation message occurs during this 
window. The {{cfId}} is first read from the wire, then used to fetch the 
{{CFMetaData}} from {{Schema.instance}} (this is the same instance that the 
migration thread is mutating). Further down, this is used for lookups in 
{{Columns.Serializer}} when deserializing the {{SerializationHeader}}. This 
lookup seems to have occurred precisely between {{columnMetadata}} being 
cleared and repopulated, resulting in the unknown column error.
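The reader side can be sketched the same way. This is a hypothetical model of the lookup sequence described above, not Cassandra's actual {{Columns.Serializer}}: a plain map keyed by {{UUID}} stands in for {{Schema.instance}}, column definitions are reduced to type strings, and the class and method names are illustrative. Because the per-table map it resolves is the shared instance, a lookup that lands between the clear and the repopulate fails with the same message as the logged error.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Hypothetical sketch of the receive path; names do not match Cassandra's APIs.
class MutationReaderSketch {
    // schema: stand-in for Schema.instance, table id -> (column name -> type).
    // The inner map is the SAME object a migration thread may be rebuilding.
    static List<String> readColumns(Map<UUID, Map<String, String>> schema,
                                    UUID cfId,
                                    List<String> columnNamesOnWire) {
        // Step 1: resolve the table metadata from the id read off the wire.
        Map<String, String> columnMetadata = schema.get(cfId);
        if (columnMetadata == null)
            throw new IllegalStateException("Unknown table " + cfId);

        // Step 2: resolve each serialized column name against the shared map.
        List<String> resolved = new ArrayList<>();
        for (String name : columnNamesOnWire) {
            String type = columnMetadata.get(name);
            if (type == null)
                // The column exists in the schema, but the shared map was
                // empty or partially rebuilt at the moment of this lookup.
                throw new RuntimeException("Unknown column " + name + " during deserialization");
            resolved.add(name + ":" + type);
        }
        return resolved;
    }
}
```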

The real solution is CASSANDRA-9425, but in the meantime I believe it is safe 
to make {{columnMetadata}} non-final & volatile and swap in a fresh map during 
rebuild. The only other places where the map is actually mutated are schema 
altering statements, where the new {{CFMetaData}} is being constructed.
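A minimal sketch of the approach, under the same simplifying assumptions (hypothetical class name, String-typed definitions; this is not the actual patch): build the new map off to the side, then publish it with a single write to a {{volatile}} field. A reader therefore sees either the complete old map or the complete new one, never an empty or half-populated one.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of the non-final volatile field plus
// swap-on-rebuild idea. Not Cassandra's real CFMetaData.
class SwappingTableMetadata {
    // Non-final and volatile: the volatile write in rebuild() happens-before
    // any later volatile read, so readers see the fully built map.
    private volatile Map<String, String> columnMetadata = Collections.emptyMap();

    void rebuild(Map<String, String> columns) {
        // Build the replacement privately; no reader can see it yet.
        Map<String, String> fresh = new HashMap<>(columns);
        // Publish with one reference write; there is no cleared-but-not-refilled state.
        columnMetadata = Collections.unmodifiableMap(fresh);
    }

    String getColumnDefinition(String name) {
        // Single volatile read, so each lookup uses one consistent snapshot.
        return columnMetadata.get(name);
    }
}
```

Callers that perform several lookups which must agree with each other should read the field once into a local variable, since two separate reads may straddle a swap.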

I've pushed branches with that change & CI is pending:

||branch||testall||dtest||
|[12098-3.0|https://github.com/beobal/cassandra/tree/12098-3.0]|[testall|http://cassci.datastax.com/view/Dev/view/beobal/job/beobal-12098-3.0-testall]|[dtest|http://cassci.datastax.com/view/Dev/view/beobal/job/beobal-12098-3.0-dtest]|
|[12098-3.9|https://github.com/beobal/cassandra/tree/12098-3.9]|[testall|http://cassci.datastax.com/view/Dev/view/beobal/job/beobal-12098-3.9-testall]|[dtest|http://cassci.datastax.com/view/Dev/view/beobal/job/beobal-12098-3.9-dtest]|
|[12098-trunk|https://github.com/beobal/cassandra/tree/12098-trunk]|[testall|http://cassci.datastax.com/view/Dev/view/beobal/job/beobal-12098-trunk-testall]|[dtest|http://cassci.datastax.com/view/Dev/view/beobal/job/beobal-12098-trunk-dtest]|


> dtest failure in 
> secondary_indexes_test.TestSecondaryIndexes.test_only_coordinator_chooses_index_for_query
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-12098
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12098
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Sean McCarthy
>            Assignee: Sam Tunnicliffe
>              Labels: dtest
>             Fix For: 3.0.x, 3.x
>
>         Attachments: node1.log, node1_debug.log, node1_gc.log, node2.log, 
> node2_debug.log, node2_gc.log, node3.log, node3_debug.log, node3_gc.log
>
>
> example failure:
> http://cassci.datastax.com/job/trunk_offheap_dtest/273/testReport/secondary_indexes_test/TestSecondaryIndexes/test_only_coordinator_chooses_index_for_query
> Failed on CassCI build trunk_offheap_dtest #273
> {code}
> Standard Output
> Unexpected error in node1 log, error: 
> ERROR [MessagingService-Incoming-/127.0.0.3] 2016-06-26 08:11:32,185 CassandraDaemon.java:219 - Exception in thread Thread[MessagingService-Incoming-/127.0.0.3,5,main]
> java.lang.RuntimeException: Unknown column b during deserialization
>       at org.apache.cassandra.db.Columns$Serializer.deserialize(Columns.java:433) ~[main/:na]
>       at org.apache.cassandra.db.SerializationHeader$Serializer.deserializeForMessaging(SerializationHeader.java:407) ~[main/:na]
>       at org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.deserializeHeader(UnfilteredRowIteratorSerializer.java:192) ~[main/:na]
>       at org.apache.cassandra.db.partitions.PartitionUpdate$PartitionUpdateSerializer.deserialize30(PartitionUpdate.java:668) ~[main/:na]
>       at org.apache.cassandra.db.partitions.PartitionUpdate$PartitionUpdateSerializer.deserialize(PartitionUpdate.java:642) ~[main/:na]
>       at org.apache.cassandra.db.Mutation$MutationSerializer.deserialize(Mutation.java:349) ~[main/:na]
>       at org.apache.cassandra.db.Mutation$MutationSerializer.deserialize(Mutation.java:368) ~[main/:na]
>       at org.apache.cassandra.db.Mutation$MutationSerializer.deserialize(Mutation.java:305) ~[main/:na]
>       at org.apache.cassandra.net.MessageIn.read(MessageIn.java:114) ~[main/:na]
>       at org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:190) ~[main/:na]
>       at org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:178) ~[main/:na]
>       at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:92) ~[main/:na]
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
