[jira] [Created] (CASSANDRA-18589) NPE during reads after complex column drop

Jakub Zytka (Jira) Tue, 13 Jun 2023 03:13:05 -0700

Jakub Zytka created CASSANDRA-18589:
---------------------------------------


             Summary: NPE during reads after complex column drop
                 Key: CASSANDRA-18589
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-18589
             Project: Cassandra
          Issue Type: Bug
          Components: Local/SSTable
            Reporter: Jakub Zytka
            Assignee: Jakub Zytka


When data is written in parallel with dropping a complex column, subsequent 
reads may fail with an NPE until the affected sstable is compacted. 

 

The scenario leading to NPE is as follows: there exists a row which contains 
data for a complex column that is now dropped. There are no other complex 
columns. The removed column is not skipped during deserialization of the row 
(ColumnFilter is not aware of dropped columns).

At the same time, {{Row$Merger$ColumnDataReducer}} is not aware of the existence 
of a complex column ({{hasComplex==false}}) and thus doesn't have a builder 
for complex data, eventually yielding an NPE when processing said complex column 
(backtrace from 3.11):

{{ERROR [ReadStage-2] node2 2023-06-13 11:00:46,756 Uncaught exception on 
thread Thread[ReadStage-2,5,node2]}}
{{java.lang.RuntimeException: java.lang.NullPointerException}}
{{        at 
org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2777)}}
{{        at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)}}
{{        at 
org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:162)}}
{{        at 
org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:134)}}
{{        at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:113)}}
{{        at java.lang.Thread.run(Thread.java:748)}}
{{Caused by: java.lang.NullPointerException: null}}
{{        at 
org.apache.cassandra.db.rows.Row$Merger$ColumnDataReducer.getReduced(Row.java:789)}}
{{        at 
org.apache.cassandra.db.rows.Row$Merger$ColumnDataReducer.getReduced(Row.java:726)}}
{{        at 
org.apache.cassandra.utils.MergeIterator$ManyToOne.consume(MergeIterator.java:217)}}
{{        at 
org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:156)}}
{{        at 
org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47)}}
{{        at org.apache.cassandra.db.rows.Row$Merger.merge(Row.java:703)}}
{{        at 
org.apache.cassandra.db.rows.UnfilteredRowIterators$UnfilteredRowMergeIterator$MergeReducer.getReduced(UnfilteredRowIterators.java:587)}}
{{        at 
org.apache.cassandra.db.rows.UnfilteredRowIterators$UnfilteredRowMergeIterator$MergeReducer.getReduced(UnfilteredRowIterators.java:551)}}
{{        at 
org.apache.cassandra.utils.MergeIterator$ManyToOne.consume(MergeIterator.java:217)}}
{{        at 
org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:156)}}
{{        at 
org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47)}}
{{        at 
org.apache.cassandra.db.rows.UnfilteredRowIterators$UnfilteredRowMergeIterator.computeNext(UnfilteredRowIterators.java:533)}}
{{        at 
org.apache.cassandra.db.rows.UnfilteredRowIterators$UnfilteredRowMergeIterator.computeNext(UnfilteredRowIterators.java:390)}}
{{        at 
org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47)}}
{{        at 
org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.computeNext(LazilyInitializedUnfilteredRowIterator.java:100)}}
{{        at 
org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.computeNext(LazilyInitializedUnfilteredRowIterator.java:32)}}
{{        at 
org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47)}}
{{        at 
org.apache.cassandra.db.transform.BaseRows.hasNext(BaseRows.java:133)}}
{{        at 
org.apache.cassandra.db.transform.UnfilteredRows.isEmpty(UnfilteredRows.java:74)}}
{{        at 
org.apache.cassandra.db.partitions.PurgeFunction.applyToPartition(PurgeFunction.java:75)}}
{{        at 
org.apache.cassandra.db.partitions.PurgeFunction.applyToPartition(PurgeFunction.java:26)}}
{{        at 
org.apache.cassandra.db.transform.BasePartitions.hasNext(BasePartitions.java:96)}}
{{        at 
org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$Serializer.serialize(UnfilteredPartitionIterators.java:305)}}
{{        at 
org.apache.cassandra.db.ReadResponse$LocalDataResponse.build(ReadResponse.java:187)}}
{{        at 
org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:180)}}
{{        at 
org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:176)}}
{{        at 
org.apache.cassandra.db.ReadResponse.createDataResponse(ReadResponse.java:76)}}
{{        at 
org.apache.cassandra.db.ReadCommand.createResponse(ReadCommand.java:360)}}
{{        at 
org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:2007)}}
{{        at 
org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2773)}}
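
For illustration only, the failure mode described above can be modelled with a 
small self-contained sketch. None of the types below are Cassandra classes: 
{{Column}}, {{Reducer}} and {{throwsNpe}} are hypothetical stand-ins, and the real 
{{Row$Merger$ColumnDataReducer}} is considerably more involved. The only point is 
that a builder allocated conditionally on {{hasComplex}} is dereferenced when a 
complex column shows up anyway.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ReducerNpeSketch {
    /** Toy stand-in for a column definition (hypothetical, not Cassandra's class). */
    static final class Column {
        final String name;
        final boolean complex;

        Column(String name, boolean complex) {
            this.name = name;
            this.complex = complex;
        }
    }

    /** Toy stand-in for the reducer: the complex builder exists only if hasComplex. */
    static final class Reducer {
        private final List<String> complexBuilder;
        private final List<String> simpleData = new ArrayList<>();

        Reducer(boolean hasComplex) {
            // Assumption being modelled: no builder when the schema has no complex columns.
            this.complexBuilder = hasComplex ? new ArrayList<String>() : null;
        }

        void reduce(Column column, String value) {
            if (column.complex)
                complexBuilder.add(value); // throws NPE when hasComplex == false
            else
                simpleData.add(value);
        }
    }

    /** Returns true if feeding the deserialized columns to the reducer throws an NPE. */
    static boolean throwsNpe(boolean hasComplex, List<Column> deserialized) {
        Reducer reducer = new Reducer(hasComplex);
        try {
            for (Column column : deserialized)
                reducer.reduce(column, "value");
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Column regular = new Column("v", false);
        Column droppedMap = new Column("m", true); // dropped, but still present in the sstable row

        // The current schema has no complex columns, yet the row still carries "m".
        System.out.println(throwsNpe(false, Arrays.asList(regular, droppedMap))); // prints true
        System.out.println(throwsNpe(false, Arrays.asList(regular)));             // prints false
    }
}
```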

The NPE problem races with another problem in that scenario (TODO: link the 
issue when created), so when running the reproduction test, YMMV as to which 
one you hit.

 

While it may be tempting to fix the NPE by lazy initialization of the needed 
builder structure et al., it seems that there is an implicit assumption that 
columns like the dropped one should not get into read path machinery at all at 
this point. 

Thus, instead of just fixing the NPE and hoping no other class makes such an 
assumption, I intend to make the assumption valid by cutting out the dropped 
column as early as possible (i.e. during deserialization).

I don't know whether I need to care about memtable contents as well (rather 
than sstable contents only).

I don't think schema agreement etc. is relevant - currently the ColumnFilter 
uses some specific TableMetadata, so if I use the very same TableMetadata as 
the source of dropped column info, there should be internal consistency between 
ColumnFilter and the ColumnDataReducer (or, potentially, other classes).
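
A rough sketch of the intended approach, under stated assumptions: {{Cell}}, 
{{Snapshot}}, {{deserialize}} and {{violatesAssumption}} are hypothetical names made 
up for this example, not Cassandra APIs. {{Snapshot}} stands in for one specific 
TableMetadata instance, and both the dropped-column check and the 
{{hasComplex}} flag are derived from that same instance. The sketch matches 
dropped columns by name only; a real change would probably need to consider 
more than that.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DroppedColumnFilterSketch {
    /** Toy cell: a column name plus whether that column is complex (hypothetical type). */
    static final class Cell {
        final String column;
        final boolean complex;

        Cell(String column, boolean complex) {
            this.column = column;
            this.complex = complex;
        }
    }

    /** Hypothetical stand-in for a single table metadata snapshot. */
    static final class Snapshot {
        private final Set<String> liveComplexColumns;
        private final Set<String> droppedColumns;

        Snapshot(Set<String> liveComplexColumns, Set<String> droppedColumns) {
            this.liveComplexColumns = liveComplexColumns;
            this.droppedColumns = droppedColumns;
        }

        boolean hasComplex() {
            return !liveComplexColumns.isEmpty();
        }

        boolean isDropped(String column) {
            return droppedColumns.contains(column);
        }
    }

    /** Skips cells that belong to dropped columns while "deserializing" a row. */
    static List<Cell> deserialize(List<Cell> onDisk, Snapshot metadata) {
        List<Cell> row = new ArrayList<>();
        for (Cell cell : onDisk)
            if (!metadata.isDropped(cell.column))
                row.add(cell);
        return row;
    }

    /** True if a surviving cell contradicts what the reducer would assume from hasComplex. */
    static boolean violatesAssumption(List<Cell> row, Snapshot metadata) {
        for (Cell cell : row)
            if (cell.complex && !metadata.hasComplex())
                return true;
        return false;
    }

    public static void main(String[] args) {
        // After the drop: no live complex columns, and "m" is recorded as dropped.
        Snapshot afterDrop = new Snapshot(new HashSet<String>(),
                                          new HashSet<String>(Arrays.asList("m")));
        List<Cell> onDisk = Arrays.asList(new Cell("v", false), new Cell("m", true));

        System.out.println(violatesAssumption(onDisk, afterDrop));                         // prints true
        System.out.println(violatesAssumption(deserialize(onDisk, afterDrop), afterDrop)); // prints false
    }
}
```

Because the filter and the flag read the same snapshot in this sketch, a column 
cannot be both invisible to the reducer and present in the deserialized row.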

Thoughts? [~blerer] [~blambov] 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
