[
https://issues.apache.org/jira/browse/CASSANDRA-18589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jakub Zytka updated CASSANDRA-18589:
------------------------------------
Bug Category: Parent values: Correctness(12982), Level 1 values: Recoverable Corruption / Loss(12986)
Complexity: Normal
Discovered By: Adhoc Test
Severity: Normal
Status: Open (was: Triage Needed)
> NPE during reads after complex column drop
> ------------------------------------------
>
> Key: CASSANDRA-18589
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18589
> Project: Cassandra
> Issue Type: Bug
> Components: Local/SSTable
> Reporter: Jakub Zytka
> Assignee: Jakub Zytka
> Priority: Normal
>
> When data is written in parallel with dropping a complex column, subsequent
> reads may fail with an NPE until the affected sstable is compacted.
>
> The scenario leading to the NPE is as follows: there exists a row that contains
> data for a complex column that has since been dropped, and there are no other
> complex columns. The dropped column is not skipped during deserialization of
> the row (ColumnFilter is not aware of dropped columns).
> At the same time, {{Row$Merger$ColumnDataReducer}} is not aware of the existence
> of a complex column ({{hasComplex == false}}) and thus does not have a builder
> for complex data, eventually yielding an NPE when processing said complex column
> (backtrace from 3.11; a simplified sketch of the failure follows the backtrace):
> {{ERROR [ReadStage-2] node2 2023-06-13 11:00:46,756 Uncaught exception on thread Thread[ReadStage-2,5,node2]}}
> {{java.lang.RuntimeException: java.lang.NullPointerException}}
> {{    at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2777)}}
> {{    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)}}
> {{    at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:162)}}
> {{    at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:134)}}
> {{    at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:113)}}
> {{    at java.lang.Thread.run(Thread.java:748)}}
> {{Caused by: java.lang.NullPointerException: null}}
> {{    at org.apache.cassandra.db.rows.Row$Merger$ColumnDataReducer.getReduced(Row.java:789)}}
> {{    at org.apache.cassandra.db.rows.Row$Merger$ColumnDataReducer.getReduced(Row.java:726)}}
> {{    at org.apache.cassandra.utils.MergeIterator$ManyToOne.consume(MergeIterator.java:217)}}
> {{    at org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:156)}}
> {{    at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47)}}
> {{    at org.apache.cassandra.db.rows.Row$Merger.merge(Row.java:703)}}
> {{    at org.apache.cassandra.db.rows.UnfilteredRowIterators$UnfilteredRowMergeIterator$MergeReducer.getReduced(UnfilteredRowIterators.java:587)}}
> {{    at org.apache.cassandra.db.rows.UnfilteredRowIterators$UnfilteredRowMergeIterator$MergeReducer.getReduced(UnfilteredRowIterators.java:551)}}
> {{    at org.apache.cassandra.utils.MergeIterator$ManyToOne.consume(MergeIterator.java:217)}}
> {{    at org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:156)}}
> {{    at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47)}}
> {{    at org.apache.cassandra.db.rows.UnfilteredRowIterators$UnfilteredRowMergeIterator.computeNext(UnfilteredRowIterators.java:533)}}
> {{    at org.apache.cassandra.db.rows.UnfilteredRowIterators$UnfilteredRowMergeIterator.computeNext(UnfilteredRowIterators.java:390)}}
> {{    at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47)}}
> {{    at org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.computeNext(LazilyInitializedUnfilteredRowIterator.java:100)}}
> {{    at org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.computeNext(LazilyInitializedUnfilteredRowIterator.java:32)}}
> {{    at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47)}}
> {{    at org.apache.cassandra.db.transform.BaseRows.hasNext(BaseRows.java:133)}}
> {{    at org.apache.cassandra.db.transform.UnfilteredRows.isEmpty(UnfilteredRows.java:74)}}
> {{    at org.apache.cassandra.db.partitions.PurgeFunction.applyToPartition(PurgeFunction.java:75)}}
> {{    at org.apache.cassandra.db.partitions.PurgeFunction.applyToPartition(PurgeFunction.java:26)}}
> {{    at org.apache.cassandra.db.transform.BasePartitions.hasNext(BasePartitions.java:96)}}
> {{    at org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$Serializer.serialize(UnfilteredPartitionIterators.java:305)}}
> {{    at org.apache.cassandra.db.ReadResponse$LocalDataResponse.build(ReadResponse.java:187)}}
> {{    at org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:180)}}
> {{    at org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:176)}}
> {{    at org.apache.cassandra.db.ReadResponse.createDataResponse(ReadResponse.java:76)}}
> {{    at org.apache.cassandra.db.ReadCommand.createResponse(ReadCommand.java:360)}}
> {{    at org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:2007)}}
> {{    at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2773)}}
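> For illustration, here is a minimal, self-contained sketch of the failure shape
> described above; the class names are stand-ins, not the actual Cassandra
> classes, and the logic is deliberately simplified:
> {code:java}
> // Simplified stand-in for Row$Merger$ColumnDataReducer: the builder for
> // complex data is only allocated when hasComplex == true, so a complex
> // column that slips past the ColumnFilter hits a null builder.
> class ColumnDataReducerSketch
> {
>     static class ComplexColumnData {}
>     static class ComplexBuilder
>     {
>         Object build(ComplexColumnData data) { return data; }
>     }
>
>     private final ComplexBuilder complexBuilder; // null when hasComplex == false
>
>     ColumnDataReducerSketch(boolean hasComplex)
>     {
>         this.complexBuilder = hasComplex ? new ComplexBuilder() : null;
>     }
>
>     Object getReduced(Object columnData)
>     {
>         if (columnData instanceof ComplexColumnData)
>             // NPE here when the column was dropped but not filtered out
>             // during deserialization, so hasComplex was computed as false.
>             return complexBuilder.build((ComplexColumnData) columnData);
>         return columnData;
>     }
> }
> {code}
> In this sketch, passing a {{ComplexColumnData}} to a reducer constructed with
> {{hasComplex == false}} reproduces the same NullPointerException as the
> backtrace above.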
> The NPE problem races with another problem in that scenario (TODO: link the
> issue when created), so when running the reproduction test, YMMV as to which
> one you hit.
>
> While it may be tempting to fix the NPE by lazily initializing the needed
> builder structure and the like, there seems to be an implicit assumption that
> columns like the dropped one should not get into the read-path machinery at
> all at this point.
> Thus, instead of just fixing the NPE and hoping that no other class makes such
> an assumption, I intend to make the assumption valid by cutting out the
> dropped column as soon as possible (i.e. during deserialization).
> I don't know whether I need to care about the memtable (as opposed to sstable
> contents only).
> I don't think schema agreement etc. is relevant: the ColumnFilter currently
> uses some specific TableMetadata, so if I use the very same TableMetadata as
> the source of dropped-column info, there should be internal consistency
> between the ColumnFilter and the ColumnDataReducer (or, potentially, other
> classes).
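> To make that direction concrete, here is a rough, illustrative sketch with
> stand-in types (hypothetical names, not a patch against the real
> deserialization code):
> {code:java}
> // Illustrative only: while deserializing a row, consult the table's
> // dropped-column info and skip any column the current schema has dropped,
> // so downstream consumers (e.g. the ColumnDataReducer) never see it.
> import java.util.ArrayList;
> import java.util.List;
> import java.util.Set;
>
> class DroppedColumnSkipSketch
> {
>     // Stand-ins for TableMetadata / ColumnData, reduced to what the sketch needs.
>     record TableMetadataSketch(Set<String> droppedColumnNames) {}
>     record ColumnDataSketch(String columnName, Object value) {}
>
>     static List<ColumnDataSketch> deserializeRow(TableMetadataSketch metadata,
>                                                  List<ColumnDataSketch> onDiskColumns)
>     {
>         List<ColumnDataSketch> live = new ArrayList<>();
>         for (ColumnDataSketch cd : onDiskColumns)
>             if (!metadata.droppedColumnNames().contains(cd.columnName()))
>                 live.add(cd); // keep only columns the current schema still knows
>         return live;
>     }
> }
> {code}
> The point being that the dropped-column check would use the same table metadata
> that the ColumnFilter was built from, keeping the filter and the reducer
> internally consistent.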
> Thoughts? [~blerer] [~blambov]
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]