[
https://issues.apache.org/jira/browse/CASSANDRA-18589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jakub Zytka updated CASSANDRA-18589:
------------------------------------
Bug Category: Parent values: Correctness(12982), Level 1 values: Recoverable Corruption / Loss(12986)
Complexity: Normal
Discovered By: Adhoc Test
Severity: Normal
Status: Open (was: Triage Needed)
> NPE during reads after complex column drop
> ------------------------------------------
>
> Key: CASSANDRA-18589
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18589
> Project: Cassandra
> Issue Type: Bug
> Components: Local/SSTable
> Reporter: Jakub Zytka
> Assignee: Jakub Zytka
> Priority: Normal
>
> When data is written in parallel with dropping a complex column, subsequent
> reads may fail with an NPE until the affected sstable is compacted.
>
> The scenario leading to the NPE is as follows: there exists a row that contains
> data for a complex column that has since been dropped, and there are no other
> complex columns. The dropped column is not skipped during deserialization of
> the row (ColumnFilter is not aware of dropped columns).
> At the same time, {{Row$Merger$ColumnDataReducer}} is not aware of the existence
> of a complex column ({{hasComplex == false}}) and thus does not have a builder
> for complex data, eventually yielding an NPE when processing said complex column
> (backtrace from 3.11; a simplified sketch of the failure follows the backtrace):
> {{ERROR [ReadStage-2] node2 2023-06-13 11:00:46,756 Uncaught exception on thread Thread[ReadStage-2,5,node2]}}
> {{java.lang.RuntimeException: java.lang.NullPointerException}}
> {{    at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2777)}}
> {{    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)}}
> {{    at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:162)}}
> {{    at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:134)}}
> {{    at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:113)}}
> {{    at java.lang.Thread.run(Thread.java:748)}}
> {{Caused by: java.lang.NullPointerException: null}}
> {{    at org.apache.cassandra.db.rows.Row$Merger$ColumnDataReducer.getReduced(Row.java:789)}}
> {{    at org.apache.cassandra.db.rows.Row$Merger$ColumnDataReducer.getReduced(Row.java:726)}}
> {{    at org.apache.cassandra.utils.MergeIterator$ManyToOne.consume(MergeIterator.java:217)}}
> {{    at org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:156)}}
> {{    at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47)}}
> {{    at org.apache.cassandra.db.rows.Row$Merger.merge(Row.java:703)}}
> {{    at org.apache.cassandra.db.rows.UnfilteredRowIterators$UnfilteredRowMergeIterator$MergeReducer.getReduced(UnfilteredRowIterators.java:587)}}
> {{    at org.apache.cassandra.db.rows.UnfilteredRowIterators$UnfilteredRowMergeIterator$MergeReducer.getReduced(UnfilteredRowIterators.java:551)}}
> {{    at org.apache.cassandra.utils.MergeIterator$ManyToOne.consume(MergeIterator.java:217)}}
> {{    at org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:156)}}
> {{    at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47)}}
> {{    at org.apache.cassandra.db.rows.UnfilteredRowIterators$UnfilteredRowMergeIterator.computeNext(UnfilteredRowIterators.java:533)}}
> {{    at org.apache.cassandra.db.rows.UnfilteredRowIterators$UnfilteredRowMergeIterator.computeNext(UnfilteredRowIterators.java:390)}}
> {{    at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47)}}
> {{    at org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.computeNext(LazilyInitializedUnfilteredRowIterator.java:100)}}
> {{    at org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.computeNext(LazilyInitializedUnfilteredRowIterator.java:32)}}
> {{    at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47)}}
> {{    at org.apache.cassandra.db.transform.BaseRows.hasNext(BaseRows.java:133)}}
> {{    at org.apache.cassandra.db.transform.UnfilteredRows.isEmpty(UnfilteredRows.java:74)}}
> {{    at org.apache.cassandra.db.partitions.PurgeFunction.applyToPartition(PurgeFunction.java:75)}}
> {{    at org.apache.cassandra.db.partitions.PurgeFunction.applyToPartition(PurgeFunction.java:26)}}
> {{    at org.apache.cassandra.db.transform.BasePartitions.hasNext(BasePartitions.java:96)}}
> {{    at org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$Serializer.serialize(UnfilteredPartitionIterators.java:305)}}
> {{    at org.apache.cassandra.db.ReadResponse$LocalDataResponse.build(ReadResponse.java:187)}}
> {{    at org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:180)}}
> {{    at org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:176)}}
> {{    at org.apache.cassandra.db.ReadResponse.createDataResponse(ReadResponse.java:76)}}
> {{    at org.apache.cassandra.db.ReadCommand.createResponse(ReadCommand.java:360)}}
> {{    at org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:2007)}}
> {{    at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2773)}}
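> For illustration, here is a minimal, self-contained sketch of the failure shape
> described above; the class names are stand-ins, not the actual Cassandra
> classes, and the logic is deliberately simplified:
> {code:java}
> // Simplified stand-in for Row$Merger$ColumnDataReducer: the builder for
> // complex data is only allocated when hasComplex == true, so a complex
> // column that slips past the ColumnFilter hits a null builder.
> class ColumnDataReducerSketch
> {
>     static class ComplexColumnData {}
>     static class ComplexBuilder
>     {
>         Object build(ComplexColumnData data) { return data; }
>     }
>
>     private final ComplexBuilder complexBuilder; // null when hasComplex == false
>
>     ColumnDataReducerSketch(boolean hasComplex)
>     {
>         this.complexBuilder = hasComplex ? new ComplexBuilder() : null;
>     }
>
>     Object getReduced(Object columnData)
>     {
>         if (columnData instanceof ComplexColumnData)
>             // NPE here when the column was dropped but not filtered out
>             // during deserialization, so hasComplex was computed as false.
>             return complexBuilder.build((ComplexColumnData) columnData);
>         return columnData;
>     }
> }
> {code}
> In this sketch, passing a {{ComplexColumnData}} to a reducer constructed with
> {{hasComplex == false}} reproduces the same NullPointerException as the
> backtrace above.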
> The NPE problem races with another problem in that scenario (TODO: link the
> issue when created), so when running the reproduction test, YMMV as to which
> one you hit.
>
> While it may be tempting to fix the NPE by lazily initializing the needed
> builder structure and the like, there seems to be an implicit assumption that
> columns like the dropped one should not get into the read-path machinery at
> all at this point.
> Thus, instead of just fixing the NPE and hoping that no other class makes such
> an assumption, I intend to make the assumption valid by cutting out the
> dropped column as soon as possible (i.e. during deserialization).
> I don't know whether I need to care about the memtable (as opposed to sstable
> contents only).
> I don't think schema agreement etc. is relevant: the ColumnFilter currently
> uses some specific TableMetadata, so if I use the very same TableMetadata as
> the source of dropped-column info, there should be internal consistency
> between the ColumnFilter and the ColumnDataReducer (or, potentially, other
> classes).
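> To make that direction concrete, here is a rough, illustrative sketch with
> stand-in types (hypothetical names, not a patch against the real
> deserialization code):
> {code:java}
> // Illustrative only: while deserializing a row, consult the table's
> // dropped-column info and skip any column the current schema has dropped,
> // so downstream consumers (e.g. the ColumnDataReducer) never see it.
> import java.util.ArrayList;
> import java.util.List;
> import java.util.Set;
>
> class DroppedColumnSkipSketch
> {
>     // Stand-ins for TableMetadata / ColumnData, reduced to what the sketch needs.
>     record TableMetadataSketch(Set<String> droppedColumnNames) {}
>     record ColumnDataSketch(String columnName, Object value) {}
>
>     static List<ColumnDataSketch> deserializeRow(TableMetadataSketch metadata,
>                                                  List<ColumnDataSketch> onDiskColumns)
>     {
>         List<ColumnDataSketch> live = new ArrayList<>();
>         for (ColumnDataSketch cd : onDiskColumns)
>             if (!metadata.droppedColumnNames().contains(cd.columnName()))
>                 live.add(cd); // keep only columns the current schema still knows
>         return live;
>     }
> }
> {code}
> The point being that the dropped-column check would use the same table metadata
> that the ColumnFilter was built from, keeping the filter and the reducer
> internally consistent.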
> Thoughts? [~blerer] [~blambov]
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]