[ https://issues.apache.org/jira/browse/CASSANDRA-18932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17783634#comment-17783634 ]

Jacek Lewandowski commented on CASSANDRA-18932:
-----------------------------------------------

I think we nailed down the circumstances under which the problems happen. While 
testing we have seen various effects: infinite looping during paging, silent 
data loss, and corruption errors while reading. The repeated pattern behind 
those problems is that:
- there is more than one primary index block
- the last item in a non-last index block is an 
{{INCL_END_EXCL_START_BOUNDARY}} tombstone marker (illustrated by the toy sketch below)
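For context, {{INCL_END_EXCL_START_BOUNDARY}} is the marker kind that closes one range tombstone with an inclusive bound and opens the next with an exclusive bound at the same clustering position. Here is a deliberately tiny toy model (my own names, not Cassandra's actual marker handling) of why ending an index block on such a marker is delicate:
{code:java}
// Toy model -- hypothetical names, not Cassandra's real classes. A boundary
// marker closes the current range tombstone AND opens the next one, so a
// reader that forgets its state right after such a marker (e.g. on an index
// block switch) is left with an open RT it no longer knows about -- exactly
// the illegal open RT that RTBoundValidator reports in the trace quoted below.
enum MarkerKind { INCL_START, EXCL_END, INCL_END_EXCL_START_BOUNDARY }

final class RtTracker
{
    private boolean open; // is a range tombstone currently open?

    void onMarker(MarkerKind kind)
    {
        switch (kind)
        {
            case INCL_START:                   open = true;  break;
            case EXCL_END:                     open = false; break;
            case INCL_END_EXCL_START_BOUNDARY: open = true;  break; // closes one range, immediately opens another
        }
    }

    boolean hasOpenTombstone() { return open; }
}
{code}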

The failure is not related to:
- multiple partition key columns
- multiple clustering key columns
- paging
- clustering order

So far we can see that the bug is in the big-format sstable iteration which uses the primary 
index. When the index block is switched, the unfiltered deserializer is 
unnecessarily reset. This is my fault: the regression came in with CASSANDRA-17056 
(SSTable format API), and the fix is trivial. However, the problem revealed some 
holes in our test coverage and we need to address those as well.
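
To make the failure mode concrete, here is a self-contained toy (hypothetical, and much simpler than the real {{ForwardIndexedReader}} / {{UnfilteredDeserializer}} pair) showing how resetting reader state at a block switch misaligns every subsequent read:
{code:java}
import java.util.ArrayList;
import java.util.List;

// Toy simulation -- hypothetical, not the real SSTableIterator code. A
// "file" of fixed-width records spans two index blocks. The correct reader
// keeps a single running offset across the block switch; the buggy reader
// resets its position state when it crosses into the second block, so it
// decodes bytes from the wrong offset -- the toy analogue of the
// "Invalid Columns subset bytes" failure in the trace quoted below.
public class BlockSwitchSketch
{
    static final String DATA = "row1|row2|row3|row4|"; // four 5-byte records
    static final int RECORD_SIZE = 5;
    static final int BLOCK_BOUNDARY = 10; // second index block starts at byte 10

    public static void main(String[] args)
    {
        System.out.println("correct: " + read(false)); // [row1|, row2|, row3|, row4|]
        System.out.println("buggy:   " + read(true));  // [row1|, row2|, 2|row, 3|row] -- garbage
    }

    static List<String> read(boolean resetOnBlockSwitch)
    {
        List<String> out = new ArrayList<>();
        int offset = 0;
        boolean switched = false;
        while (offset + RECORD_SIZE <= DATA.length())
        {
            if (!switched && offset >= BLOCK_BOUNDARY)
            {
                switched = true;
                if (resetOnBlockSwitch)
                    offset -= 2; // stale position left behind by the reset
            }
            out.add(DATA.substring(offset, offset + RECORD_SIZE));
            offset += RECORD_SIZE;
        }
        return out;
    }
}
{code}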


> Harry-found CorruptSSTableException / RT Closer issue when reading entire partition
> -----------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-18932
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-18932
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Local/SSTable
>            Reporter: Alex Petrov
>            Assignee: Maxim Muzafarov
>            Priority: Normal
>             Fix For: 5.0-beta, 5.0.x, 5.x
>
>         Attachments: node1_.zip, operation.log.zip, screenshot-1.png
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> While testing some new machinery for Harry, I have encountered a new RT 
> closer / SSTable corruption issue. I have grounds to believe this was 
> introduced within the last year.
> The issue seems to happen because of an intricate interleaving of flushes 
> with writes and deletes (a hedged sketch of such an interleaving follows the repro steps below).
> {code:java}
> ERROR [ReadStage-2] 2023-10-16 18:47:06,696 JVMStabilityInspector.java:76 - Exception in thread Thread[ReadStage-2,5,SharedPool]
> org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: RandomAccessReader:BufferManagingRebufferer.Aligned:CompressedChunkReader.Mmap(/Users/ifesdjeen/foss/java/apache-cassandra-4.0/data/data1/harry/table_1-07c35a606c0a11eeae7a4f6ca489eb0c/nc-5-big-Data.db - LZ4Compressor, chunk length 16384, data length 232569)
>         at org.apache.cassandra.io.sstable.AbstractSSTableIterator$AbstractReader.hasNext(AbstractSSTableIterator.java:381)
>         at org.apache.cassandra.io.sstable.AbstractSSTableIterator.hasNext(AbstractSSTableIterator.java:242)
>         at org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.computeNext(LazilyInitializedUnfilteredRowIterator.java:95)
>         at org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.computeNext(LazilyInitializedUnfilteredRowIterator.java:32)
>         at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47)
>         at org.apache.cassandra.db.transform.BaseRows.hasNext(BaseRows.java:133)
>         at org.apache.cassandra.utils.MergeIterator$Candidate.advance(MergeIterator.java:376)
>         at org.apache.cassandra.utils.MergeIterator$ManyToOne.advance(MergeIterator.java:188)
>         at org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:157)
>         at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47)
>         at org.apache.cassandra.db.rows.UnfilteredRowIterators$UnfilteredRowMergeIterator.computeNext(UnfilteredRowIterators.java:534)
>         at org.apache.cassandra.db.rows.UnfilteredRowIterators$UnfilteredRowMergeIterator.computeNext(UnfilteredRowIterators.java:402)
>         at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47)
>         at org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.computeNext(LazilyInitializedUnfilteredRowIterator.java:95)
>         at org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.computeNext(LazilyInitializedUnfilteredRowIterator.java:32)
>         at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47)
>         at org.apache.cassandra.db.transform.BaseRows.hasNext(BaseRows.java:133)
>         at org.apache.cassandra.db.transform.BaseRows.hasNext(BaseRows.java:133)
>         at org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:151)
>         at org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:101)
>         at org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:86)
>         at org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$Serializer.serialize(UnfilteredPartitionIterators.java:343)
>         at org.apache.cassandra.db.ReadResponse$LocalDataResponse.build(ReadResponse.java:201)
>         at org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:186)
>         at org.apache.cassandra.db.ReadResponse.createDataResponse(ReadResponse.java:48)
>         at org.apache.cassandra.db.ReadCommand.createResponse(ReadCommand.java:346)
>         at org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:2186)
>         at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2581)
>         at org.apache.cassandra.concurrent.ExecutionFailure$2.run(ExecutionFailure.java:163)
>         at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:143)
>         at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>         at java.base/java.lang.Thread.run(Thread.java:829)
>         Suppressed: java.lang.IllegalStateException: PROCESSED UnfilteredRowIterator for harry.table_1 (key: ZinzDdUuABgDknItABgDknItABgDknItXEFrgBnOmPmPylWrwXHqjBHgeQrGfnZd1124124583:ZinzDdUuABgDknItABgDknItABgDknItABgDknItABgDknItzHqchghqCXLhVYKM22215251:3.2758E-41 omdt: [deletedAt=564416, localDeletion=1697450085]) has an illegal RT bounds sequence: expected all RTs to be closed, but the last one is open
>                 at org.apache.cassandra.db.transform.RTBoundValidator$RowsTransformation.ise(RTBoundValidator.java:117)
>                 at org.apache.cassandra.db.transform.RTBoundValidator$RowsTransformation.onPartitionClose(RTBoundValidator.java:112)
>                 at org.apache.cassandra.db.transform.BaseRows.runOnClose(BaseRows.java:91)
>                 at org.apache.cassandra.db.transform.BaseIterator.close(BaseIterator.java:95)
>                 at org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$Serializer.serialize(UnfilteredPartitionIterators.java:341)
>                 ... 10 common frames omitted
> Caused by: java.io.IOException: Invalid Columns subset bytes; too many bits set:10
>         at org.apache.cassandra.db.Columns$Serializer.deserializeSubset(Columns.java:578)
>         at org.apache.cassandra.db.rows.UnfilteredSerializer.deserializeRowBody(UnfilteredSerializer.java:604)
>         at org.apache.cassandra.db.UnfilteredDeserializer.readNext(UnfilteredDeserializer.java:143)
>         at org.apache.cassandra.io.sstable.format.big.SSTableIterator$ForwardIndexedReader.computeNext(SSTableIterator.java:175)
>         at org.apache.cassandra.io.sstable.AbstractSSTableIterator$ForwardReader.hasNextInternal(AbstractSSTableIterator.java:533)
>         at org.apache.cassandra.io.sstable.AbstractSSTableIterator$AbstractReader.hasNext(AbstractSSTableIterator.java:368)
>         ... 31 common frames omitted
> {code}
>  
> Unfortunately, the Harry branch is not ready for release yet. That said, I have a 
> snapshot pinned for a quick repro and can share SSTables that will easily reproduce 
> the issue if there are any takers.
> To reproduce:
> {code:java}
> paging 1; # make sure to set page size to 1 (it breaks with other page sizes, too)
> select * from harry.table_1;
> {code}
> Please also make sure to modify the snitch to set the rack/DC:
> {code:java}
> +    public static final String DATA_CENTER_NAME = "datacenter0";
> +    public static final String RACK_NAME = "rack0";
> {code}
> And set the data directories in cassandra.yaml: 
> {code:java}
> +data_file_directories:
> + - /your/path/data/data0
> + - /your/path/data/data1
> + - /your/path/data/data2
> {code}
> SHA for repro: 865d7c30e4755e74c4e4d26205a7aed4cfb55710
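> For anyone without the pinned Harry snapshot, here is a hedged sketch of the kind of 
> flush/write/delete interleaving described above. The schema and column names (pk, ck, v) 
> are invented for illustration (the real Harry-generated table differs), it assumes a 
> local node reachable by the DataStax Java driver, and flushes are triggered out of band 
> with nodetool:
> {code:java}
> // Hypothetical interleaving sketch -- NOT the exact Harry schedule.
> import com.datastax.oss.driver.api.core.CqlSession;
>
> public class InterleavingSketch
> {
>     public static void main(String[] args)
>     {
>         try (CqlSession session = CqlSession.builder().build())
>         {
>             session.execute("CREATE TABLE IF NOT EXISTS harry.table_1 (pk int, ck int, v text, PRIMARY KEY (pk, ck))");
>             for (int i = 0; i < 10_000; i++)
>             {
>                 session.execute("INSERT INTO harry.table_1 (pk, ck, v) VALUES (0, " + i + ", 'payload')");
>                 // Adjacent range deletes -- inclusive end followed by exclusive
>                 // start at the same position -- are what produce
>                 // INCL_END_EXCL_START_BOUNDARY markers when merged.
>                 if (i > 0 && i % 500 == 0)
>                 {
>                     session.execute("DELETE FROM harry.table_1 WHERE pk = 0 AND ck >= " + (i - 400) + " AND ck <= " + (i - 200));
>                     session.execute("DELETE FROM harry.table_1 WHERE pk = 0 AND ck > " + (i - 200) + " AND ck <= " + (i - 100));
>                     // flush here so SSTables accumulate multiple index blocks:
>                     //   nodetool flush harry table_1
>                 }
>             }
>         }
>     }
> }
> {code}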


