[ 
https://issues.apache.org/jira/browse/CASSANDRA-18932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17783639#comment-17783639
 ] 

Alex Petrov commented on CASSANDRA-18932:
-----------------------------------------

While it is probably benefitial to run tests with different column index sizes, 
I think we can rely on Harry for fuzzing perturbations. I have stumbled upon 
this issue while testing new Harry subsystem which allows more targeted tests, 
better API and potentially quicker turnaround, and it has so far caught all 
mentioned problems: RT corruption, data loss, infinite iteration, corruption in 
several different places. While minimal repros are useful for reviewers, and 
should be attached to the patch, they are not likely to be useful for 
regression testing, since the chance we will re-introduce exactly same issue is 
relatively small. We should rely on fuzzing more: it finds issues quickly, 
covers all schemas, and is easy to configure and extend.

> Harry-found CorruptSSTableException / RT Closer issue when reading entire 
> partition
> -----------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-18932
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-18932
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Local/SSTable
>            Reporter: Alex Petrov
>            Assignee: Maxim Muzafarov
>            Priority: Normal
>             Fix For: 5.0-beta, 5.0.x, 5.x
>
>         Attachments: node1_.zip, operation.log.zip, screenshot-1.png
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> While testing some new machinery for Harry, I have encountered a new RT 
> closer / SSTable Corruption issue. I have grounds to believe this was 
> introduced during the last year.
> Issue seems to happen because of intricate interleaving of flushes with 
> writes and deletes.
> {code:java}
> ERROR [ReadStage-2] 2023-10-16 18:47:06,696 JVMStabilityInspector.java:76 - 
> Exception in thread Thread[ReadStage-2,5,SharedPool]
> org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: 
> RandomAccessReader:BufferManagingRebufferer.Aligned:CompressedChunkReader.Mmap(/Users/ifesdjeen/foss/java/apache-cassandra-4.0/data/data1/harry/table_1-07c35a606c0a11eeae7a4f6ca489eb0c/nc-5-big-Data.db
>  - LZ4Compressor, chunk length 16384, data length 232569)
>         at 
> org.apache.cassandra.io.sstable.AbstractSSTableIterator$AbstractReader.hasNext(AbstractSSTableIterator.java:381)
>         at 
> org.apache.cassandra.io.sstable.AbstractSSTableIterator.hasNext(AbstractSSTableIterator.java:242)
>         at 
> org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.computeNext(LazilyInitializedUnfilteredRowIterator.java:95)
>         at 
> org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.computeNext(LazilyInitializedUnfilteredRowIterator.java:32)
>         at 
> org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47)
>         at 
> org.apache.cassandra.db.transform.BaseRows.hasNext(BaseRows.java:133)
>         at 
> org.apache.cassandra.utils.MergeIterator$Candidate.advance(MergeIterator.java:376)
>         at 
> org.apache.cassandra.utils.MergeIterator$ManyToOne.advance(MergeIterator.java:188)
>         at 
> org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:157)
>         at 
> org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47)
>         at 
> org.apache.cassandra.db.rows.UnfilteredRowIterators$UnfilteredRowMergeIterator.computeNext(UnfilteredRowIterators.java:534)
>         at 
> org.apache.cassandra.db.rows.UnfilteredRowIterators$UnfilteredRowMergeIterator.computeNext(UnfilteredRowIterators.java:402)
>         at 
> org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47)
>         at 
> org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.computeNext(LazilyInitializedUnfilteredRowIterator.java:95)
>         at 
> org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.computeNext(LazilyInitializedUnfilteredRowIterator.java:32)
>         at 
> org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47)
>         at 
> org.apache.cassandra.db.transform.BaseRows.hasNext(BaseRows.java:133)
>         at 
> org.apache.cassandra.db.transform.BaseRows.hasNext(BaseRows.java:133)
>         at 
> org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:151)
>         at 
> org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:101)
>         at 
> org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:86)
>         at 
> org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$Serializer.serialize(UnfilteredPartitionIterators.java:343)
>         at 
> org.apache.cassandra.db.ReadResponse$LocalDataResponse.build(ReadResponse.java:201)
>         at 
> org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:186)
>         at 
> org.apache.cassandra.db.ReadResponse.createDataResponse(ReadResponse.java:48)
>         at 
> org.apache.cassandra.db.ReadCommand.createResponse(ReadCommand.java:346)
>         at 
> org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:2186)
>         at 
> org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2581)
>         at 
> org.apache.cassandra.concurrent.ExecutionFailure$2.run(ExecutionFailure.java:163)
>         at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:143)
>         at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>         at java.base/java.lang.Thread.run(Thread.java:829)
>         Suppressed: java.lang.IllegalStateException: PROCESSED 
> UnfilteredRowIterator for harry.table_1 (key: 
> ZinzDdUuABgDknItABgDknItABgDknItXEFrgBnOmPmPylWrwXHqjBHgeQrGfnZd1124124583:ZinzDdUuABgDknItABgDknItABgDknItABgDknItABgDknItzHqchghqCXLhVYKM22215251:3.2758E-41
>  omdt: [deletedAt=564416, localDeletion=1697450085]) has an illegal RT bounds 
> sequence: expected all RTs to be closed, but the last one is open
>                 at 
> org.apache.cassandra.db.transform.RTBoundValidator$RowsTransformation.ise(RTBoundValidator.java:117)
>                 at 
> org.apache.cassandra.db.transform.RTBoundValidator$RowsTransformation.onPartitionClose(RTBoundValidator.java:112)
>                 at 
> org.apache.cassandra.db.transform.BaseRows.runOnClose(BaseRows.java:91)
>                 at 
> org.apache.cassandra.db.transform.BaseIterator.close(BaseIterator.java:95)
>                 at 
> org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$Serializer.serialize(UnfilteredPartitionIterators.java:341)
>                 ... 10 common frames omitted
> Caused by: java.io.IOException: Invalid Columns subset bytes; too many bits 
> set:10
>         at 
> org.apache.cassandra.db.Columns$Serializer.deserializeSubset(Columns.java:578)
>         at 
> org.apache.cassandra.db.rows.UnfilteredSerializer.deserializeRowBody(UnfilteredSerializer.java:604)
>         at 
> org.apache.cassandra.db.UnfilteredDeserializer.readNext(UnfilteredDeserializer.java:143)
>         at 
> org.apache.cassandra.io.sstable.format.big.SSTableIterator$ForwardIndexedReader.computeNext(SSTableIterator.java:175)
>         at 
> org.apache.cassandra.io.sstable.AbstractSSTableIterator$ForwardReader.hasNextInternal(AbstractSSTableIterator.java:533)
>         at 
> org.apache.cassandra.io.sstable.AbstractSSTableIterator$AbstractReader.hasNext(AbstractSSTableIterator.java:368)
>         ... 31 common frames omitted {code}
>  
> Unfortunately, harry branch is not ready for release yet. That said, I have a 
> snapshot pinned for quick repro and can share SSTables that will easily repro 
> the issue if there are any takers.
> To reproduce:
> {code:java}
> paging 1; # make sure to set page size to 1 (it breaks with other page sizes, 
> too)
> select * from harry.table_1;
> {code}
> Please also make sure to modify snitch to set rack/dc:
> {code:java}
> +    public static final String DATA_CENTER_NAME = "datacenter0";
> +    public static final String RACK_NAME = "rack0";
> {code}
> And set directories in cassandra.yaml: 
> {code:java}
> +data_file_directories:
> + - /your/path/data/data0
> + - /your/path/data/data1
> + -/your/path/data/data2{code}
> SHA for repro: 865d7c30e4755e74c4e4d26205a7aed4cfb55710



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to