[
https://issues.apache.org/jira/browse/CASSANDRA-15284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17764383#comment-17764383
]
Sebastian Marsching commented on CASSANDRA-15284:
-------------------------------------------------
I did a bit more digging and I think that this bug was originally introduced in
Cassandra 2.1.0: before that, {{uncompressedSize}} (which was called
{{originalSize}} at the time) was only used for calculating compression
statistics, but since 2.1.0 it has also been used when calling methods of the
{{{}MetadataWriter{}}}, where it matters whether this number is correct.
As this bug is only triggered when using {{resetAndTruncate}} and this method
is only used by {{SSTableRewriter}} (which is only used by {{{}Scrubber{}}}),
it only appears when scrubbing SSTables. In addition to that, the rewind caused
by {{resetAndTruncate}} has to be large enough for the wrong data size to
extend beyond the last chunk.
Interestingly, I do not see a good reason why {{uncompressedSize}} is even
needed: when not hitting this bug, it should always be the same as
{{{}bufferOffset{}}}, the only exception being inside {{flushData}} between
updating {{uncompressedSize}} and returning, because {{bufferOffset}} is only
updated after {{flushData}} returns.
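To make the drift concrete, here is a small toy model (plain Java, not the actual Cassandra code) of the bookkeeping pattern described above: one counter is rewound on truncation, the other is not, so the reported data length ends up larger than the data that was actually kept.
{code:java}
// Toy model of the bookkeeping drift: flush() advances both counters, but
// resetAndTruncate() only rewinds bufferOffset, mirroring the bug.
public class CounterDriftDemo
{
    long uncompressedSize; // never rolled back (the bug)
    long bufferOffset;     // rolled back by resetAndTruncate()

    void flush(int bytes)
    {
        uncompressedSize += bytes;
        bufferOffset += bytes;
    }

    void resetAndTruncate(long mark)
    {
        bufferOffset = mark; // uncompressedSize is deliberately left untouched
    }

    public static void main(String[] args)
    {
        CounterDriftDemo w = new CounterDriftDemo();
        w.flush(65536);
        long mark = w.bufferOffset;
        w.flush(65536);
        w.resetAndTruncate(mark);
        // Prints "131072 vs 65536": the metadata would claim twice the real length.
        System.out.println(w.uncompressedSize + " vs " + w.bufferOffset);
    }
}
{code}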
Therefore, the simplest and least invasive fix is using {{bufferOffset}}
instead of {{uncompressedSize}} in {{{}CompressedSequentialWriter.open{}}}.
However, {{uncompressedSize}} is also used when updating {{lastFlushOffset}}
and for compression statistics. Luckily, it seems like {{lastFlushOffset}} is
only used by {{{}BigTableWriter.IndexWriter{}}}, which never uses the
{{{}CompressedSequentialWriter{}}}, so this shouldn’t cause any actual problems.
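For illustration, here is a minimal sketch of what this least invasive variant could look like; the shape of {{open}} is simplified and may not match the real method exactly:
{code:java}
// Sketch only, not the actual patch: report the amount of data that has
// actually been flushed and kept. bufferOffset is rewound by
// resetAndTruncate(), uncompressedSize is not, so using bufferOffset keeps
// the reported data length consistent with the chunks that were retained.
public CompressionMetadata open(long overrideLength)
{
    if (overrideLength <= 0)
        overrideLength = bufferOffset; // was: uncompressedSize
    return metadataWriter.open(overrideLength, chunkOffset);
}
{code}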
BTW, {{compressedSize}} has the same problem (not rolled back in
{{{}resetAndTruncate{}}}), but as it is only used for compression statistics,
it does not cause any issues.
The nicer, but more invasive fix would be removing {{compressedSize}} and
{{uncompressedSize}} completely. As written earlier, uncompressedSize can
easily be replaced with {{{}bufferOffset{}}}. {{compressedSize}} is very
similar to {{{}chunkOffset{}}}, the only difference being that {{chunkOffset}}
also counts the size occupied by the checksum for each chunk, while
{{compressedSize}} does not. Therefore, I think that it would be okay to
replace {{compressedSize}} with {{{}chunkOffset{}}}. It would mean that the
compressed size used for statistics would be a bit larger, but it would also
ensure that the statistics remain correct after {{resetAndTruncate}} has been
called. If the old method of counting should be retained, we could use
{{chunkOffset - chunkCount * 4L}} for the compression statistics.
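A quick numeric sketch of that formula (toy numbers, not taken from a real SSTable), assuming a 4-byte CRC32 checksum is written after each chunk:
{code:java}
// Toy numbers only: recover the "old style" compressed size from chunkOffset
// by subtracting the per-chunk checksum bytes, then compute the ratio the
// compression statistics are based on.
public class CompressedSizeDemo
{
    public static void main(String[] args)
    {
        long chunkCount = 3;
        long chunkOffset = 3 * (16384 + 4); // compressed chunks + 4-byte checksum each
        long uncompressed = 3 * 65536;      // stands in for bufferOffset
        long compressedSizeWithoutChecksums = chunkOffset - chunkCount * 4L;
        double ratio = (double) compressedSizeWithoutChecksums / uncompressed;
        System.out.printf("compressed=%d ratio=%.4f%n", compressedSizeWithoutChecksums, ratio);
    }
}
{code}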
Reproducing the bug is quite easy: Simply go to
{{CompressedSequentialWriterTest.resetAndTruncate}} and insert the following
two lines anywhere after {{{}writer.resetAndTruncate(pos){}}}:
{code:java}
CompressionMetadata metadata = ((CompressedSequentialWriter) writer).open(0);
metadata.chunkFor(metadata.dataLength - 1);
{code}
This causes the {{AssertionError}} when running the test. When inserting the
two lines before {{{}resetAndTruncate{}}}, there is no exception, which shows
that the problem is caused by that method.
Unfortunately, I think that I found another bug while investigating this one:
the {{CompressedSequentialWriter.crcMetadata}} object gets updated with the
uncompressed data as it is written. As a consequence, the checksum written to
the “-Digest.crc32” file will be wrong when {{resetAndTruncate}} has been
called. In order for the checksum to be correct, the state of the
{{ChecksumWriter}} would have to be rolled back in {{{}resetAndTruncate{}}},
but this is not easy because the underlying {{CRC32}} class does not provide
any constructor or method for initializing the internal state, so there is no
way to reset it to a previous state.
So, in order to fix this problem we would either have to use our own
implementation of the CRC-32 algorithm or we would have to calculate the byte
sequence that gets us back to a certain state (which is possible, but not a
nice solution; see
https://stackoverflow.com/questions/38981523/crc32-change-initial-value for a
discussion).
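For reference, a small self-contained illustration of the limitation (plain JDK, not Cassandra code): {{java.util.zip.CRC32}} only offers {{reset()}} back to the empty state, so after a rewind the checksum of the retained prefix can only be obtained by recomputing it from the retained bytes.
{code:java}
import java.util.zip.CRC32;

public class CrcRewindDemo
{
    public static void main(String[] args)
    {
        byte[] data = "retained-bytes|discarded-bytes".getBytes();
        int truncateAt = "retained-bytes".length();

        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length); // checksum already includes the discarded bytes
        long withDiscarded = crc.getValue();

        crc.reset();                      // the only "rewind" the API offers: back to empty
        crc.update(data, 0, truncateAt);  // recompute over the retained prefix only
        long retainedOnly = crc.getValue();

        System.out.printf("full=%d retained=%d%n", withDiscarded, retainedOnly);
    }
}
{code}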
Until this problem is fixed, scrubbing has to be considered broken.
> AssertionError while scrubbing sstable
> --------------------------------------
>
> Key: CASSANDRA-15284
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15284
> Project: Cassandra
> Issue Type: Bug
> Components: Feature/Compression
> Reporter: Gianluigi Tiesi
> Priority: Normal
> Fix For: 3.11.x, 4.0.x, 4.1.x
>
> Attachments: assert-comp-meta.diff
>
>
> I've got a damaged data file, but while trying to run scrub (online or
> offline) I always get this error:
>
> {code:java}
> -- StackTrace --
> java.lang.AssertionError
>     at org.apache.cassandra.io.compress.CompressionMetadata$Chunk.<init>(CompressionMetadata.java:474)
>     at org.apache.cassandra.io.compress.CompressionMetadata.chunkFor(CompressionMetadata.java:239)
>     at org.apache.cassandra.io.util.MmappedRegions.updateState(MmappedRegions.java:163)
>     at org.apache.cassandra.io.util.MmappedRegions.<init>(MmappedRegions.java:73)
>     at org.apache.cassandra.io.util.MmappedRegions.<init>(MmappedRegions.java:61)
>     at org.apache.cassandra.io.util.MmappedRegions.map(MmappedRegions.java:104)
>     at org.apache.cassandra.io.util.FileHandle$Builder.complete(FileHandle.java:362)
>     at org.apache.cassandra.io.util.FileHandle$Builder.complete(FileHandle.java:331)
>     at org.apache.cassandra.io.sstable.format.big.BigTableWriter.openFinal(BigTableWriter.java:336)
>     at org.apache.cassandra.io.sstable.format.big.BigTableWriter.openFinalEarly(BigTableWriter.java:318)
>     at org.apache.cassandra.io.sstable.SSTableRewriter.switchWriter(SSTableRewriter.java:322)
>     at org.apache.cassandra.io.sstable.SSTableRewriter.doPrepare(SSTableRewriter.java:370)
>     at org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.prepareToCommit(Transactional.java:173)
>     at org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.finish(Transactional.java:184)
>     at org.apache.cassandra.io.sstable.SSTableRewriter.finish(SSTableRewriter.java:357)
>     at org.apache.cassandra.db.compaction.Scrubber.scrub(Scrubber.java:291)
>     at org.apache.cassandra.db.compaction.CompactionManager.scrubOne(CompactionManager.java:1010)
>     at org.apache.cassandra.db.compaction.CompactionManager.access$200(CompactionManager.java:83)
>     at org.apache.cassandra.db.compaction.CompactionManager$3.execute(CompactionManager.java:391)
>     at org.apache.cassandra.db.compaction.CompactionManager$2.call(CompactionManager.java:312)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:81)
>     at java.lang.Thread.run(Thread.java:748)
> {code}
> At the moment I've moved the corrupted file away. If you need more info, feel
> free to ask
>
> According to the source
> [https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/io/compress/CompressionMetadata.java#L474]
> it looks like the requested chunk length is <= 0