[ 
https://issues.apache.org/jira/browse/CASSANDRA-15274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16920810#comment-16920810
 ] 

Phil O Conduin commented on CASSANDRA-15274:
--------------------------------------------

Hi,


We managed to remove the CRC check from the code and build. When we do a 
sstable2json on a corrupt file we are not seeing an issue with CRC.
This time it is not CRC check, but exception during an attempt to decompress 
the chunk, so I think we got the answer to our question - it is not just CRC 
check problem.

 

Another area of investigation of this issue, we decided to create a script that 
generated MD5 checksums against all sstable files. This script runs from cron 
twice per day and logs checksums of all sstable files.
We capture the md5 and then compare it over the lifetime fo the file. We have 
proved that the md5 checksum number is not changing. This would indicate a 
possible bug in Cassandra at time of compacting/writing the file.

 

 

Taking the latest file for example:

*First reported in cassandra log Sep 01 08:39:48*

{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: 
ERROR 07:39:48 Failed creating a merkle tree for [repair 
#fb265fa0-cc8a-11e9-9296-5b5fb0093f98 on 
KeyspaceMetadata/CF_ConversationIndex1, 
(-2320162195562336336,-2318312110429971422]], /10.2.41.38 (see log for 
details)}}
{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: 
INFO 07:39:48 [repair #fb265fa0-cc8a-11e9-9296-5b5fb0093f98] Received merkle 
tree for CF_ConversationIndex1 from /10.2.41.38}}
{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: 
WARN 07:39:48 [repair #fb265fa0-cc8a-11e9-9296-5b5fb0093f98] 
CF_ConversationIndex1 sync failed}}
{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: 
INFO 07:39:48 [repair #fb265fa0-cc8a-11e9-9296-5b5fb0093f98] Requesting merkle 
trees for CF_RecentIndex (to [/10.2.41.34, /10.2.41.48, /10.2.57.54, 
/10.2.57.46, /10.2.57.12, /10.2.41.38])}}
{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: 
ERROR 07:39:48 Exception in thread Thread[RepairJobTask:24,5,main]}}
{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: 
org.apache.cassandra.exceptions.RepairException: [repair 
#fb265fa0-cc8a-11e9-9296-5b5fb0093f98 on 
KeyspaceMetadata/CF_ConversationIndex1, 
(-2320162195562336336,-2318312110429971422]] Validation failed in /10.2.41.38}}
{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: at 
org.apache.cassandra.repair.ValidationTask.treeReceived(ValidationTask.java:64) 
~[apache-cassandra-2.2.13.jar:2.2.13]}}
{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: at 
org.apache.cassandra.repair.RepairSession.validationComplete(RepairSession.java:178)
 ~[apache-cassandra-2.2.13.jar:2.2.13]}}
{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: at 
org.apache.cassandra.service.ActiveRepairService.handleMessage(ActiveRepairService.java:478)
 ~[apache-cassandra-2.2.13.jar:2.2.13]}}
{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: at 
org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:174)
 ~[apache-cassandra-2.2.13.jar:2.2.13]}}
{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: at 
org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:67) 
~[apache-cassandra-2.2.13.jar:2.2.13]}}
{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
~[na:1.8.0_172]}}
{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: at 
java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_172]}}
{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
~[na:1.8.0_172]}}
{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
[na:1.8.0_172]}}
{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: at 
java.lang.Thread.run(Thread.java:748) [na:1.8.0_172]}}
{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: 
ERROR 07:39:48 Exception in thread Thread[ValidationExecutor:53,1,main]}}
{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: 
org.apache.cassandra.io.FSReadError: 
org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: 
/data/ssd2/data/KeyspaceMetadata/CF_ConversationIndex1-1e77be609c7911e8ac12255de1fb512a/lb-26352-big-Data.db}}
{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: at 
org.apache.cassandra.io.util.RandomAccessReader.readBytes(RandomAccessReader.java:365)
 ~[apache-cassandra-2.2.13.jar:2.2.13]}}
{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: at 
org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:361) 
~[apache-cassandra-2.2.13.jar:2.2.13]}}
{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: at 
org.apache.cassandra.utils.ByteBufferUtil.readWithLength(ByteBufferUtil.java:324)
 ~[apache-cassandra-2.2.13.jar:2.2.13]}}
{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: at 
org.apache.cassandra.db.ColumnSerializer.deserializeColumnBody(ColumnSerializer.java:132)
 ~[apache-cassandra-2.2.13.jar:2.2.13]}}
{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: at 
org.apache.cassandra.db.OnDiskAtom$Serializer.deserializeFromSSTable(OnDiskAtom.java:92)
 ~[apache-cassandra-2.2.13.jar:2.2.13]}}
{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: at 
org.apache.cassandra.db.AbstractCell$1.computeNext(AbstractCell.java:52) 
~[apache-cassandra-2.2.13.jar:2.2.13]}}
{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: at 
org.apache.cassandra.db.AbstractCell$1.computeNext(AbstractCell.java:46) 
~[apache-cassandra-2.2.13.jar:2.2.13]}}
{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: at 
com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
 ~[guava-16.0.jar:na]}}
{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: at 
com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138) 
~[guava-16.0.jar:na]}}
{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: at 
org.apache.cassandra.io.sstable.SSTableIdentityIterator.hasNext(SSTableIdentityIterator.java:169)
 ~[apache-cassandra-2.2.13.jar:2.2.13]}}
{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: at 
org.apache.cassandra.utils.MergeIterator$OneToOne.computeNext(MergeIterator.java:202)
 ~[apache-cassandra-2.2.13.jar:2.2.13]}}
{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: at 
com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
 ~[guava-16.0.jar:na]}}
{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: at 
com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138) 
~[guava-16.0.jar:na]}}
{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: at 
com.google.common.collect.Iterators$7.computeNext(Iterators.java:645) 
~[guava-16.0.jar:na]}}
{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: at 
com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
 ~[guava-16.0.jar:na]}}
{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: at 
com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138) 
~[guava-16.0.jar:na]}}
{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: at 
org.apache.cassandra.db.ColumnIndex$Builder.buildForCompaction(ColumnIndex.java:174)
 ~[apache-cassandra-2.2.13.jar:2.2.13]}}
{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: at 
org.apache.cassandra.db.compaction.LazilyCompactedRow.update(LazilyCompactedRow.java:187)
 ~[apache-cassandra-2.2.13.jar:2.2.13]}}
{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: at 
org.apache.cassandra.repair.Validator.rowHash(Validator.java:201) 
~[apache-cassandra-2.2.13.jar:2.2.13]}}
{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: at 
org.apache.cassandra.repair.Validator.add(Validator.java:150) 
~[apache-cassandra-2.2.13.jar:2.2.13]}}
{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: at 
org.apache.cassandra.db.compaction.CompactionManager.doValidationCompaction(CompactionManager.java:1166)
 ~[apache-cassandra-2.2.13.jar:2.2.13]}}
{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: at 
org.apache.cassandra.db.compaction.CompactionManager.access$600(CompactionManager.java:76)
 ~[apache-cassandra-2.2.13.jar:2.2.13]}}
{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: at 
org.apache.cassandra.db.compaction.CompactionManager$10.call(CompactionManager.java:736)
 ~[apache-cassandra-2.2.13.jar:2.2.13]}}
{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: at 
java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_172]}}
{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
~[na:1.8.0_172]}}
{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
[na:1.8.0_172]}}
{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: at 
java.lang.Thread.run(Thread.java:748) [na:1.8.0_172]}}
{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: 
Caused by: org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: 
/data/ssd2/data/KeyspaceMetadata/CF_ConversationIndex1-1e77be609c7911e8ac12255de1fb512a/lb-26352-big-Data.db}}
{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: at 
org.apache.cassandra.io.compress.CompressedRandomAccessReader.reBufferMmap(CompressedRandomAccessReader.java:216)
 ~[apache-cassandra-2.2.13.jar:2.2.13]}}
{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: at 
org.apache.cassandra.io.compress.CompressedRandomAccessReader.reBuffer(CompressedRandomAccessReader.java:226)
 ~[apache-cassandra-2.2.13.jar:2.2.13]}}
{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: at 
org.apache.cassandra.io.compress.CompressedThrottledReader.reBuffer(CompressedThrottledReader.java:42)
 ~[apache-cassandra-2.2.13.jar:2.2.13]}}
{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: at 
org.apache.cassandra.io.util.RandomAccessReader.readBytes(RandomAccessReader.java:352)
 ~[apache-cassandra-2.2.13.jar:2.2.13]}}
{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: ... 
26 common frames omitted}}
{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: 
Caused by: org.apache.cassandra.io.compress.CorruptBlockException: 
(/data/ssd2/data/KeyspaceMetadata/CF_ConversationIndex1-1e77be609c7911e8ac12255de1fb512a/lb-26352-big-Data.db):
 corruption detected, chunk at 20164080027 of length 35362.}}
{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: at 
org.apache.cassandra.io.compress.CompressedRandomAccessReader.reBufferMmap(CompressedRandomAccessReader.java:185)
 ~[apache-cassandra-2.2.13.jar:2.2.13]}}
{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: ... 
29 common frames omitted}}
{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: 
ERROR 07:39:48 Exception in thread Thread[Repair#5:1,5,RMI Runtime]}}
{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: 
com.google.common.util.concurrent.UncheckedExecutionException: 
org.apache.cassandra.exceptions.RepairException: [repair 
#fb265fa0-cc8a-11e9-9296-5b5fb0093f98 on 
KeyspaceMetadata/CF_ConversationIndex1, 
(-2320162195562336336,-2318312110429971422]] Validation failed in /10.2.41.38}}
{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: at 
com.google.common.util.concurrent.Futures.wrapAndThrowUnchecked(Futures.java:1387)
 ~[guava-16.0.jar:na]}}
{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: at 
com.google.common.util.concurrent.Futures.getUnchecked(Futures.java:1373) 
~[guava-16.0.jar:na]}}
{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: at 
org.apache.cassandra.repair.RepairJob.run(RepairJob.java:167) 
~[apache-cassandra-2.2.13.jar:2.2.13]}}
{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
~[na:1.8.0_172]}}
{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
~[na:1.8.0_172]}}
{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: at 
java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_172]}}
{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: 
Caused by: org.apache.cassandra.exceptions.RepairException: [repair 
#fb265fa0-cc8a-11e9-9296-5b5fb0093f98 on 
KeyspaceMetadata/CF_ConversationIndex1, 
(-2320162195562336336,-2318312110429971422]] Validation failed in /10.2.41.38}}
{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: at 
org.apache.cassandra.repair.ValidationTask.treeReceived(ValidationTask.java:64) 
~[apache-cassandra-2.2.13.jar:2.2.13]}}
{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: at 
org.apache.cassandra.repair.RepairSession.validationComplete(RepairSession.java:178)
 ~[apache-cassandra-2.2.13.jar:2.2.13]}}
{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: at 
org.apache.cassandra.service.ActiveRepairService.handleMessage(ActiveRepairService.java:478)
 ~[apache-cassandra-2.2.13.jar:2.2.13]}}
{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: at 
org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:174)
 ~[apache-cassandra-2.2.13.jar:2.2.13]}}
{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: at 
org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:67) 
~[apache-cassandra-2.2.13.jar:2.2.13]}}
{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
~[na:1.8.0_172]}}
{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: at 
java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_172]}}
{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: ... 
3 common frames omitted}}
{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: 
INFO 07:39:48 Writing Memtable-CF_RecentIndex@943118133(0.055KiB serialized 
bytes, 2 ops, 0%/0% of on/off-heap limit)}}
{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: 
ERROR 07:39:48 Stopping gossiper}}
{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: 
WARN 07:39:48 Stopping gossip by operator request}}
{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: 
INFO 07:39:48 Announcing shutdown}}
{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: 
INFO 07:39:48 Node /10.2.41.38 state jump to shutdown}}
{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: 
ERROR 07:39:48 [repair #fb265fa0-cc8a-11e9-9296-5b5fb0093f98] Session completed 
with the following error}}
{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: 
org.apache.cassandra.exceptions.RepairException: [repair 
#fb265fa0-cc8a-11e9-9296-5b5fb0093f98 on 
KeyspaceMetadata/CF_ConversationIndex1, 
(-2320162195562336336,-2318312110429971422]] Validation failed in /10.2.41.38}}
{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: at 
org.apache.cassandra.repair.ValidationTask.treeReceived(ValidationTask.java:64) 
~[apache-cassandra-2.2.13.jar:2.2.13]}}
{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: at 
org.apache.cassandra.repair.RepairSession.validationComplete(RepairSession.java:178)
 ~[apache-cassandra-2.2.13.jar:2.2.13]}}
{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: at 
org.apache.cassandra.service.ActiveRepairService.handleMessage(ActiveRepairService.java:478)
 ~[apache-cassandra-2.2.13.jar:2.2.13]}}
{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: at 
org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:174)
 ~[apache-cassandra-2.2.13.jar:2.2.13]}}
{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: at 
org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:67) 
~[apache-cassandra-2.2.13.jar:2.2.13]}}
{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
~[na:1.8.0_172]}}
{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: at 
java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_172]}}
{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
[na:1.8.0_172]}}
{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
[na:1.8.0_172]}}
{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: at 
java.lang.Thread.run(Thread.java:748) [na:1.8.0_172]}}
{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: 
ERROR 07:39:48 Repair session fb265fa0-cc8a-11e9-9296-5b5fb0093f98 for range 
(-2320162195562336336,-2318312110429971422] failed with error [repair 
#fb265fa0-cc8a-11e9-9296-5b5fb0093f98 on 
KeyspaceMetadata/CF_ConversationIndex1, 
(-2320162195562336336,-2318312110429971422]] Validation failed in /10.2.41.38}}
{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: 
org.apache.cassandra.exceptions.RepairException: [repair 
#fb265fa0-cc8a-11e9-9296-5b5fb0093f98 on 
KeyspaceMetadata/CF_ConversationIndex1, 
(-2320162195562336336,-2318312110429971422]] Validation failed in /10.2.41.38}}
{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: at 
org.apache.cassandra.repair.ValidationTask.treeReceived(ValidationTask.java:64) 
~[apache-cassandra-2.2.13.jar:2.2.13]}}
{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: at 
org.apache.cassandra.repair.RepairSession.validationComplete(RepairSession.java:178)
 ~[apache-cassandra-2.2.13.jar:2.2.13]}}
{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: at 
org.apache.cassandra.service.ActiveRepairService.handleMessage(ActiveRepairService.java:478)
 ~[apache-cassandra-2.2.13.jar:2.2.13]}}
{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: at 
org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:174)
 ~[apache-cassandra-2.2.13.jar:2.2.13]}}
{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: at 
org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:67) 
~[apache-cassandra-2.2.13.jar:2.2.13]}}
{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
~[na:1.8.0_172]}}
{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: at 
java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_172]}}
{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
[na:1.8.0_172]}}
{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
[na:1.8.0_172]}}
{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: at 
java.lang.Thread.run(Thread.java:748) [na:1.8.0_172]}}
{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: 
INFO 07:39:48 Not a global repair, will not do anticompaction}}
{{Sep 01 08:39:48 sa-ref-met-009.btmx-ref.synchronoss.net cassandra[16223]: 
INFO 07:39:48 Repair command #5 finished in 4 minutes 58 seconds}}

 


*--File created Sep 1 07:34*

{{root@sa-ref-met-009 0 11:17:11 ~ # ls -l 
/data/ssd2/data/KeyspaceMetadata/CF_ConversationIndex1-1e77be609c7911e8ac12255de1fb512a/lb-26352-big-Data.db}}
{{-rw-r--r-- 1 cass_b cass_b 174312179967 Sep 1 07:34 
/data/ssd2/data/KeyspaceMetadata/CF_ConversationIndex1-1e77be609c7911e8ac12255de1fb512a/lb-26352-big-Data.db}}


{{*--md5 logs*}}
{{-rw-r--r-- 1 root root 780422 Sep 1 18:22 
md5.sa-ref-met-009.btmx-ref.synchronoss.net.20190901180001.log}}
{{{color:#FF0000}587559809e496928314b4ff4c71e27dd{color} 
/data/ssd2/data/KeyspaceMetadata/CF_ConversationIndex1-1e77be609c7911e8ac12255de1fb512a/lb-26352-big-Data.db}}
{{-rw-r--r-- 1 root root 780422 Sep 2 06:22 
md5.sa-ref-met-009.btmx-ref.synchronoss.net.20190902060001.log}}
{{{color:#FF0000}587559809e496928314b4ff4c71e27dd{color} 
/data/ssd2/data/KeyspaceMetadata/CF_ConversationIndex1-1e77be609c7911e8ac12255de1fb512a/lb-26352-big-Data.db}}


*--checksum the same*

 

We cannot analyse a corrupt file to see exactly what is happening, would it be 
possible that we can send *you/Cassandra dev team* an sstable file pack and you 
might have some way to analyse the file to see exactly what is happening.

> Multiple Corrupt datafiles across entire environment 
> -----------------------------------------------------
>
>                 Key: CASSANDRA-15274
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15274
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Local/Compaction
>            Reporter: Phil O Conduin
>            Priority: Normal
>
> Cassandra Version: 2.2.13
> PRE-PROD environment.
>  * 2 datacenters.
>  * 9 physical servers in each datacenter - (_Cisco UCS C220 M4 SFF_)
>  * 4 Cassandra instances on each server (cass_a, cass_b, cass_c, cass_d)
>  * 72 Cassandra instances across the 2 data centres, 36 in site A, 36 in site 
> B.
> We also have 2 Reaper Nodes we use for repair.  One reaper node in each 
> datacenter each running with its own Cassandra back end in a cluster together.
> OS Details [Red Hat Linux]
> cass_a@x 0 10:53:01 ~ $ uname -a
> Linux x 3.10.0-957.5.1.el7.x86_64 #1 SMP Wed Dec 19 10:46:58 EST 2018 x86_64 
> x86_64 x86_64 GNU/Linux
> cass_a@x 0 10:57:31 ~ $ cat /etc/*release
> NAME="Red Hat Enterprise Linux Server"
> VERSION="7.6 (Maipo)"
> ID="rhel"
> Storage Layout 
> cass_a@xx 0 10:46:28 ~ $ df -h
> Filesystem                         Size  Used Avail Use% Mounted on
> /dev/mapper/vg01-lv_root            20G  2.2G   18G  11% /
> devtmpfs                            63G     0   63G   0% /dev
> tmpfs                               63G     0   63G   0% /dev/shm
> tmpfs                               63G  4.1G   59G   7% /run
> tmpfs                               63G     0   63G   0% /sys/fs/cgroup
> >> 4 cassandra instances
> /dev/sdd                           1.5T  802G  688G  54% /data/ssd4
> /dev/sda                           1.5T  798G  692G  54% /data/ssd1
> /dev/sdb                           1.5T  681G  810G  46% /data/ssd2
> /dev/sdc                           1.5T  558G  932G  38% /data/ssd3
> Cassandra load is about 200GB and the rest of the space is snapshots
> CPU
> cass_a@x 127 10:58:47 ~ $ lscpu | grep -E '^Thread|^Core|^Socket|^CPU\('
> CPU(s):                64
> Thread(s) per core:    2
> Core(s) per socket:    16
> Socket(s):             2
> *Description of problem:*
> During repair of the cluster, we are seeing multiple corruptions in the log 
> files on a lot of instances.  There seems to be no pattern to the corruption. 
>  It seems that the repair job is finding all the corrupted files for us.  The 
> repair will hang on the node where the corrupted file is found.  To fix this 
> we remove/rename the datafile and bounce the Cassandra instance.  Our 
> hardware/OS team have stated there is no problem on their side.  I do not 
> believe it the repair causing the corruption. 
>  
> So let me give you an example of a corrupted file and maybe someone might be 
> able to work through it with me?
> When this corrupted file was reported in the log it looks like it was the 
> repair that found it.
> $ journalctl -u cassmeta-cass_b.service --since "2019-08-07 22:25:00" --until 
> "2019-08-07 22:45:00"
> Aug 07 22:30:33 cassandra[34611]: INFO  21:30:33 Writing 
> Memtable-compactions_in_progress@830377457(0.008KiB serialized bytes, 1 ops, 
> 0%/0% of on/off-heap limit)
> Aug 07 22:30:33 cassandra[34611]: ERROR 21:30:33 Failed creating a merkle 
> tree for [repair #9587a200-b95a-11e9-8920-9f72868b8375 on KeyspaceMetadata/x, 
> (-1476350953672479093,-1474461
> Aug 07 22:30:33 cassandra[34611]: ERROR 21:30:33 Exception in thread 
> Thread[ValidationExecutor:825,1,main]
> Aug 07 22:30:33 cassandra[34611]: org.apache.cassandra.io.FSReadError: 
> org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: 
> /x/ssd2/data/KeyspaceMetadata/x-1e453cb0
> Aug 07 22:30:33 cassandra[34611]: at 
> org.apache.cassandra.io.util.RandomAccessReader.readBytes(RandomAccessReader.java:365)
>  ~[apache-cassandra-2.2.13.jar:2.2.13]
> Aug 07 22:30:33 cassandra[34611]: at 
> org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:361) 
> ~[apache-cassandra-2.2.13.jar:2.2.13]
> Aug 07 22:30:33 cassandra[34611]: at 
> org.apache.cassandra.utils.ByteBufferUtil.readWithShortLength(ByteBufferUtil.java:340)
>  ~[apache-cassandra-2.2.13.jar:2.2.13]
> Aug 07 22:30:33 cassandra[34611]: at 
> org.apache.cassandra.db.composites.AbstractCType$Serializer.deserialize(AbstractCType.java:382)
>  ~[apache-cassandra-2.2.13.jar:2.2.13]
> Aug 07 22:30:33 cassandra[34611]: at 
> org.apache.cassandra.db.composites.AbstractCType$Serializer.deserialize(AbstractCType.java:366)
>  ~[apache-cassandra-2.2.13.jar:2.2.13]
> Aug 07 22:30:33 cassandra[34611]: at 
> org.apache.cassandra.db.OnDiskAtom$Serializer.deserializeFromSSTable(OnDiskAtom.java:81)
>  ~[apache-cassandra-2.2.13.jar:2.2.13]
> Aug 07 22:30:33 cassandra[34611]: at 
> org.apache.cassandra.db.AbstractCell$1.computeNext(AbstractCell.java:52) 
> ~[apache-cassandra-2.2.13.jar:2.2.13]
> Aug 07 22:30:33 cassandra[34611]: at 
> org.apache.cassandra.db.AbstractCell$1.computeNext(AbstractCell.java:46) 
> ~[apache-cassandra-2.2.13.jar:2.2.13]
> Aug 07 22:30:33 cassandra[34611]: at 
> com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
>  ~[guava-16.0.jar:na]
> Aug 07 22:30:33 cassandra[34611]: at 
> com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138) 
> ~[guava-16.0.jar:na]
> Aug 07 22:30:33 cassandra[34611]: at 
> org.apache.cassandra.io.sstable.SSTableIdentityIterator.hasNext(SSTableIdentityIterator.java:169)
>  ~[apache-cassandra-2.2.13.jar:2.2.13]
> Aug 07 22:30:33 cassandra[34611]: at 
> org.apache.cassandra.utils.MergeIterator$OneToOne.computeNext(MergeIterator.java:202)
>  ~[apache-cassandra-2.2.13.jar:2.2.13]
> Aug 07 22:30:33 cassandra[34611]: at 
> com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
>  ~[guava-16.0.jar:na]
> Aug 07 22:30:33 cassandra[34611]: at 
> com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138) 
> ~[guava-16.0.jar:na]
> Aug 07 22:30:33 cassandra[34611]: at 
> com.google.common.collect.Iterators$7.computeNext(Iterators.java:645) 
> ~[guava-16.0.jar:na]
> Aug 07 22:30:33 cassandra[34611]: at 
> com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
>  ~[guava-16.0.jar:na]
> Aug 07 22:30:33 cassandra[34611]: at 
> com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138) 
> ~[guava-16.0.jar:na]
> Aug 07 22:30:33 cassandra[34611]: at 
> org.apache.cassandra.db.ColumnIndex$Builder.buildForCompaction(ColumnIndex.java:174)
>  ~[apache-cassandra-2.2.13.jar:2.2.13]
> Aug 07 22:30:33 cassandra[34611]: at 
> org.apache.cassandra.db.compaction.LazilyCompactedRow.update(LazilyCompactedRow.java:187)
>  ~[apache-cassandra-2.2.13.jar:2.2.13]
> Aug 07 22:30:33 cassandra[34611]: at 
> org.apache.cassandra.repair.Validator.rowHash(Validator.java:201) 
> ~[apache-cassandra-2.2.13.jar:2.2.13]
> Aug 07 22:30:33 cassandra[34611]: at 
> org.apache.cassandra.repair.Validator.add(Validator.java:150) 
> ~[apache-cassandra-2.2.13.jar:2.2.13]
> Aug 07 22:30:33 cassandra[34611]: at 
> org.apache.cassandra.db.compaction.CompactionManager.doValidationCompaction(CompactionManager.java:1166)
>  ~[apache-cassandra-2.2.13.jar:2.2.13]
> Aug 07 22:30:33 cassandra[34611]: at 
> org.apache.cassandra.db.compaction.CompactionManager.access$600(CompactionManager.java:76)
>  ~[apache-cassandra-2.2.13.jar:2.2.13]
> Aug 07 22:30:33 cassandra[34611]: at 
> org.apache.cassandra.db.compaction.CompactionManager$10.call(CompactionManager.java:736)
>  ~[apache-cassandra-2.2.13.jar:2.2.13]
> Aug 07 22:30:33 cassandra[34611]: at 
> java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_172]
> Aug 07 22:30:33 cassandra[34611]: at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  ~[na:1.8.0_172]
> Aug 07 22:30:33 cassandra[34611]: at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  [na:1.8.0_172]
> Aug 07 22:30:33 cassandra[34611]: at java.lang.Thread.run(Thread.java:748) 
> [na:1.8.0_172]
> Aug 07 22:30:33 cassandra[34611]: Caused by: 
> org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: 
> /data/ssd2/data/KeyspaceMetadata/x-x/l
> Aug 07 22:30:33 cassandra[34611]: at 
> org.apache.cassandra.io.compress.CompressedRandomAccessReader.reBufferMmap(CompressedRandomAccessReader.java:216)
>  ~[apache-cassandra-2.2.13.jar:2.2.13]
> Aug 07 22:30:33 cassandra[34611]: at 
> org.apache.cassandra.io.compress.CompressedRandomAccessReader.reBuffer(CompressedRandomAccessReader.java:226)
>  ~[apache-cassandra-2.2.13.jar:2.2.13]
> Aug 07 22:30:33 cassandra[34611]: at 
> org.apache.cassandra.io.compress.CompressedThrottledReader.reBuffer(CompressedThrottledReader.java:42)
>  ~[apache-cassandra-2.2.13.jar:2.2.13]
> Aug 07 22:30:33 cassandra[34611]: at 
> org.apache.cassandra.io.util.RandomAccessReader.readBytes(RandomAccessReader.java:352)
>  ~[apache-cassandra-2.2.13.jar:2.2.13]
> Aug 07 22:30:33 cassandra[34611]: ... 27 common frames omitted
> Aug 07 22:30:33 cassandra[34611]: Caused by: 
> org.apache.cassandra.io.compress.CorruptBlockException: 
> (/data/ssd2/data/KeyspaceMetadata/x-x/lb-26203-big
> Aug 07 22:30:33 cassandra[34611]: at 
> org.apache.cassandra.io.compress.CompressedRandomAccessReader.reBufferMmap(CompressedRandomAccessReader.java:185)
>  ~[apache-cassandra-2.2.13.jar:2.2.13]
> Aug 07 22:30:33 cassandra[34611]: ... 30 common frames omitted
> Aug 07 22:30:33 cassandra[34611]: INFO  21:30:33 Not a global repair, will 
> not do anticompaction
> Aug 07 22:30:33 cassandra[34611]: ERROR 21:30:33 Stopping gossiper
> Aug 07 22:30:33 cassandra[34611]: WARN  21:30:33 Stopping gossip by operator 
> request
> Aug 07 22:30:33 cassandra[34611]: INFO  21:30:33 Announcing shutdown
> Aug 07 22:30:33 cassandra[34611]: INFO  21:30:33 Node 
> /[10.2.57.37|http://10.2.57.37/]state jump to shutdown
> So I went to the file system to see when this corrupt file was created and it 
> was created on July 30th at 15.55
> root@x 0 01:14:03 ~ # ls -l 
> /data/ssd2/data/KeyspaceMetadata/x-x/lb-26203-big-Data.db
> -rw-r--r-- 1 cass_b cass_b 3182243670 Jul 30 15:55 
> /data/ssd2/data/KeyspaceMetadata/x-x/lb-26203-big-Data.db
> So I checked /var/log/messages for errors around that time
> The only thing that stands out to me is the message "Cannot perform a full 
> major compaction as repaired and unrepaired sstables cannot be compacted 
> together", I'm not sure if this would be an issue though and cause corruption.
> Jul 30 15:55:06 x systemd: Created slice User Slice of root.
> Jul 30 15:55:06 x systemd: Started Session c165280 of user root.
> Jul 30 15:55:06 x audispd: node=x. type=USER_START 
> msg=audit(1564498506.021:457933): pid=17533 uid=0 auid=4294967295 
> ses=4294967295 msg='op=PAM:session_open 
> grantors=pam_keyinit,pam_limits,pam_keyinit,pam_limits,pam_tty_audit,pam_systemd,pam_unix
>  acct="root" exe="/usr/bin/sudo" hostname=? addr=? terminal=? res=success'
> Jul 30 15:55:06 x systemd: Removed slice User Slice of root.
> Jul 30 15:55:14 x tag_audit_log: type=USER_CMD 
> msg=audit(1564498506.013:457932): pid=17533 uid=509 auid=4294967295 
> ses=4294967295 msg='cwd="/" 
> cmd=2F7573722F7362696E2F69706D692D73656E736F7273202D2D71756965742D6361636865202D2D7364722D63616368652D7265637265617465202D2D696E746572707265742D6F656D2D64617461202D2D6F75747075742D73656E736F722D7374617465202D2D69676E6F72652D6E6F742D617661696C61626C652D73656E736F7273202D2D6F75747075742D73656E736F722D7468726573686F6C6473
>  terminal=? res=success'
> Jul 30 15:55:14 x tag_audit_log: type=USER_START 
> msg=audit(1564498506.021:457933): pid=17533 uid=0 auid=4294967295 
> ses=4294967295 msg='op=PAM:session_open 
> grantors=pam_keyinit,pam_limits,pam_keyinit,pam_limits,pam_tty_audit,pam_systemd,pam_unix
>  acct="root" exe="/usr/bin/sudo" hostname=? addr=? terminal=? res=success'
> Jul 30 15:55:19 x cassandra: INFO  14:55:19 Writing 
> Memtable-compactions_in_progress@1462227999(0.008KiB serialized bytes, 1 ops, 
> 0%/0% of on/off-heap limit)
> Jul 30 15:55:19 x cassandra: INFO  14:55:19 Cannot perform a full major 
> compaction as repaired and unrepaired sstables cannot be compacted together. 
> These two set of sstables will be compacted separately.
> Jul 30 15:55:19 x cassandra: INFO  14:55:19 Writing 
> Memtable-compactions_in_progress@1198535528(1.002KiB serialized bytes, 57 
> ops, 0%/0% of on/off-heap limit)
> Jul 30 15:55:20 x cassandra: INFO  14:55:20 Writing 
> Memtable-compactions_in_progress@2039409834(0.008KiB serialized bytes, 1 ops, 
> 0%/0% of on/off-heap limit)
> Jul 30 15:55:24 x audispd: node=x. type=USER_LOGOUT 
> msg=audit(1564498524.409:457934): pid=46620 uid=0 auid=464400029 ses=2747 
> msg='op=login id=464400029 exe="/usr/sbin/sshd" hostname=? addr=? 
> terminal=/dev/pts/0 res=success'
> Jul 30 15:55:24 x audispd: node=x. type=USER_LOGOUT 
> msg=audit(1564498524.409:457935): pid=4878 uid=0 auid=464400029 ses=2749 
> msg='op=login id=464400029 exe="/usr/sbin/sshd" hostname=? addr=? 
> terminal=/dev/pts/1 res=success'
> Jul 30 15:55:57 x systemd: Created slice User Slice of root.
> Jul 30 15:55:57 x systemd: Started Session c165288 of user root.
> Jul 30 15:55:57 x audispd: node=x. type=USER_START 
> msg=audit(1564498557.294:457958): pid=19687 uid=0 auid=4294967295 
> ses=4294967295 msg='op=PAM:session_open 
> grantors=pam_keyinit,pam_limits,pam_keyinit,pam_limits,pam_tty_audit,pam_systemd,pam_unix
>  acct="root" exe="/usr/bin/sudo" hostname=? addr=? terminal=? res=success'
> Jul 30 15:55:57 x audispd: node=x. type=USER_START 
> msg=audit(1564498557.298:457959): pid=19690 uid=0 auid=4294967295 
> ses=4294967295 msg='op=PAM:session_open 
> grantors=pam_keyinit,pam_systemd,pam_keyinit,pam_limits,pam_unix 
> acct="cass_b" exe="/usr/sbin/runuser" hostname=? addr=? terminal=? 
> res=success'
> Jul 30 15:55:58 x systemd: Removed slice User Slice of root.
> Jul 30 15:56:02 x cassandra: INFO  14:56:02 Writing 
> Memtable-compactions_in_progress@1532791194(0.008KiB serialized bytes, 1 ops, 
> 0%/0% of on/off-heap limit)
> Jul 30 15:56:02 x cassandra: INFO  14:56:02 Cannot perform a full major 
> compaction as repaired and unrepaired sstables cannot be compacted together. 
> These two set of sstables will be compacted separately.
> Jul 30 15:56:02 x cassandra: INFO  14:56:02 Writing 
> Memtable-compactions_in_progress@1455399453(0.281KiB serialized bytes, 16 
> ops, 0%/0% of on/off-heap limit)
> Jul 30 15:56:04 x tag_audit_log: type=USER_CMD 
> msg=audit(1564498555.190:457951): pid=19294 uid=509 auid=4294967295 
> ses=4294967295 msg='cwd="/" 
> cmd=72756E75736572202D73202F62696E2F62617368202D6C20636173735F62202D632063617373616E6472612D6D6574612F63617373616E6472612F62696E2F6E6F6465746F6F6C2074707374617473
>  terminal=? res=success'
> We have checked a number of other things like NTP setting etc but nothing is 
> telling us what could cause so many corruptions across the entire cluster.
> Things were healthy with this cluster for months, the only thing I can think 
> is that we started loading data from a load of 20GB per instance up to 200GB 
> where it sits now, maybe this just highlighted the issue.
> Compaction and Compression on Keyspace CL's [mixture]
> All CF's are using compression.
> AND compaction = {'min_threshold': '4', 'class': 
> 'org.apache.cassandra.db.compaction.*SizeTieredCompactionStrategy*', 
> 'max_threshold': '32'}
> AND compression = {'sstable_compression': 
> 'org.apache.cassandra.io.compress.*SnappyCompressor*'}
> AND compaction = {'min_threshold': '4', 'class': 
> 'org.apache.cassandra.db.compaction.*SizeTieredCompactionStrategy*', 
> 'max_threshold': '32'}
> AND compression = {'sstable_compression': 
> 'org.apache.cassandra.io.compress.*LZ4Compressor*'}
> AND compaction = {'class': 
> 'org.apache.cassandra.db.compaction.*LeveledCompactionStrategy*'}
> AND compression = {'sstable_compression': 
> 'org.apache.cassandra.io.compress.*SnappyCompressor*'}
> --We are also using internode network compression:
> internode_compression: all
>  
>  
> Please tell me whatever you need to help me debug this issue. 
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

Reply via email to