[ 
https://issues.apache.org/jira/browse/CASSANDRA-10389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14906018#comment-14906018
 ] 

Jędrzej Sieracki commented on CASSANDRA-10389:
----------------------------------------------

After checking the logs more thoroughly, the issue seems to be "Cannot start 
multiple repair sessions over the same sstables".

The interesting log portions from the repair session run on cblade1:

{quote}
INFO  [Repair#24:1] 2015-09-24 09:58:37,480 RepairJob.java:107 - [repair 
#0fc98340-6292-11e5-b992-9f13fa8664c8] requesting merkle trees for 
stock_increment_agg (to [/cblade10, cblade1])
INFO  [Repair#24:1] 2015-09-24 09:58:37,480 RepairJob.java:181 - [repair 
#0fc98340-6292-11e5-b992-9f13fa8664c8] Requesting merkle trees for 
stock_increment_agg (to [/cblade10, cblade1])
ERROR [ValidationExecutor:28] 2015-09-24 09:58:37,481 
CompactionManager.java:1070 - Cannot start multiple repair sessions over the 
same sstables
ERROR [ValidationExecutor:28] 2015-09-24 09:58:37,481 Validator.java:246 - 
Failed creating a merkle tree for [repair #0fc98340-6292-11e5-b992-9f13fa8664c8 
on perspectiv/stock_increment_agg, 
(-5927186132136652665,-5917344746039874798]], /cblade1(see log for details)
INFO  [AntiEntropyStage:1] 2015-09-24 09:58:37,481 RepairSession.java:181 - 
[repair #0fc98340-6292-11e5-b992-9f13fa8664c8] Received merkle tree for 
stock_increment_agg from /cblade1
ERROR [ValidationExecutor:28] 2015-09-24 09:58:37,481 CassandraDaemon.java:183 
- Exception in thread Thread[ValidationExecutor:28,1,main]
java.lang.RuntimeException: Cannot start multiple repair sessions over the same 
sstables
        at 
org.apache.cassandra.db.compaction.CompactionManager.doValidationCompaction(CompactionManager.java:1071)
 ~[apache-cassandra-2.2.1.jar:2.2.1]
        at 
org.apache.cassandra.db.compaction.CompactionManager.access$700(CompactionManager.java:94)
 ~[apache-cassandra-2.2.1.jar:2.2.1]
        at 
org.apache.cassandra.db.compaction.CompactionManager$10.call(CompactionManager.java:669)
 ~[apache-cassandra-2.2.1.jar:2.2.1]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
~[na:1.8.0_60]
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
~[na:1.8.0_60]
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
[na:1.8.0_60]
        at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60]
WARN  [RepairJobTask:1] 2015-09-24 09:58:37,481 RepairJob.java:162 - [repair 
#0fc98340-6292-11e5-b992-9f13fa8664c8] stock_increment_agg sync failed
ERROR [RepairJobTask:2] 2015-09-24 09:58:37,482 CassandraDaemon.java:183 - 
Exception in thread Thread[RepairJobTask:2,5,RMI Runtime]
org.apache.cassandra.exceptions.RepairException: [repair 
#0fc98340-6292-11e5-b992-9f13fa8664c8 on perspectiv/stock_increment_agg, 
(-5927186132136652665,-5917344746039874798]] Validation failed in 
cblade1.dforcom.localdomain/cblade1
        at 
org.apache.cassandra.repair.ValidationTask.treeReceived(ValidationTask.java:64) 
~[apache-cassandra-2.2.1.jar:2.2.1]
        at 
org.apache.cassandra.repair.RepairSession.validationComplete(RepairSession.java:183)
 ~[apache-cassandra-2.2.1.jar:2.2.1]
        at 
org.apache.cassandra.service.ActiveRepairService.handleMessage(ActiveRepairService.java:399)
 ~[apache-cassandra-2.2.1.jar:2.2.1]
        at 
org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:158)
 ~[apache-cassandra-2.2.1.jar:2.2.1]
        at 
org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:66) 
~[apache-cassandra-2.2.1.jar:2.2.1]
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
~[na:1.8.0_60]
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
[na:1.8.0_60]
        at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60]
INFO  [Repair#24:2] 2015-09-24 09:58:37,482 RepairJob.java:107 - [repair 
#0fc98340-6292-11e5-b992-9f13fa8664c8] requesting merkle trees for 
receipt_agg_total (to [/cblade10, cblade1.dforcom.localdomain/cblade1])
ERROR [Repair#24:1] 2015-09-24 09:58:37,482 CassandraDaemon.java:183 - 
Exception in thread Thread[Repair#24:1,5,RMI Runtime]
com.google.common.util.concurrent.UncheckedExecutionException: 
org.apache.cassandra.exceptions.RepairException: [repair 
#0fc98340-6292-11e5-b992-9f13fa8664c8 on perspectiv/stock_increment_agg, 
(-5927186132136652665,-5917344746039874798]] Validation failed in 
cblade1.dforcom.localdomain/cblade1
        at 
com.google.common.util.concurrent.Futures.wrapAndThrowUnchecked(Futures.java:1387)
 ~[guava-16.0.jar:na]
        at 
com.google.common.util.concurrent.Futures.getUnchecked(Futures.java:1373) 
~[guava-16.0.jar:na]
        at org.apache.cassandra.repair.RepairJob.run(RepairJob.java:169) 
~[apache-cassandra-2.2.1.jar:2.2.1]
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
~[na:1.8.0_60]
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
~[na:1.8.0_60]
        at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_60]
Caused by: org.apache.cassandra.exceptions.RepairException: [repair 
#0fc98340-6292-11e5-b992-9f13fa8664c8 on perspectiv/stock_increment_agg, 
(-5927186132136652665,-5917344746039874798]] Validation failed in 
cblade1.dforcom.localdomain/cblade1
        at 
org.apache.cassandra.repair.ValidationTask.treeReceived(ValidationTask.java:64) 
~[apache-cassandra-2.2.1.jar:2.2.1]
        at 
org.apache.cassandra.repair.RepairSession.validationComplete(RepairSession.java:183)
 ~[apache-cassandra-2.2.1.jar:2.2.1]
        at 
org.apache.cassandra.service.ActiveRepairService.handleMessage(ActiveRepairService.java:399)
 ~[apache-cassandra-2.2.1.jar:2.2.1]
        at 
org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:158)
 ~[apache-cassandra-2.2.1.jar:2.2.1]
        at 
org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:66) 
~[apache-cassandra-2.2.1.jar:2.2.1]
        ... 3 common frames omitted
INFO  [Repair#24:2] 2015-09-24 09:58:37,482 RepairJob.java:181 - [repair 
#0fc98340-6292-11e5-b992-9f13fa8664c8] Requesting merkle trees for 
receipt_agg_total (to [/cblade10, cblade1.dforcom.localdomain/cblade1])
INFO  [AntiEntropyStage:1] 2015-09-24 09:58:37,482 RepairSession.java:181 - 
[repair #0fc98340-6292-11e5-b992-9f13fa8664c8] Received merkle tree for 
stock_increment_agg from /cblade10
ERROR [RepairJobTask:1] 2015-09-24 09:58:37,482 RepairSession.java:290 - 
[repair #0fc98340-6292-11e5-b992-9f13fa8664c8] Session completed with the 
following error
org.apache.cassandra.exceptions.RepairException: [repair 
#0fc98340-6292-11e5-b992-9f13fa8664c8 on perspectiv/stock_increment_agg, 
(-5927186132136652665,-5917344746039874798]] Validation failed in 
cblade1.dforcom.localdomain/cblade1
        at 
org.apache.cassandra.repair.ValidationTask.treeReceived(ValidationTask.java:64) 
~[apache-cassandra-2.2.1.jar:2.2.1]
        at 
org.apache.cassandra.repair.RepairSession.validationComplete(RepairSession.java:183)
 ~[apache-cassandra-2.2.1.jar:2.2.1]
        at 
org.apache.cassandra.service.ActiveRepairService.handleMessage(ActiveRepairService.java:399)
 ~[apache-cassandra-2.2.1.jar:2.2.1]
        at 
org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:158)
 ~[apache-cassandra-2.2.1.jar:2.2.1]
        at 
org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:66) 
~[apache-cassandra-2.2.1.jar:2.2.1]
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
[na:1.8.0_60]
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
[na:1.8.0_60]
        at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60]
{quote}

...and at the same moment on cblade10:

{quote}
ERROR [ValidationExecutor:21] 2015-09-24 09:58:37,481 
CompactionManager.java:1070 - Cannot start multiple repair sessions over the 
same sstables
ERROR [ValidationExecutor:21] 2015-09-24 09:58:37,481 Validator.java:246 - 
Failed creating a merkle tree for [repair #0fc98340-6292-11e5-b992-9f13fa8664c8 
on perspectiv/stock_increment_agg, 
(-5927186132136652665,-5917344746039874798]], /cblade1 (see log for details)
ERROR [ValidationExecutor:21] 2015-09-24 09:58:37,482 CassandraDaemon.java:183 
- Exception in thread Thread[ValidationExecutor:21,1,main]
java.lang.RuntimeException: Cannot start multiple repair sessions over the same 
sstables
        at 
org.apache.cassandra.db.compaction.CompactionManager.doValidationCompaction(CompactionManager.java:1071)
 ~[apache-cassandra-2.2.1.jar:2.2.1]
        at 
org.apache.cassandra.db.compaction.CompactionManager.access$700(CompactionManager.java:94)
 ~[apache-cassandra-2.2.1.jar:2.2.1]
        at 
org.apache.cassandra.db.compaction.CompactionManager$10.call(CompactionManager.java:669)
 ~[apache-cassandra-2.2.1.jar:2.2.1]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
~[na:1.8.0_60]
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
~[na:1.8.0_60]
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
[na:1.8.0_60]
        at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60]
ERROR [ValidationExecutor:21] 2015-09-24 09:58:37,483 
CompactionManager.java:1070 - Cannot start multiple repair sessions over the 
same sstables
ERROR [ValidationExecutor:21] 2015-09-24 09:58:37,483 Validator.java:246 - 
Failed creating a merkle tree for [repair #0fc98340-6292-11e5-b992-9f13fa8664c8 
on perspectiv/receipt_agg_total, (-5927186132136652665,-5917344746039874798]], 
/cblade1 (see log for details)
ERROR [ValidationExecutor:21] 2015-09-24 09:58:37,483 CassandraDaemon.java:183 
- Exception in thread Thread[ValidationExecutor:21,1,main]
java.lang.RuntimeException: Cannot start multiple repair sessions over the same 
sstables
        at 
org.apache.cassandra.db.compaction.CompactionManager.doValidationCompaction(CompactionManager.java:1071)
 ~[apache-cassandra-2.2.1.jar:2.2.1]
        at 
org.apache.cassandra.db.compaction.CompactionManager.access$700(CompactionManager.java:94)
 ~[apache-cassandra-2.2.1.jar:2.2.1]
        at 
org.apache.cassandra.db.compaction.CompactionManager$10.call(CompactionManager.java:669)
 ~[apache-cassandra-2.2.1.jar:2.2.1]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
~[na:1.8.0_60]
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
~[na:1.8.0_60]
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
[na:1.8.0_60]
        at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60]
{quote}
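
To line up excerpts like the two above across nodes, the ERROR entries can be grouped by repair session ID. The following is only a rough sketch, assuming the log layout seen in the excerpts (level, thread in brackets, timestamp, then the message, with wrapped or stack-trace lines continuing the previous entry). The names `parse_entries` and `errors_by_repair_session` are hypothetical helpers for illustration, not part of Cassandra or nodetool.

```python
import re
from collections import defaultdict

# Start of a log entry as it appears in the excerpts above, e.g.
# "ERROR [ValidationExecutor:28] 2015-09-24 09:58:37,481 Validator.java:246 - ..."
ENTRY_START = re.compile(
    r"^(INFO|WARN|ERROR|DEBUG|TRACE)\s+\[([^\]]+)\]\s+"
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s*(.*)$"
)

# Repair session ID as printed in "[repair #<uuid>]".
REPAIR_ID = re.compile(
    r"repair\s+#([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})"
)


def parse_entries(text):
    """Split raw log text into entries, folding wrapped lines and
    stack-trace lines into the message of the entry they follow."""
    entries = []
    for line in text.splitlines():
        match = ENTRY_START.match(line)
        if match:
            level, thread, timestamp, rest = match.groups()
            entries.append({
                "level": level,
                "thread": thread,
                "timestamp": timestamp,
                "message": rest.strip(),
            })
        elif entries and line.strip():
            # Continuation of the previous entry (wrapped text or stack frame).
            entries[-1]["message"] += " " + line.strip()
    return entries


def errors_by_repair_session(text):
    """Group ERROR entries by the repair session ID found in their message.
    ERROR entries that mention no session ID are skipped."""
    grouped = defaultdict(list)
    for entry in parse_entries(text):
        if entry["level"] != "ERROR":
            continue
        match = REPAIR_ID.search(entry["message"])
        if match:
            grouped[match.group(1)].append(entry)
    return dict(grouped)
```

Running `errors_by_repair_session` over the system.log text of each node and comparing the timestamps for one session ID should show whether the validation failure was raised on several nodes at once, as it was here.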

> Repair session exception Validation failed
> ------------------------------------------
>
>                 Key: CASSANDRA-10389
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10389
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: Debian 8, Java 1.8.0_60, Cassandra 2.2.1 (datastax 
> compilation)
>            Reporter: Jędrzej Sieracki
>
> I'm running a repair on a ring of nodes that was recently extended from 3 to 
> 13 nodes. The extension was done two days ago; the repair was attempted 
> yesterday.
> {quote}
> [2015-09-22 11:55:55,266] Starting repair command #9, repairing keyspace 
> perspectiv with repair options (parallelism: parallel, primary range: false, 
> incremental: true, job threads: 1, ColumnFamilies: [], dataCenters: [], 
> hosts: [], # of ranges: 517)
> [2015-09-22 11:55:58,043] Repair session 1f7c50c0-6110-11e5-b992-9f13fa8664c8 
> for range (-5927186132136652665,-5917344746039874798] failed with error 
> [repair #1f7c50c0-6110-11e5-b992-9f13fa8664c8 on 
> perspectiv/stock_increment_agg, (-5927186132136652665,-5917344746039874798]] 
> Validation failed in cblade1.XXX/XXX (progress: 0%)
> {quote}
> BTW, I am ignoring the LEAK errors for now; they are outside the scope of 
> the main issue:
> {quote}
> ERROR [Reference-Reaper:1] 2015-09-22 11:58:27,843 Ref.java:187 - LEAK 
> DETECTED: a reference 
> (org.apache.cassandra.utils.concurrent.Ref$State@4d25ad8f) to class 
> org.apache.cassandra.io.sstable.format.SSTableReader$InstanceTidier@896826067:/var/lib/cassandra/data/perspectiv/stock_increment_agg-840cad405de711e5b9929f13fa8664c8/la-73-big
>  was not released before the reference was garbage collected
> {quote}
> I scrubbed the sstable with failed validation on cblade1 with nodetool scrub 
> perspectiv stock_increment_agg:
> {quote}
> INFO  [CompactionExecutor:1704] 2015-09-22 12:05:31,615 OutputHandler.java:42 
> - Scrubbing 
> BigTableReader(path='/var/lib/cassandra/data/perspectiv/stock_increment_agg-840cad405de711e5b9929f13fa8664c8/la-83-big-Data.db')
>  (345466609 bytes)
> INFO  [CompactionExecutor:1703] 2015-09-22 12:05:31,615 OutputHandler.java:42 
> - Scrubbing 
> BigTableReader(path='/var/lib/cassandra/data/perspectiv/stock_increment_agg-840cad405de711e5b9929f13fa8664c8/la-82-big-Data.db')
>  (60496378 bytes)
> ERROR [Reference-Reaper:1] 2015-09-22 12:05:31,676 Ref.java:187 - LEAK 
> DETECTED: a reference 
> (org.apache.cassandra.utils.concurrent.Ref$State@4ca8951e) to class 
> org.apache.cassandra.io.sstable.format.SSTableReader$InstanceTidier@114161559:/var/lib/cassandra/data/perspectiv/receipt_agg_total-76abb0625de711e59f6e0b7d98a25b6e/la-48-big
>  was not released before the reference was garbage collected
> ERROR [Reference-Reaper:1] 2015-09-22 12:05:31,676 Ref.java:187 - LEAK 
> DETECTED: a reference 
> (org.apache.cassandra.utils.concurrent.Ref$State@eeb6383) to class 
> org.apache.cassandra.io.sstable.format.SSTableReader$InstanceTidier@1612685364:/var/lib/cassandra/data/perspectiv/stock_increment_agg-840cad405de711e5b9929f13fa8664c8/la-83-big
>  was not released before the reference was garbage collected
> ERROR [Reference-Reaper:1] 2015-09-22 12:05:31,676 Ref.java:187 - LEAK 
> DETECTED: a reference 
> (org.apache.cassandra.utils.concurrent.Ref$State@1de90543) to class 
> org.apache.cassandra.io.sstable.format.SSTableReader$InstanceTidier@2058626950:/var/lib/cassandra/data/perspectiv/receipt_agg_total-76abb0625de711e59f6e0b7d98a25b6e/la-49-big
>  was not released before the reference was garbage collected
> ERROR [Reference-Reaper:1] 2015-09-22 12:05:31,676 Ref.java:187 - LEAK 
> DETECTED: a reference 
> (org.apache.cassandra.utils.concurrent.Ref$State@15616385) to class 
> org.apache.cassandra.io.sstable.format.SSTableReader$InstanceTidier@1386628428:/var/lib/cassandra/data/perspectiv/receipt_agg_total-76abb0625de711e59f6e0b7d98a25b6e/la-47-big
>  was not released before the reference was garbage collected
> INFO  [CompactionExecutor:1703] 2015-09-22 12:05:35,098 OutputHandler.java:42 
> - Scrub of 
> BigTableReader(path='/var/lib/cassandra/data/perspectiv/stock_increment_agg-840cad405de711e5b9929f13fa8664c8/la-82-big-Data.db')
>  complete: 51397 rows in new sstable and 0 empty (tombstoned) rows dropped
> INFO  [CompactionExecutor:1704] 2015-09-22 12:05:47,605 OutputHandler.java:42 
> - Scrub of 
> BigTableReader(path='/var/lib/cassandra/data/perspectiv/stock_increment_agg-840cad405de711e5b9929f13fa8664c8/la-83-big-Data.db')
>  complete: 292600 rows in new sstable and 0 empty (tombstoned) rows dropped
> {quote}
> After scrubbing, another repair was attempted. It did finish, but with 
> lots of errors from other nodes:
> {quote}
> [2015-09-22 12:01:18,020] Repair session db476b51-6110-11e5-b992-9f13fa8664c8 
> for range (5019296454787813261,5021512586040808168] failed with error [repair 
> #db476b51-6110-11e5-b992-9f13fa8664c8 on perspectiv/stock_increment_agg, 
> (5019296454787813261,5021512586040808168]] Validation failed in /10.YYY 
> (progress: 91%)
> [2015-09-22 12:01:18,079] Repair session db482ea1-6110-11e5-b992-9f13fa8664c8 
> for range (-3660233266780784242,-3638577078894365342] failed with error 
> [repair #db482ea1-6110-11e5-b992-9f13fa8664c8 on 
> perspectiv/stock_increment_agg, (-3660233266780784242,-3638577078894365342]] 
> Validation failed in /10.XXX (progress: 92%)
> [2015-09-22 12:01:18,276] Repair session db4a0361-6110-11e5-b992-9f13fa8664c8 
> for range (9158857758535272856,9167427882441871745] failed with error [repair 
> #db4a0361-6110-11e5-b992-9f13fa8664c8 on perspectiv/stock_increment_agg, 
> (9158857758535272856,9167427882441871745]] Validation failed in /10.YYY 
> (progress: 95%)
> {quote}
> After scrubbing stock_increment_agg on all nodes, just to be sure, the repair 
> still failed, this time with the following exception:
> {quote}
> INFO  [Repair#16:50] 2015-09-22 12:08:47,471 RepairJob.java:181 - [repair 
> #ea123bf3-6111-11e5-b992-9f13fa8664c8] Requesting merkle trees for 
> stock_increment_agg (to [/10.60.77.202, cblade1.XXX/XXX])
> ERROR [RepairJobTask:1] 2015-09-22 12:08:47,471 RepairSession.java:290 - 
> [repair #ea123bf0-6111-11e5-b992-9f13fa8664c8] Session completed with the 
> following error
> org.apache.cassandra.exceptions.RepairException: [repair 
> #ea123bf0-6111-11e5-b992-9f13fa8664c8 on perspectiv/stock_increment_agg, 
> (355657753119264326,366309649129068298]] Validation failed in cblade1.
>         at 
> org.apache.cassandra.repair.ValidationTask.treeReceived(ValidationTask.java:64)
>  ~[apache-cassandra-2.2.1.jar:2.2.1]
>         at 
> org.apache.cassandra.repair.RepairSession.validationComplete(RepairSession.java:183)
>  ~[apache-cassandra-2.2.1.jar:2.2.1]
>         at 
> org.apache.cassandra.service.ActiveRepairService.handleMessage(ActiveRepairService.java:399)
>  ~[apache-cassandra-2.2.1.jar:2.2.1]
>         at 
> org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:158)
>  ~[apache-cassandra-2.2.1.jar:2.2.1]
>         at 
> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:66) 
> ~[apache-cassandra-2.2.1.jar:2.2.1]
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  [na:1.8.0_60]
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  [na:1.8.0_60]
>         at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60]
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
