[ 
https://issues.apache.org/jira/browse/CASSANDRA-12689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15514073#comment-15514073
 ] 

Benjamin Roth edited comment on CASSANDRA-12689 at 9/22/16 6:23 PM:
--------------------------------------------------------------------

Same situation, different node, different trace:

Name: MutationStage-37
State: WAITING on java.util.concurrent.CompletableFuture$Signaller@1e1dc9eb
Total blocked: 58  Total waited: 709.137

Stack trace: 
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1693)
java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1729)
java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
com.google.common.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:137)
org.apache.cassandra.db.Mutation.apply(Mutation.java:227)
org.apache.cassandra.db.Mutation.apply(Mutation.java:241)
org.apache.cassandra.service.StorageProxy$$Lambda$249/1210907398.run(Unknown 
Source)
org.apache.cassandra.service.StorageProxy$8.runMayThrow(StorageProxy.java:1410)
org.apache.cassandra.service.StorageProxy$LocalMutationRunnable.run(StorageProxy.java:2628)
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:162)
org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:134)
org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:109)
java.lang.Thread.run(Thread.java:745)

Load Graph of affected node: https://cl.ly/201c3T3f0M0s
Mutations: https://cl.ly/0T2R3b0y2435


was (Author: brstgt):
Same situation, different node, different trace:

Name: MutationStage-37
State: WAITING on java.util.concurrent.CompletableFuture$Signaller@1e1dc9eb
Total blocked: 58  Total waited: 709.137

Stack trace: 
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1693)
java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1729)
java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
com.google.common.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:137)
org.apache.cassandra.db.Mutation.apply(Mutation.java:227)
org.apache.cassandra.db.Mutation.apply(Mutation.java:241)
org.apache.cassandra.service.StorageProxy$$Lambda$249/1210907398.run(Unknown 
Source)
org.apache.cassandra.service.StorageProxy$8.runMayThrow(StorageProxy.java:1410)
org.apache.cassandra.service.StorageProxy$LocalMutationRunnable.run(StorageProxy.java:2628)
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:162)
org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:134)
org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:109)
java.lang.Thread.run(Thread.java:745)

> All MutationStage threads blocked, kills server
> -----------------------------------------------
>
>                 Key: CASSANDRA-12689
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12689
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Local Write-Read Paths
>            Reporter: Benjamin Roth
>            Priority: Critical
>
> Under heavy load (e.g. due to repair during normal operations), a lot of 
> NullPointerExceptions occur in MutationStage. Unfortunately, the log is not 
> very chatty, trace is missing:
> 2016-09-22T06:29:47+00:00 cas6 [MutationStage-1] 
> org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService Uncaught 
> exception on thread Thread[MutationStage-1,5,main]: {}
> 2016-09-22T06:29:47+00:00 cas6 #011java.lang.NullPointerException: null
> Then, after some time, in most cases ALL threads in MutationStage pools are 
> completely blocked. This leads to piling up pending tasks until server runs 
> OOM and is completely unresponsive due to GC. Threads will NEVER unblock 
> until server restart. Even if load goes completely down, all hints are 
> paused, and no compaction or repair is running. Only restart helps.
> I can understand that pending tasks in MutationStage may pile up under heavy 
> load, but tasks should be processed and dequeud after load goes down. This is 
> definitively not the case. This looks more like a an unhandled exception 
> leading to a stuck lock.
> Stack trace from jconsole, all Threads in MutationStage show same trace.
> Name: MutationStage-48
> State: WAITING on java.util.concurrent.CompletableFuture$Signaller@fcc8266
> Total blocked: 137  Total waited: 138.513
> Stack trace: 
> sun.misc.Unsafe.park(Native Method)
> java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1693)
> java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
> java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1729)
> java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
> com.google.common.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:137)
> org.apache.cassandra.db.Mutation.apply(Mutation.java:227)
> org.apache.cassandra.db.Mutation.apply(Mutation.java:241)
> org.apache.cassandra.hints.Hint.apply(Hint.java:96)
> org.apache.cassandra.hints.HintVerbHandler.doVerb(HintVerbHandler.java:91)
> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:66)
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:162)
> org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:134)
> org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:109)
> java.lang.Thread.run(Thread.java:745)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to