Oleksi Derkatch created NIFI-5036:
-------------------------------------

             Summary: Nifi Shouldn't Rely on ForkJoinPool.commonPool()
                 Key: NIFI-5036
                 URL: https://issues.apache.org/jira/browse/NIFI-5036
             Project: Apache NiFi
          Issue Type: Improvement
          Components: Core Framework
    Affects Versions: 1.5.0
            Reporter: Oleksi Derkatch


A few weeks ago we encountered a problem with one of our custom processors 
that appeared to deadlock all processing in NiFi under load. We believe the 
issue is that our processors were relying on ForkJoinPool.commonPool(), but so 
was the NiFi engine during its scheduling (both via CompletableFuture). When 
we took a thread dump, we saw something like this:
 
{{"ForkJoinPool.commonPool-worker-6" #381 daemon prio=5 os_prio=0 
tid=0x00007f300d934000 nid=0x4be4 waiting on condition [0x00007f2fd53e7000]}}{{ 
  java.lang.Thread.State: WAITING (parking)}}{{    at 
sun.misc.Unsafe.park(Native Method)}}{{    - parking to wait for  
<0x00000006c8b00568> (a java.util.concurrent.CompletableFuture$Signaller)}}{{   
 at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)}}{{    at 
java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1693)}}{{
    at 
java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3313)}}{{    
at 
java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1729)}}{{
    at 
java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)}}{{    
at customcode(customcode.java:83)}}{{    at 
customcode.lambda$null$23(customcode.java:320)}}{{    at 
customcode$$Lambda$261/442205945.call(Unknown Source)}}{{    at 
com.google.common.cache.LocalCache$LocalManualCache$1.load(LocalCache.java:4724)}}{{
    at 
com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3522)}}{{
    at 
com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2315)}}{{   
 at 
com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2278)}}{{
    - locked <0x00000006c8b007f0> (a 
com.google.common.cache.LocalCache$StrongWriteEntry)}}{{    at 
com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2193)}}{{    at 
com.google.common.cache.LocalCache.get(LocalCache.java:3932)}}{{    at 
com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4721)}}{{
    at customcode.lambda$customethod$24(customcode.java:309)}}{{    at 
customcode$$Lambda$258/1540137328.get(Unknown Source)}}{{    at 
java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)}}{{
    at 
java.util.concurrent.CompletableFuture$AsyncSupply.exec(CompletableFuture.java:1582)}}{{
    at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)}}{{    
at 
java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)}}{{ 
   at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)}}{{   
 at 
java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)}}

 
I think what happened here is that, since both sides were using 
ForkJoinPool.commonPool(), we ran out of worker threads and deadlocked: at the 
time of the deadlock, the NiFi processor was blocking on a future that had 
itself been submitted to the same common pool. The fix on our side was to use 
a dedicated thread pool instead of the shared one.
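
To illustrate what we changed, here is a minimal sketch of the pattern and the fix; the class and method names are hypothetical stand-ins for our processor code, and only the JDK APIs are real:

{code:java}
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class DedicatedPoolExample {

    // Problematic pattern: supplyAsync() without an executor runs on
    // ForkJoinPool.commonPool(), and the blocking join() below can also tie up
    // a common-pool worker. Under load, every worker may end up blocked waiting
    // on futures that need a free worker to complete, starving the pool.
    static String loadViaCommonPool() {
        return CompletableFuture.supplyAsync(() -> expensiveLookup("key")).join();
    }

    // Fix: submit the async work to a pool the processor owns, so blocking
    // callers can never exhaust the JVM-wide common pool that other code
    // (including the framework) may also depend on.
    private static final ExecutorService LOOKUP_POOL = Executors.newFixedThreadPool(4);

    static String loadViaDedicatedPool() {
        return CompletableFuture.supplyAsync(() -> expensiveLookup("key"), LOOKUP_POOL).join();
    }

    // Hypothetical stand-in for the cache-backed lookup in the real processor.
    private static String expensiveLookup(String key) {
        return "value-for-" + key;
    }
}
{code}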
 
It might be worth considering changing this in NiFi going forward, in case 
other custom processors rely on the same pattern.
 
This also raises another question. By default, the common pool's parallelism 
is (# of CPUs - 1). How does that affect processing when the maximum number of 
threads in the NiFi UI is set much higher than that? Shouldn't this thread 
pool be sized to match? The parallelism is tunable with the 
-Djava.util.concurrent.ForkJoinPool.common.parallelism JVM flag (which we also 
adjusted while troubleshooting this).
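
For reference, a small sketch that prints the CPU count alongside the common pool's actual parallelism, which makes the mismatch easy to spot on a given box (plain JDK APIs, nothing NiFi-specific):

{code:java}
import java.util.concurrent.ForkJoinPool;

public class CommonPoolInfo {
    public static void main(String[] args) {
        // Default common-pool parallelism is (availableProcessors - 1); it can be
        // raised at JVM startup with
        // -Djava.util.concurrent.ForkJoinPool.common.parallelism=<n>
        System.out.println("CPUs:                   " + Runtime.getRuntime().availableProcessors());
        System.out.println("commonPool parallelism: " + ForkJoinPool.commonPool().getParallelism());
    }
}
{code}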



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
