[jira] [Created] (NIFI-1333) FlowController fails to shut down gracefully even though there is nothing going on in the flow

Oleg Zhurakousky (JIRA) Wed, 23 Dec 2015 12:34:10 -0800

Oleg Zhurakousky created NIFI-1333:
--------------------------------------

             Summary: FlowController fails to shut down gracefully even though there is nothing going on in the flow
                 Key: NIFI-1333
                 URL: https://issues.apache.org/jira/browse/NIFI-1333
             Project: Apache NiFi
          Issue Type: Bug
          Components: Core Framework
    Affects Versions: 0.4.1
            Reporter: Oleg Zhurakousky
            Assignee: Oleg Zhurakousky
            Priority: Trivial
             Fix For: 0.5.0



Basically, the following test fails:
https://github.com/olegz/nifi/blob/int-test/nifi-integration-tests/src/test/java/org/apache/nifi/test/flowcontroll/FlowControllerTests.java#L50
even though nothing in the flow gives it a reason to fail.
Also, the message in the logs is confusing:
{code}
Initiated graceful shutdown of flow controller...waiting up to 10 seconds
2015-12-23 15:19:11,977 WARN [main] o.apache.nifi.controller.FlowController Controller hasn't terminated properly.  There exists an uninterruptable thread that will take an indeterminate amount of time to stop.  Might need to kill the program manually.
{code}
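
The warning blames an "uninterruptable thread". One plausible reading, consistent with the jstack excerpts below, is that worker threads parked in ReentrantReadWriteLock.ReadLock.lock() do not respond to Thread.interrupt(): Lock.lock() keeps waiting, while Lock.lockInterruptibly() throws InterruptedException. The minimal sketch here (hypothetical class name, not NiFi code) checks that JDK behavior directly; it assumes nothing about how FlowController itself stops its threads.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class InterruptLockSketch {

    /**
     * Blocks a thread on the read lock while the write lock is held, interrupts it,
     * and reports whether that thread is still blocked afterwards.
     */
    static boolean stillBlockedAfterInterrupt(boolean useLockInterruptibly) throws InterruptedException {
        ReentrantReadWriteLock rwLock = new ReentrantReadWriteLock();
        rwLock.writeLock().lock(); // the calling thread plays the long-running lock holder
        try {
            Thread reader = new Thread(() -> {
                try {
                    if (useLockInterruptibly) {
                        rwLock.readLock().lockInterruptibly(); // throws once interrupted
                    } else {
                        rwLock.readLock().lock(); // keeps waiting despite interrupts
                    }
                    rwLock.readLock().unlock();
                } catch (InterruptedException e) {
                    // lockInterruptibly() gave up waiting because of the interrupt
                }
            });
            reader.setDaemon(true);
            reader.start();
            TimeUnit.MILLISECONDS.sleep(200); // let the reader reach the lock
            reader.interrupt();               // how shutdownNow() typically tries to stop workers
            reader.join(500);
            return reader.isAlive();
        } finally {
            rwLock.writeLock().unlock(); // releases a reader that is still waiting
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("lock():              still blocked = " + stillBlockedAfterInterrupt(false));
        System.out.println("lockInterruptibly(): still blocked = " + stillBlockedAfterInterrupt(true));
    }
}
```

Running main prints `still blocked = true` for lock() and `still blocked = false` for lockInterruptibly(). So if a shutdown path falls back to interrupting its workers, threads stuck in lock() still cannot be stopped until whoever holds the lock releases it.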
What actually happens is a deadlock during shutdown.
Below are the relevant jstack excerpts:
{code}
java.lang.Thread.State: TIMED_WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000007aeb20988> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
        at java.util.concurrent.ThreadPoolExecutor.awaitTermination(ThreadPoolExecutor.java:1468)
        at org.apache.nifi.controller.FlowController.shutdown(FlowController.java:1124)
        at org.apache.nifi.test.s2s.SiteToSiteTests.bar(SiteToSiteTests.java:75)
. . .
"Framework Task Thread Thread-1" prio=5 tid=0x00007fc8a2064800 nid=0x6a03 waiting on condition [0x0000700001ded000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000007aeb20288> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:964)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1282)
        at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:731)
        at org.apache.nifi.controller.FlowController.getRootGroupId(FlowController.java:1262)
        at org.apache.nifi.controller.tasks.ExpireFlowFiles.run(ExpireFlowFiles.java:54)
. . .

"Timer-Driven Process Thread-1" prio=5 tid=0x00007fc8a3146800 nid=0x6c03 waiting on condition [0x0000700001ef0000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000007aeb20288> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:964)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1282)
        at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:731)
        at org.apache.nifi.controller.FlowController.isClustered(FlowController.java:2984)
        at org.apache.nifi.controller.FlowController.heartbeat(FlowController.java:3444)
{code}
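As I see it, the issue is that FlowController's _shutdown_ routine runs under
the same lock that most FlowController callbacks made by other threads (such as
getRootGroupId() and isClustered() above) need to acquire. Shutdown holds that
lock while it waits for those threads to terminate, and those threads are parked
waiting for the lock, so they can never finish: a deadlock.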
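
The pattern can be reproduced in a few lines. This is a minimal sketch under stated assumptions, not NiFi code: a plain ReentrantReadWriteLock stands in for FlowController's lock, a single-thread ExecutorService stands in for its framework thread pools, and the class and method names are hypothetical. The shutting-down thread takes the write lock, a task that needs the read lock is already queued, and awaitTermination() can then only time out.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class LockedShutdownSketch {

    /**
     * Returns true if the worker pool terminated within the timeout.
     * The write lock stands in for the lock held by the shutdown routine.
     */
    static boolean shutdownWhileHoldingWriteLock(long timeoutMillis) throws InterruptedException {
        ReentrantReadWriteLock rwLock = new ReentrantReadWriteLock();
        ExecutorService workers = Executors.newSingleThreadExecutor();

        rwLock.writeLock().lock(); // shutdown enters its locked section
        try {
            // A framework task (think ExpireFlowFiles) is queued and calls a
            // read-locked getter (think getRootGroupId()).
            workers.submit(() -> {
                rwLock.readLock().lock(); // parks: the shutting-down thread holds the write lock
                try {
                    // read controller state here
                } finally {
                    rwLock.readLock().unlock();
                }
            });
            workers.shutdown(); // tasks already submitted still have to run to completion
            // The worker needs the lock this thread holds, so this wait can only time out.
            return workers.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
        } finally {
            rwLock.writeLock().unlock(); // only now can the worker take the read lock and exit
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("terminated in time: " + shutdownWhileHoldingWriteLock(500));
    }
}
```

Running main prints `terminated in time: false` after about half a second; the worker only completes once the finally block releases the write lock, which is the same shape as the timed-out graceful shutdown in the log above.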

I don't think there is any reason to synchronize the shutdown routine, since all
it does is shut down the very same threads that are blocking on that lock.
Removing the synchronization resolves the issue.
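
For contrast, a minimal sketch of an unsynchronized shutdown, again with hypothetical names rather than NiFi's actual classes: a ScheduledExecutorService runs a periodic task that calls a read-locked getter (standing in for callbacks such as getRootGroupId() or isClustered()), and shutdown() stops the pool and waits without taking the lock, so any run in progress can acquire the read lock, finish, and let the pool terminate.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class UnlockedShutdownSketch {
    private final ReentrantReadWriteLock rwLock = new ReentrantReadWriteLock();
    private final ScheduledExecutorService timer = Executors.newScheduledThreadPool(2);
    final AtomicInteger callbackRuns = new AtomicInteger();

    /** Hypothetical read-locked getter, standing in for getRootGroupId() or isClustered(). */
    String rootGroupId() {
        rwLock.readLock().lock();
        try {
            callbackRuns.incrementAndGet();
            return "root"; // placeholder value
        } finally {
            rwLock.readLock().unlock();
        }
    }

    /** Periodic framework work that calls back into the controller. */
    void start() {
        timer.scheduleWithFixedDelay(this::rootGroupId, 0, 5, TimeUnit.MILLISECONDS);
    }

    /** Shutdown without the controller lock: the waiting thread holds nothing the workers need. */
    boolean shutdown(long timeoutMillis) throws InterruptedException {
        timer.shutdown(); // cancels future periodic runs; a run already in progress may finish
        return timer.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        UnlockedShutdownSketch controller = new UnlockedShutdownSketch();
        controller.start();
        Thread.sleep(50); // let the periodic task run a few times
        System.out.println("terminated in time: " + controller.shutdown(1000));
        System.out.println("callback runs: " + controller.callbackRuns.get());
    }
}
```

The design point is only that the waiting thread holds nothing the workers need. If shutdown also had to update shared state under the write lock, one option (an assumption, not necessarily what the patch does) would be to release the lock before calling awaitTermination().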

Will submit a patch shortly.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
