Young-Seok Kim created ASTERIXDB-1144:
-----------------------------------------

             Summary: FeedMetaStoreNodePushable.close() call hangs
                 Key: ASTERIXDB-1144
                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-1144
             Project: Apache AsterixDB
          Issue Type: Bug
            Reporter: Young-Seok Kim
            Assignee: Abdullah Alamoudi
            Priority: Critical


The feed job hangs in the FeedMetaStoreNodePushable.close() call, as shown in
the following jstack trace:
"org.apache.hyracks.api.rewriter.runtime.SuperActivity:TAID:TID:ANID:ODID:1:0:7:0:0" daemon prio=10 tid=0x00007fac6005c000 nid=0x4310 in Object.wait() [0x00007facd74f3000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        at java.lang.Object.wait(Object.java:503)
        at org.apache.asterix.metadata.feeds.FeedMetaStoreNodePushable.close(FeedMetaStoreNodePushable.java:195)
        - locked <0x0000000677fd9e80> (a org.apache.asterix.common.dataflow.AsterixLSMInsertDeleteOperatorNodePushable)
        at org.apache.hyracks.algebricks.runtime.operators.std.StreamProjectRuntimeFactory$1.close(StreamProjectRuntimeFactory.java:140)
        at org.apache.hyracks.algebricks.runtime.operators.std.AssignRuntimeFactory$1.close(AssignRuntimeFactory.java:220)
        at org.apache.hyracks.algebricks.runtime.operators.meta.AlgebricksMetaOperatorDescriptor$2.close(AlgebricksMetaOperatorDescriptor.java:145)
        at org.apache.asterix.metadata.feeds.FeedMetaNodePushable.close(FeedMetaNodePushable.java:174)
        at org.apache.hyracks.storage.am.common.dataflow.IndexInsertUpdateDeleteOperatorNodePushable.close(IndexInsertUpdateDeleteOperatorNodePushable.java:153)
        at org.apache.asterix.metadata.feeds.FeedMetaStoreNodePushable.close(FeedMetaStoreNodePushable.java:200)
        at org.apache.hyracks.control.nc.Task.pushFrames(Task.java:349)
        at org.apache.hyracks.control.nc.Task.run(Task.java:290)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

The hang seems to be caused by a bug in the wait/notify code.

More specifically, the FrameEventCallback.frameEvent() method sends the
notification in the following code snippet:
------------------------------------------------------
            case FINISHED_PROCESSING:
                inputSideHandler.setFinished(true);
                synchronized (coreOperator) {
                    coreOperator.notifyAll();
                }
------------------------------------------------------
The FeedMetaStoreNodePushable.close() method waits for the notification in the
following code snippet:
------------------------------------------------------
                while (!inputSideHandler.isFinished()) {
                    synchronized (coreOperator) {
                        coreOperator.wait();
                    }
                }
------------------------------------------------------
Suppose the thread calling close() has just called isFinished() and received
false, and is then preempted by the OS before it enters the synchronized block
and calls wait().
If the thread calling frameEvent() then calls setFinished(true) and
coreOperator.notifyAll(), the notification can be lost: no thread is waiting on
coreOperator yet, so notifyAll() wakes nobody.
When the close() thread resumes, it calls wait() with nothing left to wake it
up, so it may hang as shown in the jstack trace above.
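
One common way to close this window is to test the flag and call wait() inside
the same synchronized block, and to set the flag under that same lock before
notifying. The sketch below only illustrates the pattern and is not the actual
AsterixDB fix: FinishSignal, markFinished() and awaitFinished() are
hypothetical names, with monitor standing in for coreOperator and finished
standing in for the flag held by inputSideHandler.

```java
// Hypothetical stand-in for the coreOperator/inputSideHandler pair.
// Not AsterixDB code; it only shows the check-under-lock pattern.
final class FinishSignal {
    private final Object monitor = new Object(); // plays the role of coreOperator
    private boolean finished;                    // plays the role of the isFinished() flag

    // Notifier side (what frameEvent() does on FINISHED_PROCESSING):
    // update the flag and notify while holding the monitor.
    void markFinished() {
        synchronized (monitor) {
            finished = true;
            monitor.notifyAll();
        }
    }

    // Waiter side (what close() does): the flag is tested while the monitor
    // is held, so the notifier cannot run between the test and wait().
    // The loop also guards against spurious wakeups.
    void awaitFinished() throws InterruptedException {
        synchronized (monitor) {
            while (!finished) {
                monitor.wait();
            }
        }
    }

    boolean isFinished() {
        synchronized (monitor) {
            return finished;
        }
    }
}
```

With this arrangement, if markFinished() runs first, awaitFinished() sees the
flag already set and returns without waiting. If awaitFinished() runs first,
it releases the monitor only inside wait(), so the later notifyAll() is
guaranteed to find it waiting.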
