yuanyimeng created OOZIE-3612:
---------------------------------

             Summary: CompositeCallable becomes empty after requeue and causes 
IndexOutOfBoundsException in getType(), so the counter for this type in 
activeCallables is never decreased and exceeds the concurrency limit
                 Key: OOZIE-3612
                 URL: https://issues.apache.org/jira/browse/OOZIE-3612
             Project: Oozie
          Issue Type: Bug
          Components: action, workflow
    Affects Versions: 4.2.0
         Environment: We use Oozie 4.2.0, but this problem seems to exist in 
the newest release as well
            Reporter: yuanyimeng
         Attachments: callableRun.png, 
counternotdescreaedifgettypeexceptionhappen.png, 
eception_in_composite_callable.png, exceptionlog.png

This should rarely happen, but it has happened in our environment twice. The 
sequence of events is listed below.
 # We have long-hanging actions that last several hours (this is an error 
situation; tasks normally execute quickly). The action type is our own, 
developed by extending the Executor.
 # The ActionCheckXCommands for these actions are put into the callable queue 
composited into an array (a CompositeCallable) of at most ten elements.
 # Before the batch is actually put into the thread pool for execution, 
elements that already exist in the queue are filtered out; existence is 
determined by uniqueCallables. So the CompositeCallable actually in the queue 
may have fewer than 10 elements.
 # When this CompositeCallable is polled from the queue, the concurrency for 
the action_check type is checked before execution. If the concurrency limit 
has been reached, the CompositeCallable is requeued.
 # During the requeue, the ActionCheckerService has happened to put these 
ActionCheckXCommands into the queue again already, so filtering leaves the 
CompositeCallable with 0 elements.
 # In the finally block, the CallableEnd method needs to decrease the counter 
for this type. But calling getType() on the now-empty CompositeCallable throws 
an IndexOutOfBoundsException, so the counter is never decreased.
 # Once this has happened maxConcurrency times, the leaked counts alone keep 
the counter at or above maxConcurrency, so callables of this type can never be 
taken off the queue. They stay in the queue forever, which causes the workflow 
to hang.
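The steps above can be reproduced with a small, self-contained model. This is a hypothetical sketch, not Oozie's CallableQueueService: the class CounterLeakDemo, the simulate helper, and the list-backed CompositeCallable are illustrative names. It assumes, as the steps imply, that the concurrency check increments the per-type counter even when it then refuses to run the callable, and that the finally block always tries to decrement it. Capturing the type before the composite can be emptied is shown only as one possible fix, not as the committed patch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of the OOZIE-3612 counter leak; not Oozie source code. */
public class CounterLeakDemo {

    /** Hypothetical stand-in for CompositeCallable: a batch of commands of one type. */
    static final class CompositeCallable {
        final List<String> elementTypes = new ArrayList<>();

        CompositeCallable(String type, int size) {
            for (int i = 0; i < size; i++) {
                elementTypes.add(type);
            }
        }

        /** Reads the type from the first element, so an empty batch throws IndexOutOfBoundsException. */
        String getType() {
            return elementTypes.get(0);
        }
    }

    static final String TYPE = "action_check";

    /** Per-type count of callables, modelled on activeCallables. */
    final Map<String, Integer> activeCallables = new HashMap<>();
    final int maxConcurrency;

    CounterLeakDemo(int maxConcurrency) {
        this.maxConcurrency = maxConcurrency;
    }

    /** Assumption: increments unconditionally and returns false when the limit is exceeded. */
    boolean callableBegin(String type) {
        int n = activeCallables.merge(type, 1, Integer::sum);
        return n <= maxConcurrency;
    }

    void callableEnd(String type) {
        activeCallables.merge(type, -1, Integer::sum);
    }

    /** One worker pass; on a refused begin, the requeue de-duplication empties the batch. */
    void run(CompositeCallable c, boolean captureTypeFirst) {
        String typeAtStart = c.getType();
        try {
            if (!callableBegin(typeAtStart)) {
                c.elementTypes.clear(); // every element was already queued again elsewhere
            }
        } finally {
            // Buggy shape re-reads the type from the (possibly empty) batch;
            // the sketched fix reuses the type captured before anything changed.
            callableEnd(captureTypeFirst ? typeAtStart : c.getType());
        }
    }

    /** Returns the counter left behind after all real work has finished. */
    static int simulate(boolean captureTypeFirst) {
        CounterLeakDemo q = new CounterLeakDemo(2);
        q.callableBegin(TYPE); // two long-hanging actions occupy both slots
        q.callableBegin(TYPE);
        for (int i = 0; i < 2; i++) { // two requeues that empty their batch
            try {
                q.run(new CompositeCallable(TYPE, 3), captureTypeFirst);
            } catch (IndexOutOfBoundsException e) {
                System.out.println("finally block failed: " + e);
            }
        }
        q.callableEnd(TYPE); // the hung actions finally complete
        q.callableEnd(TYPE);
        return q.activeCallables.get(TYPE);
    }

    public static void main(String[] args) {
        System.out.println("buggy counter: " + simulate(false)); // stays at maxConcurrency
        System.out.println("fixed counter: " + simulate(true));
    }
}
```

With maxConcurrency = 2 and two composites emptied this way, the buggy path leaves the action_check counter at 2 after every real action has finished, so every later concurrency check fails and the callable is requeued indefinitely. The variant that captures the type up front brings the counter back to 0.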

 

A screenshot of where the exception happened is attached. I hope it describes 
the problem clearly.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)