Hyoungjun Kim created TAJO-1540:
-----------------------------------

             Summary: RpcCallback must be able to handle TimeoutException or cancel.
                 Key: TAJO-1540
                 URL: https://issues.apache.org/jira/browse/TAJO-1540
             Project: Tajo
          Issue Type: Bug
            Reporter: Hyoungjun Kim


I investigated the locking in CallFuture while reviewing TAJO-1469. 
CallFuture's run() and get() should be synchronized with each other. The 
current code looks as if this were implemented, but it is not. If the 
following situation occurs, some resources or tasks will be lost forever.

1) Worker: TaskRunner sends a GetTask request.
2) QM: QueryMaster selects a proper task and calls RpcCallback.
3) Worker: AsyncRpcClient receives the response and calls 
   CallFuture.run(response).
3-1) Worker: If a TimeoutException occurs after 1), between 2) and 3), 
   TaskRunner can't receive the response and doesn't run the allocated 
   task, but QM doesn't know about that.

We should fix this problem in the RPC module and add proper cancel logic.
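
As a rough illustration of the kind of synchronization meant here, the sketch 
below shows a future whose run() and get() share one lock, so a response that 
arrives after a timeout is detected instead of being silently dropped. This 
is not Tajo's actual CallFuture: the class name SketchCallFuture, the State 
enum, and the onLateResponse hook are all hypothetical names introduced only 
for this example, and a real fix would also need to notify the QueryMaster.

```java
import java.util.concurrent.CancellationException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Consumer;

// Hypothetical sketch, NOT Tajo's CallFuture: run() and get() agree on a
// single state under one lock, so exactly one side "wins" the race.
class SketchCallFuture<T> {
  private enum State { PENDING, DONE, CANCELLED }

  private final Object lock = new Object();
  // Hypothetical hook: called with a response nobody is waiting for anymore,
  // e.g. so the caller can tell the QueryMaster to release the task.
  private final Consumer<T> onLateResponse;
  private State state = State.PENDING;
  private T response;

  SketchCallFuture(Consumer<T> onLateResponse) {
    this.onLateResponse = onLateResponse;
  }

  /** Called by the RPC client thread. Returns false if the response was not delivered. */
  boolean run(T value) {
    synchronized (lock) {
      if (state == State.PENDING) {
        response = value;
        state = State.DONE;
        lock.notifyAll();
        return true;
      }
      if (state == State.DONE) {
        return false; // duplicate response, ignore it
      }
    }
    // State is CANCELLED: the waiter timed out or cancelled before we arrived.
    onLateResponse.accept(value);
    return false;
  }

  /** Marks the call as cancelled; returns false if a response already arrived. */
  boolean cancel() {
    synchronized (lock) {
      if (state != State.PENDING) {
        return state == State.CANCELLED;
      }
      state = State.CANCELLED;
      lock.notifyAll();
      return true;
    }
  }

  /** Waits for the response; a timeout atomically cancels the call. */
  T get(long timeout, TimeUnit unit) throws InterruptedException, TimeoutException {
    long deadline = System.nanoTime() + unit.toNanos(timeout);
    synchronized (lock) {
      while (state == State.PENDING) {
        long remaining = deadline - System.nanoTime();
        if (remaining <= 0) {
          state = State.CANCELLED; // decided under the same lock that run() takes
          throw new TimeoutException("no response within timeout");
        }
        TimeUnit.NANOSECONDS.timedWait(lock, remaining);
      }
      if (state == State.CANCELLED) {
        throw new CancellationException("call was cancelled");
      }
      return response;
    }
  }
}
```

With this shape, case 3-1 above ends with run() returning false and the 
late response being handed to onLateResponse, rather than the allocated task 
disappearing without either side noticing.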



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
