Hyoungjun Kim created TAJO-1540:
-----------------------------------
Summary: RpcCallback must be able to handle TimeoutException or
cancel.
Key: TAJO-1540
URL: https://issues.apache.org/jira/browse/TAJO-1540
Project: Tajo
Issue Type: Bug
Reporter: Hyoungjun Kim
I investigated the locking in CallFuture while reviewing TAJO-1469. CallFuture's
run() and get() should be synchronized with each other. The current code looks as
if this was intended, but it is not actually implemented. If the following
situation occurs, some resources or tasks will be lost forever.
1) Worker: TaskRunner sends a GetTask request.
2) QM: QueryMaster selects a proper task and calls RpcCallback.
3) Worker: AsyncRpcClient receives the response and calls
   CallFuture.run(response).
3-1) Worker: If a TimeoutException occurs after 1), between 2) ~ 3), TaskRunner
   can't receive the response and doesn't run the allocated task, but QM
   doesn't know about that.
We should fix this problem in the RPC module and add proper cancellation logic.
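
One possible shape for such a fix is sketched below. This is not Tajo's actual
CallFuture: the class name SafeCallFuture, its state enum, and its method
signatures are all hypothetical and chosen only for illustration. The idea is
that run(), get() and cancel() share one lock, so a timeout and an arriving
response cannot both "win". If the waiter has already timed out, run() returns
false, which gives the RPC layer a point where it could tell the QM that the
allocated task was never picked up.

```java
import java.util.concurrent.CancellationException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical stand-in for CallFuture; names and API are assumptions. */
final class SafeCallFuture<T> {
    private enum State { PENDING, DONE, CANCELLED }

    private final Object lock = new Object();
    private State state = State.PENDING;
    private T response;

    /**
     * Called by the RPC client thread when a response arrives.
     * Returns false if the waiter already timed out or cancelled, so the
     * caller can ask the remote side to reclaim what it allocated.
     */
    boolean run(T value) {
        synchronized (lock) {
            if (state != State.PENDING) {
                return false;
            }
            response = value;
            state = State.DONE;
            lock.notifyAll();
            return true;
        }
    }

    /** Explicit cancel; returns false if a response was already delivered. */
    boolean cancel() {
        synchronized (lock) {
            if (state != State.PENDING) {
                return false;
            }
            state = State.CANCELLED;
            lock.notifyAll();
            return true;
        }
    }

    /**
     * Waits up to timeoutMillis for the response. On timeout the future is
     * marked CANCELLED under the same lock that run() takes, so a response
     * arriving afterwards is rejected instead of being silently dropped.
     */
    T get(long timeoutMillis) throws InterruptedException, TimeoutException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        synchronized (lock) {
            while (state == State.PENDING) {
                long remainingNanos = deadline - System.nanoTime();
                if (remainingNanos <= 0) {
                    state = State.CANCELLED;
                    throw new TimeoutException("no response within " + timeoutMillis + " ms");
                }
                // Releases the lock while waiting; the loop re-checks the state.
                TimeUnit.NANOSECONDS.timedWait(lock, remainingNanos);
            }
            if (state == State.CANCELLED) {
                throw new CancellationException("call was cancelled");
            }
            return response;
        }
    }
}
```

With this shape, the caller of run() in the RPC client could check the return
value and, when it is false, send some form of "task not accepted" message back
to the QM. What that message would look like in Tajo is left open here.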
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)