Github user babokim commented on the pull request:
https://github.com/apache/tajo/pull/480#issuecomment-90046382
I investigated the locking in CallFuture. Its run() and get() methods should
be synchronized with each other. The current code looks as if it does this,
but it does not. If the following situation occurs, some resources or tasks
will be lost forever.
1. Worker: TaskRunner sends a GetTask request.
2. QM: QueryMaster selects a proper task and calls RpcCallback.
3. Worker: AsyncRpcClient receives the response and calls
CallFuture.run(response).
3-1. Worker: If a TimeoutException occurs after step 1, between steps 2 and 3,
TaskRunner never receives the response and does not run the allocated task,
but QM does not know about that.
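The race above can be illustrated with a minimal sketch. This is not Tajo's
actual CallFuture: SketchCallFuture, the "abandoned" flag, and the boolean
return value of run() are all hypothetical names introduced here. It assumes
that one shared lock guards both run() and get(), so a timeout and a late
response cannot interleave unnoticed.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical stand-in for CallFuture; illustrates one lock for run() and get(). */
final class SketchCallFuture<T> {
  private final Object lock = new Object();
  private T response;
  private boolean done;
  private boolean abandoned; // assumption: set when the waiter gives up

  /**
   * Called by the RPC client thread when the response arrives.
   * Returns false if the waiter has already timed out, so the caller
   * knows the response (for example, an allocated task) was not consumed.
   */
  public boolean run(T value) {
    synchronized (lock) {
      if (abandoned) {
        return false;
      }
      response = value;
      done = true;
      lock.notifyAll();
      return true;
    }
  }

  /** Called by the requesting thread; waits for the response up to the timeout. */
  public T get(long timeout, TimeUnit unit)
      throws InterruptedException, TimeoutException {
    long deadline = System.nanoTime() + unit.toNanos(timeout);
    synchronized (lock) {
      while (!done) {
        long remainingNanos = deadline - System.nanoTime();
        if (remainingNanos <= 0) {
          // Mark the future as abandoned under the same lock that run() takes.
          abandoned = true;
          throw new TimeoutException("no response within timeout");
        }
        // Wait at least 1 ms so we never call wait(0), which blocks forever.
        lock.wait(Math.max(1L, TimeUnit.NANOSECONDS.toMillis(remainingNanos)));
      }
      return response;
    }
  }
}
```

With this shape, a run() that returns false gives the RPC client a point at
which it could report the unconsumed task back to QM. How that report would
be sent is outside this sketch.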
If I am wrong, please let me know.
If I am right, this patch is only a temporary solution, and we need to
create a separate issue for this problem.