[
https://issues.apache.org/jira/browse/TAJO-1469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14481179#comment-14481179
]
ASF GitHub Bot commented on TAJO-1469:
--------------------------------------
Github user babokim commented on the pull request:
https://github.com/apache/tajo/pull/480#issuecomment-90046382
I investigated the locking in CallFuture. CallFuture.run() and
CallFuture.get() should be synchronized with each other. The current code
looks as if it does this, but it does not. If the following situation
occurs, some resources or tasks will be lost forever.
1. Worker: TaskRunner sends GetTask request.
2. QM: QueryMaster selects proper task and calls RpcCallback.
3. Worker: AsyncRpcClient receives the response and calls
CallFuture.run(response).
3-1. Worker: If a TimeoutException occurs after step 1, between steps 2 and
3, TaskRunner never receives the response and does not run the allocated
task, but the QM does not know that.
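One way to close this window is to make "the waiter gave up" and "the
response arrived" mutually exclusive under a single lock, so that a late
response is handed to a cleanup hook instead of being dropped. The following
is only a minimal sketch of that idea: HandoffFuture, onOrphan and
HandoffDemo are hypothetical names for illustration and are not Tajo's
CallFuture or its API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Consumer;

// Hypothetical stand-in for CallFuture (illustration only, not Tajo code).
final class HandoffFuture<T> {
  private enum State { PENDING, DELIVERED, ABANDONED }

  private final Object lock = new Object();
  private final Consumer<T> onOrphan; // receives a response nobody will consume
  private State state = State.PENDING;
  private T response;

  HandoffFuture(Consumer<T> onOrphan) {
    this.onOrphan = onOrphan;
  }

  // Called by the RPC client thread when the response arrives.
  void run(T value) {
    boolean orphaned;
    synchronized (lock) {
      orphaned = (state == State.ABANDONED);
      if (state == State.PENDING) {
        response = value;
        state = State.DELIVERED;
        lock.notifyAll();
      }
    }
    if (orphaned) {
      // The waiter already gave up, so give the allocation back.
      onOrphan.accept(value);
    }
  }

  // Called by the requesting thread. A timeout marks the future ABANDONED
  // under the same lock that run() takes, so the two cannot interleave.
  T get(long timeout, TimeUnit unit) throws InterruptedException, TimeoutException {
    long deadline = System.nanoTime() + unit.toNanos(timeout);
    synchronized (lock) {
      try {
        while (state == State.PENDING) {
          long remaining = deadline - System.nanoTime();
          if (remaining <= 0) {
            state = State.ABANDONED;
            throw new TimeoutException("no response within timeout");
          }
          TimeUnit.NANOSECONDS.timedWait(lock, remaining);
        }
      } catch (InterruptedException e) {
        if (state == State.PENDING) {
          state = State.ABANDONED;
          throw e;
        }
        // The response arrived while we were being interrupted: keep it.
        Thread.currentThread().interrupt();
      }
      if (state != State.DELIVERED) {
        throw new TimeoutException("future was already abandoned");
      }
      return response;
    }
  }
}

final class HandoffDemo {
  public static void main(String[] args) throws Exception {
    List<String> released = new ArrayList<>();
    HandoffFuture<String> future = new HandoffFuture<>(released::add);
    try {
      future.get(10, TimeUnit.MILLISECONDS);
    } catch (TimeoutException e) {
      System.out.println("timed out");
    }
    future.run("task-1"); // late response, as in step 3-1
    System.out.println("released: " + released);
  }
}
```

In the demo the late response ends up in the released list rather than
being lost. In a real system the hook would have to tell the QM to take the
task back, which this sketch does not attempt.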
If my understanding is wrong, please let me know.
If it is right, this patch is a temporary solution and we need to create
another issue for this problem.
> allocateQueryMaster can leak resources if it times-out (3sec, hardcoded)
> ------------------------------------------------------------------------
>
> Key: TAJO-1469
> URL: https://issues.apache.org/jira/browse/TAJO-1469
> Project: Tajo
> Issue Type: Bug
> Reporter: Navis
> Assignee: Navis
>
> {code}
> WorkerResourceAllocationResponse response = null;
> try {
>   response = callFuture.get(3, TimeUnit.SECONDS);
> } catch (Throwable t) {
>   LOG.error(t, t);
>   return null;
> }
> {code}
> If it times out (or is interrupted), the allocated resources can never be
> reclaimed.
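On the caller side, the snippet quoted above could avoid the leak by
registering a release action whenever it abandons the wait. The sketch below
uses java.util.concurrent.CompletableFuture as a stand-in for the RPC
future. TimedAllocation, awaitOrRelease and the release callback are
hypothetical names, and the sketch assumes the response carries enough
information to give the resources back.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Consumer;

// Hypothetical helper (illustration only, not part of Tajo).
final class TimedAllocation {
  // Waits for the allocation. If the wait is abandoned, the release action
  // is attached to the future, so it runs whether the response has already
  // arrived or arrives later.
  static <T> T awaitOrRelease(CompletableFuture<T> future, long timeout,
                              TimeUnit unit, Consumer<T> release) {
    try {
      return future.get(timeout, unit);
    } catch (TimeoutException e) {
      future.thenAccept(release);
      return null;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt for callers
      future.thenAccept(release);
      return null;
    } catch (ExecutionException e) {
      return null; // the request itself failed, so nothing was allocated
    }
  }

  public static void main(String[] args) {
    List<String> released = new ArrayList<>();
    CompletableFuture<String> future = new CompletableFuture<>();
    String result = awaitOrRelease(future, 10, TimeUnit.MILLISECONDS, released::add);
    future.complete("allocation-1"); // response arrives after the timeout
    System.out.println("result=" + result + ", released=" + released);
  }
}
```

Catching TimeoutException and InterruptedException separately, instead of
Throwable, also keeps the thread's interrupt status intact.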