[
https://issues.apache.org/jira/browse/TAJO-54?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13651656#comment-13651656
]
Hyunsik Choi commented on TAJO-54:
----------------------------------
Can anyone review this patch? It is critical for avoiding the
hanging problem.
> SubQuery::allocateContainers() may ask 0 containers
> ---------------------------------------------------
>
> Key: TAJO-54
> URL: https://issues.apache.org/jira/browse/TAJO-54
> Project: Tajo
> Issue Type: Bug
> Reporter: Hyunsik Choi
> Assignee: Hyunsik Choi
> Priority: Critical
> Labels: yarn
> Fix For: 0.2-incubating
>
> Attachments: TAJO-54_2.patch, TAJO-54.patch
>
>
> SubQuery::allocateContainers() calculates the number of containers to
> request for a subquery and then requests the containers as follows:
> {code:title=SubQuery.java}
> public static void allocateContainers(SubQuery subQuery) {
>   ExecutionBlock execBlock = subQuery.getBlock();
>   QueryUnit[] tasks = subQuery.getQueryUnits();
>   int numRequest = Math.min(tasks.length,
>       subQuery.context.getNumClusterNode() * 4);
> {code}
> In allocateContainers(), subQuery.context.getNumClusterNode() internally
> invokes AMRMClient::getClusterNodeCount(). If that call returns 0,
> allocateContainers() requests 0 containers from the RM. When that happens,
> AppSchedulingInfo regards the ApplicationMaster as inactive. As a result,
> the ApplicationMaster cannot acquire any containers.
> In the current Hadoop YARN, AMRMClient::getClusterNodeCount() temporarily
> returns 0 for an unknown reason even though cluster nodes are available.
> This problem causes the integration test (i.e., 'mvn verify') to hang.
> This patch solves the problem by making RMContainerAllocator wait for
> available cluster nodes.
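For reviewers, here is a rough sketch of the "wait for available cluster nodes" idea described above. It is not the code in the attached patch. The class name, the method names, and the polling/timeout parameters are all hypothetical, and an IntSupplier stands in for whatever ends up calling AMRMClient::getClusterNodeCount(). Only the min(tasks, nodes * 4) formula is taken from the quoted snippet.

```java
import java.util.function.IntSupplier;

/** Hypothetical sketch; none of these names come from the Tajo patch. */
final class ContainerRequestSketch {

  /**
   * Polls nodeCount until it reports at least one node, the timeout elapses,
   * or the thread is interrupted. Returns the last observed count, which may
   * still be 0.
   */
  static int waitForClusterNodes(IntSupplier nodeCount, long pollMs, long timeoutMs) {
    long deadline = System.currentTimeMillis() + timeoutMs;
    int nodes = nodeCount.getAsInt();
    while (nodes <= 0 && System.currentTimeMillis() < deadline) {
      try {
        Thread.sleep(pollMs);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt and stop waiting
        break;
      }
      nodes = nodeCount.getAsInt();
    }
    return nodes;
  }

  /** Same formula as the quoted snippet: min(number of tasks, nodes * 4). */
  static int computeNumRequest(int numTasks, int numClusterNodes) {
    return Math.min(numTasks, numClusterNodes * 4);
  }

  /** Waits for nodes first, and fails instead of computing with a node count of 0. */
  static int numContainersToRequest(int numTasks, IntSupplier nodeCount,
                                    long pollMs, long timeoutMs) {
    int nodes = waitForClusterNodes(nodeCount, pollMs, timeoutMs);
    if (nodes <= 0) {
      throw new IllegalStateException(
          "no cluster nodes reported within " + timeoutMs + " ms");
    }
    return computeNumRequest(numTasks, nodes);
  }
}
```

With a supplier that reports 0 nodes for the first couple of calls and 2 nodes afterwards, numContainersToRequest(10, supplier, ...) computes min(10, 2 * 4) instead of sending a request for 0 containers. Whether to fail or keep waiting after the timeout is a design choice this sketch leaves open.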