[
https://issues.apache.org/jira/browse/TEZ-1363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106151#comment-14106151
]
Siddharth Seth commented on TEZ-1363:
-------------------------------------
In TezLocalAMRMClientAsync
- containerRequests - a LinkedMap / map would likely work better (if a local
job ever has lots of tasks)
- LocalContainerAllocator should be a daemon. Also the unnecessary fields from
the constructor could be dropped.
- RegisterApplicationMasterResponse should contain min / max capability since
the TaskScheduler may have some error checking w.r.t this. Not too sure about
ACLs being null.
- getTopPriority - would be more efficient if Priority is maintained
independently instead of scanning the entire list.
- getMatchingRequests - Does capability / resourceName need to be considered ?
- getMatchingRequestsForTopPriority - should be using the top priority to
return a match.
A lot of this logic already exists in TezAMRMClientAsync, in conjunction with
AMRMClientAsync. Is there a way to re-use the logic ? (I doubt AMRMClientAsycn
allows overriding methods which make the final request out to YARN after
updating it's local structures).
In LocalContainerAllocator
{code}
+ void handleRemoveContainerRequest(RemoveContainerRequest request) {
+ LOG.info("Processing remove container request: " + request.req);
+ // Otherwise the allocation will happen.
+ if (!containerRequestQueue.remove(request.req)) {
+ LOG.warn("Unable to find and remove container request: " + request.req);
+ }
+ }
{code}
Don't think this will work. request.req is a CookieContainerRequest - the
corresponding AddContainerRequest ("ContainerRequest") in the queue will have
to be removed. A map based removal would be better - instead of having to scan
through the queue. Alternately, could just retain a list of cancelled ones and
match against the launch list.
{code}
+ NodeId nodeId = NodeId.newInstance("127.0.0.1", 0);
+ String nodeHttpAddress = "127.0.0.1:0";
{code}
Should be picked up from the environment, to stay consistent.
LocalContainerLauncher would have to inform the LocalContainerAllocator about
NM_STOP_REQUESTS which have been sent out - otherwise I don't think
LocalContainerAllocator would find out about completed containers. Similarly,
will have to look at what happens in case of failures. I'm not sure
S_CONTAINER_DEALLOCATE message from AMContainer actually makes it over to
AMRMClient (via the TaskScheduler).
Merging the LocalContainerLauncher / LocalContainerAllocator may help with this
- or at least passing messages between them.
> Make use of the regular scheduler when running in LocalMode
> -----------------------------------------------------------
>
> Key: TEZ-1363
> URL: https://issues.apache.org/jira/browse/TEZ-1363
> Project: Apache Tez
> Issue Type: Sub-task
> Reporter: Siddharth Seth
> Assignee: Jonathan Eagles
> Attachments: TEZ-1363-v1.patch, TEZ-1363-v2.patch, TEZ-1363-v3.patch
>
>
> In TEZ-708, we decided to introduce a new scheduler for local mode - to keep
> things simple initially, and get local mode working.
> Eventually, however, scheduling should go through the regular task scheduler
> - which should be able to get containers from YARN / LocalAllocator / other
> sources - and treat them as a regular container for scheduling purposes.
--
This message was sent by Atlassian JIRA
(v6.2#6252)