[jira] Commented: (HADOOP-4513) Capacity scheduler should initialize tasks asynchronously

Vivek Ratan (JIRA) Mon, 27 Oct 2008 02:35:20 -0700

    [ 
https://issues.apache.org/jira/browse/HADOOP-4513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12642881#action_12642881
 ]


Vivek Ratan commented on HADOOP-4513:
-------------------------------------

Yes, we need to make sure jobs are initialized asynchronously (so that 
initTasks() is not called synchronously from within a heartbeat) and as early 
as possible (so that a job is already initialized when we consider it to run). 
We also want only a small number of waiting jobs to be initialized at any given 
time, so that their memory footprint stays low. I suggest we use an enhanced 
version of EagerTaskInitializationListener, so that jobs are initialized 
asynchronously in a separate thread. The difference is that we apply some of 
the limits described in HADOOP-4428. We can have a limit on the total number of 
waiting jobs initialized (maybe 10 per queue), as well as a limit on 
initialized jobs per user per queue (maybe 3 per user per queue). The modified 
EagerTaskInitializationListener thread enforces these limits and only 
initializes jobs as necessary. 
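
To make the two limits concrete, here is a rough sketch of the selection step 
such a thread might run for one queue. It is not the actual 
EagerTaskInitializationListener code: the class LimitedInitSketch, the 
WaitingJob type, and the selectForInit method are hypothetical names invented 
for illustration, and the limits of 10 and 3 are just the tentative values 
suggested above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; none of these names exist in Hadoop. */
public class LimitedInitSketch {

    /** Minimal stand-in for a job waiting in a queue. */
    public static final class WaitingJob {
        public final String id;
        public final String user;
        public boolean initialized;

        public WaitingJob(String id, String user, boolean initialized) {
            this.id = id;
            this.user = user;
            this.initialized = initialized;
        }
    }

    /**
     * Walks one queue's waiting jobs in scheduling order and returns the
     * uninitialized jobs that may be initialized now without exceeding the
     * per-queue limit or the per-user limit. Jobs that are already
     * initialized count against both limits.
     */
    public static List<WaitingJob> selectForInit(
            List<WaitingJob> waiting, int maxPerQueue, int maxPerUser) {
        Map<String, Integer> perUser = new HashMap<>();
        int total = 0;

        // First pass: count what is already initialized.
        for (WaitingJob job : waiting) {
            if (job.initialized) {
                total++;
                perUser.merge(job.user, 1, Integer::sum);
            }
        }

        // Second pass: pick further jobs, in order, while limits allow.
        List<WaitingJob> picked = new ArrayList<>();
        for (WaitingJob job : waiting) {
            if (total >= maxPerQueue) {
                break; // queue-wide limit reached
            }
            if (job.initialized) {
                continue;
            }
            int userCount = perUser.getOrDefault(job.user, 0);
            if (userCount >= maxPerUser) {
                continue; // this user already has enough initialized jobs
            }
            picked.add(job);
            perUser.put(job.user, userCount + 1);
            total++;
        }
        return picked;
    }
}
```

The background thread would call something like selectForInit(waiting, 10, 3) 
for each queue and invoke initTasks() on every job returned, so that no 
initialization happens on the heartbeat path.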

> Capacity scheduler should initialize tasks asynchronously
> ---------------------------------------------------------
>
>                 Key: HADOOP-4513
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4513
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/capacity-sched
>    Affects Versions: 0.19.0
>            Reporter: Hemanth Yamijala
>            Assignee: Sreekanth Ramakrishnan
>
> Currently, the capacity scheduler initializes tasks on demand, as opposed to 
> the eager initialization technique used by the default scheduler. This is 
> done in order to reduce the JT memory footprint. However, the initialization 
> is done in the {{assignTasks}} API, which is not a good idea, as task 
> initialization could be a time-consuming operation. This JIRA is to move the 
> initialization out of the {{assignTasks}} API and do it asynchronously.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
