[ https://issues.apache.org/jira/browse/HADOOP-4513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12642881#action_12642881 ]
Vivek Ratan commented on HADOOP-4513:
-------------------------------------

Yes, we need to make sure jobs are initialized asynchronously (so that initTasks() is not called synchronously from within a heartbeat) and as early as possible (so that a job is already initialized when we consider it to run). We also want only a small number of waiting jobs to be initialized at any given time, so that their memory footprint stays low.

I suggest we use an enhanced version of EagerTaskInitializationListener, so that jobs are initialized asynchronously in a separate thread. The difference is that we apply some of the limits described in HADOOP-4428. We can have a limit on the total number of waiting jobs initialized (maybe 10 per queue), as well as a limit on initialized jobs per user per queue (maybe 3 per user per queue). The modified EagerTaskInitializationListener thread enforces these limits and initializes jobs only as necessary.

> Capacity scheduler should initialize tasks asynchronously
> ---------------------------------------------------------
>
>                 Key: HADOOP-4513
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4513
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/capacity-sched
>    Affects Versions: 0.19.0
>            Reporter: Hemanth Yamijala
>            Assignee: Sreekanth Ramakrishnan
>
> Currently, the capacity scheduler initializes tasks on demand, as opposed to
> the eager initialization technique used by the default scheduler. This is
> done in order to save JT memory footprint. However, the initialization is
> done in the {{assignTasks}} API, which is not a good idea as task
> initialization could be a time-consuming operation. This JIRA is to move
> the initialization outside the {{assignTasks}} API and do it asynchronously.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
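
To make the proposal in the comment above concrete, here is a minimal sketch of a background initializer that enforces a per-queue limit and a per-user-per-queue limit on initialized waiting jobs. It is not the actual EagerTaskInitializationListener or any Hadoop code: WaitingJob, BoundedJobInitializer, jobAdded, jobStarted and drain are hypothetical names, and initTasks() here is only a stand-in for the real, expensive call. The limits are constructor parameters; the comment's tentative values would be 10 and 3.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical stand-in for a submitted job; not Hadoop's JobInProgress. */
class WaitingJob {
    final String id;
    final String queue;
    final String user;
    volatile boolean initialized = false;

    WaitingJob(String id, String queue, String user) {
        this.id = id;
        this.queue = queue;
        this.user = user;
    }

    /** Stand-in for the expensive task initialization. */
    void initTasks() {
        initialized = true;
    }
}

/** Sketch: initializes waiting jobs on a separate thread, subject to limits. */
class BoundedJobInitializer {
    private final int maxPerQueue;          // e.g. "maybe 10 per queue"
    private final int maxPerUserPerQueue;   // e.g. "maybe 3 per user per queue"
    private final List<WaitingJob> pending = new ArrayList<>();
    private final Map<String, Integer> perQueue = new HashMap<>();
    private final Map<String, Integer> perUser = new HashMap<>();
    // One daemon worker thread, so initialization never runs on the caller
    // (for example, a heartbeat handler).
    private final ExecutorService worker = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "job-init");
        t.setDaemon(true);
        return t;
    });

    BoundedJobInitializer(int maxPerQueue, int maxPerUserPerQueue) {
        this.maxPerQueue = maxPerQueue;
        this.maxPerUserPerQueue = maxPerUserPerQueue;
    }

    /** Called on submission; returns immediately. */
    void jobAdded(WaitingJob job) {
        synchronized (this) {
            pending.add(job);
        }
        worker.execute(this::initializeEligibleJobs);
    }

    /**
     * Called when an initialized job stops waiting (it starts running).
     * Assumption: callers only pass jobs that this class initialized.
     */
    void jobStarted(WaitingJob job) {
        synchronized (this) {
            perQueue.merge(job.queue, -1, Integer::sum);
            perUser.merge(job.queue + "/" + job.user, -1, Integer::sum);
        }
        worker.execute(this::initializeEligibleJobs);
    }

    /** Blocks until all work queued so far has run (useful in tests). */
    void drain() {
        try {
            worker.submit((Runnable) () -> { }).get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Runs on the worker thread: reserve slots under the lock, init outside it. */
    private void initializeEligibleJobs() {
        List<WaitingJob> toInit = new ArrayList<>();
        synchronized (this) {
            Iterator<WaitingJob> it = pending.iterator();
            while (it.hasNext()) {
                WaitingJob job = it.next();
                String userKey = job.queue + "/" + job.user;
                int q = perQueue.getOrDefault(job.queue, 0);
                int u = perUser.getOrDefault(userKey, 0);
                if (q < maxPerQueue && u < maxPerUserPerQueue) {
                    perQueue.put(job.queue, q + 1);
                    perUser.put(userKey, u + 1);
                    toInit.add(job);
                    it.remove();
                }
            }
        }
        for (WaitingJob job : toInit) {
            job.initTasks();
        }
    }
}
```

In this sketch, jobs that exceed a limit simply stay in the pending list in submission order and are reconsidered whenever a slot is released. The expensive call happens outside the lock, so a thread calling jobAdded is never blocked behind an initialization in progress.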