[ 
https://issues.apache.org/jira/browse/HADOOP-4664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12681270#action_12681270
 ] 

Devaraj Das commented on HADOOP-4664:
-------------------------------------

Some comments:
- Set the daemon attribute on the init threads.
- The termination of the main init thread should be fixed. Instead of 
  looping on "while(true)", the loop should check the thread's interrupt 
  status.
- It would be better to use the static factory method 
  Executors.newFixedThreadPool(int) rather than constructing a new thread 
  pool with the explicit constructor.
- There is no need to catch Exception in JobInitManager.run.
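
A rough sketch of what the comments above are asking for, not taken from the patch. The class name JobInitSketch, the pending queue, and the daemonFactory and startManager helpers are all hypothetical names made up for illustration, and the lambdas assume Java 8 or later. Marking pool threads as daemons requires the Executors.newFixedThreadPool(int, ThreadFactory) overload, since the single-argument form creates non-daemon threads.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

public class JobInitSketch {
    // Hypothetical stand-in for the queue of jobs waiting to be initialized.
    static final BlockingQueue<Runnable> pending = new LinkedBlockingQueue<>();

    // Factory that marks every pool thread as a daemon, so the init
    // threads never keep the JVM alive on their own.
    static ThreadFactory daemonFactory(final String prefix) {
        final AtomicInteger counter = new AtomicInteger();
        return r -> {
            Thread t = new Thread(r, prefix + "-" + counter.incrementAndGet());
            t.setDaemon(true);
            return t;
        };
    }

    // Manager thread: hands pending jobs to the pool until it is interrupted.
    static Thread startManager(final ExecutorService pool) {
        Thread manager = new Thread(() -> {
            // Check the interrupt status instead of looping on while(true).
            while (!Thread.currentThread().isInterrupted()) {
                try {
                    pool.execute(pending.take());
                } catch (InterruptedException e) {
                    // take() clears the flag when it throws; restore it so
                    // the loop condition sees the interrupt and exits.
                    Thread.currentThread().interrupt();
                }
            }
            pool.shutdownNow();
        }, "jobInitManager");
        manager.setDaemon(true);
        manager.start();
        return manager;
    }

    public static void main(String[] args) throws InterruptedException {
        // Factory method rather than the explicit ThreadPoolExecutor constructor.
        ExecutorService pool =
            Executors.newFixedThreadPool(4, daemonFactory("jobInit"));
        Thread manager = startManager(pool);

        // Eight dummy "jobs"; each one just counts down the latch.
        CountDownLatch initialized = new CountDownLatch(8);
        for (int i = 0; i < 8; i++) {
            pending.put(initialized::countDown);
        }
        initialized.await();

        manager.interrupt();   // request termination
        manager.join();        // returns once the loop has exited
        System.out.println("manager alive: " + manager.isAlive()
            + ", pool shut down: " + pool.isShutdown());
    }
}
```

Only InterruptedException is caught in the manager loop, which matches the last comment: nothing in the loop needs a blanket catch of Exception.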

> Parallelize job initialization
> ------------------------------
>
>                 Key: HADOOP-4664
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4664
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Matei Zaharia
>            Assignee: Jothi Padmanabhan
>            Priority: Blocker
>             Fix For: 0.20.0
>
>         Attachments: hadoop-4664-v1.patch, hadoop-4664-v2.patch, 
> parallel-job-init-v1.patch
>
>
> The job init thread currently initializes one job at a time. However, this is 
> a lengthy and partly IO-bound process because all of the job's block 
> locations need to be resolved through the namenode and a map of them needs to 
> be built. It can take tens of seconds. As a result, the cluster sometimes 
> initializes jobs too slowly for full utilization to be achieved, if there are 
> many small jobs queued up. It would be better to have a pool of threads that 
> initialize multiple jobs in parallel. One thing to be careful of, however, is 
> not causing deadlocks or holding locks for too long in these threads.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
