[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13459713#comment-13459713
 ] 

Kihwal Lee commented on MAPREDUCE-4662:
---------------------------------------

bq. One solution is to specify maximum number of queued requests for 
LinkedBlockingQueue.

That could be it, but this solution needs more changes. When the queue is full 
and the maximum number of threads is running, new tasks will be rejected. We 
could apply CallerRunsPolicy, but the whole point of having the 
ThreadPoolExecutor is to avoid blocking the JobTracker during job completion.
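
To illustrate the rejection behavior, here is a minimal standalone sketch. It is not JobTracker code; the class name, the small pool and queue sizes, and the blocking dummy jobs are all made up for the demonstration. With no handler specified, ThreadPoolExecutor uses its default AbortPolicy, so anything beyond (max threads + queue capacity) throws RejectedExecutionException.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class, not part of Hadoop.
public class BoundedQueueRejectionDemo {

    /** Submits `tasks` blocking jobs and returns how many were rejected. */
    static int countRejections(int tasks) {
        CountDownLatch release = new CountDownLatch(1);
        // 1 core thread, 2 threads max, room for 2 queued jobs (sizes chosen
        // for the demo). No handler is given, so the default AbortPolicy applies.
        ThreadPoolExecutor executor = new ThreadPoolExecutor(1, 2, 1, TimeUnit.HOURS,
                new LinkedBlockingQueue<Runnable>(2));
        int rejected = 0;
        for (int i = 0; i < tasks; i++) {
            try {
                // Each job blocks until released, so nothing drains the queue.
                executor.execute(() -> awaitQuietly(release));
            } catch (RejectedExecutionException e) {
                rejected++;
            }
        }
        release.countDown();   // let the accepted jobs finish
        executor.shutdown();
        return rejected;
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        // 2 running + 2 queued are accepted; the remaining 2 are rejected.
        System.out.println("rejected = " + countRejections(6)); // rejected = 2
    }
}
```

In the JobTracker case a rejected task would mean a lost job completion, which is why a bounded queue cannot be used without also choosing a rejection policy.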

I think the main requirements here are:
* Absorb bursty job completions - queueing with sufficient capacity or fast 
dispatching with a large thread pool.
* Avoid limiting job throughput - a sufficient number of worker threads
* Minimize consumption of extra resources - limit the number of worker threads
* Don't drop anything.

To satisfy the first and second requirements, one can think of the following 
two approaches.

* Have a bounded queue and a sufficiently large thread pool. Since we cannot 
drop any job completion, we want CallerRunsPolicy for rejected ones. 

* Alternatively, use an unbounded queue and a reasonable number of core 
threads. No work will be rejected in this case.

Between the two, the second has the advantage, given the third requirement and 
its simplicity. The question is, what is a reasonable number of core threads to 
avoid lagging behind forever? Based on our experience, 3 to 5 seems reasonable. 
The moveToDone() throughput varies a lot, but it topped out at around 
0.8/second in one of the busiest clusters I've seen. If the job completion rate 
stays above this for a long time, the queue will grow and history won't show up 
for most of the newer jobs.
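
As a rough back-of-the-envelope sketch of that queue growth: the backlog grows by (arrival rate - drain rate) for as long as arrivals outpace the drain. The class below is hypothetical, the 1.0/second arrival rate is an invented example value, and treating 0.8/second as the overall drain rate is an assumption.

```java
// Hypothetical helper for illustration only, not part of Hadoop.
public class BacklogEstimate {

    /** Estimated number of queued jobs after `seconds` of sustained load. */
    static long backlogAfter(double arrivalsPerSec, double drainPerSec, long seconds) {
        // The queue only grows while arrivals exceed the drain rate.
        double growthPerSec = Math.max(0.0, arrivalsPerSec - drainPerSec);
        return Math.round(growthPerSec * seconds);
    }

    public static void main(String[] args) {
        // Assumed: 1.0 completions/second arriving, 0.8/second drained, for one hour.
        System.out.println(backlogAfter(1.0, 0.8, 3600)); // 720
    }
}
```

With an unbounded queue this backlog is never dropped, but history for the newest jobs is delayed until the queue drains.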

Here are the two approaches in code:

* The queue is bounded but will absorb bursts of about 100. If the core thread 
cannot keep up, up to 10 more threads will be created to help the core thread 
drain the queue.  If the queue cannot be drained fast enough, the caller will 
directly execute the work. This will block the job tracker, since 
JobTracker#finalizeJob() is a synchronized method. So the thread pool size and 
the queue size must be sufficiently large.

{noformat}
 executor = new ThreadPoolExecutor(1, 10, 1, TimeUnit.HOURS,
     new LinkedBlockingQueue<Runnable>(100),
     new ThreadPoolExecutor.CallerRunsPolicy());
{noformat}


* The following will eventually start up 5 threads and keep them running. 
It is non-blocking and requires the fewest changes.

{noformat}
 executor = new ThreadPoolExecutor(5, 5, 1, TimeUnit.HOURS,
     new LinkedBlockingQueue<Runnable>());
{noformat}
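
Both configurations can be exercised outside the JobTracker with a small standalone sketch. The class and method names and the blocking dummy jobs are hypothetical; only the constructor arguments come from the snippets above and from the issue description. It checks where the overflow job runs under CallerRunsPolicy, and how many threads an unbounded queue ends up with for the 1/3 and 5/5 settings.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical demo class, not part of Hadoop.
public class ExecutorApproachesDemo {

    /** Bounded queue + CallerRunsPolicy: does the first overflow job run on the caller? */
    static boolean overflowRunsOnCaller() {
        CountDownLatch release = new CountDownLatch(1);
        ThreadPoolExecutor executor = new ThreadPoolExecutor(1, 10, 1, TimeUnit.HOURS,
                new LinkedBlockingQueue<Runnable>(100),
                new ThreadPoolExecutor.CallerRunsPolicy());
        // 10 running + 100 queued blocking jobs saturate the pool and the queue.
        for (int i = 0; i < 110; i++) {
            executor.execute(() -> awaitQuietly(release));
        }
        // Job 111 cannot be queued or given a new thread, so the policy runs it here.
        AtomicReference<Thread> ranOn = new AtomicReference<>();
        executor.execute(() -> ranOn.set(Thread.currentThread()));
        release.countDown();
        executor.shutdown();
        return ranOn.get() == Thread.currentThread();
    }

    /** Unbounded queue: number of pool threads after submitting `tasks` blocking jobs. */
    static int poolSizeAfterBurst(int core, int max, int tasks) {
        CountDownLatch release = new CountDownLatch(1);
        ThreadPoolExecutor executor = new ThreadPoolExecutor(core, max, 1, TimeUnit.HOURS,
                new LinkedBlockingQueue<Runnable>());
        for (int i = 0; i < tasks; i++) {
            executor.execute(() -> awaitQuietly(release));
        }
        int size = executor.getPoolSize();
        release.countDown();
        executor.shutdown();
        return size;
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("overflow ran on caller: " + overflowRunsOnCaller()); // true
        System.out.println("pool size (1, 3): " + poolSizeAfterBurst(1, 3, 20)); // 1
        System.out.println("pool size (5, 5): " + poolSizeAfterBurst(5, 5, 20)); // 5
    }
}
```

The first result is the blocking risk described for the bounded approach: in the JobTracker the "caller" would be the thread inside the synchronized finalizeJob(). The last two show why raising only the maximum pool size has no effect with an unbounded queue, while raising the core size does.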

What do you think is better? Or can you think of any better approaches?
                
> JobHistoryFilesManager thread pool never expands
> ------------------------------------------------
>
>                 Key: MAPREDUCE-4662
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4662
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobhistoryserver
>    Affects Versions: 1.0.2
>            Reporter: Thomas Graves
>
> The job history file manager creates a thread pool with a core size of 1 
> thread and a max pool size of 3. It never goes beyond 1 thread, though, 
> because it's using a LinkedBlockingQueue which doesn't have a max size. 
>     void start() {
>       executor = new ThreadPoolExecutor(1, 3, 1,
>           TimeUnit.HOURS, new LinkedBlockingQueue<Runnable>());
>     }
> According to the ThreadPoolExecutor Javadoc page, it only increases the 
> number of threads when the queue is full. Since the queue we are using has no 
> max size, it never fills up and we never get more than 1 thread. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
