On Nov 6, 2008, at 11:29 AM, Ricky Ho wrote:

Disk I/O overhead
==================
- The output of a map task is written to a local disk and later fetched by the reduce tasks. While this enables a simple recovery strategy when a map task fails, it incurs additional disk I/O overhead.

That is correct. However, Linux does very well at using spare RAM for the buffer cache, so as long as you enable write-behind it won't be a performance problem. You are right that the primary motivations are recoverability and not needing the reduces to run until after the maps finish.

So I am wondering if there is an option to bypass the step of writing the map result to the local disk.

Currently no.

- In the current setting, it sounds like no reduce task will start before all map tasks have completed. If there are a few slow-running map tasks, the whole job will be delayed.

The application's reduce function can't start until the last map finishes because the input to the reduce is sorted. Since the last map may generate the first keys that must be given to the reduce, the reduce must wait.
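A small sketch (plain Python with made-up data, not Hadoop code) of why the sort forces this wait: the globally smallest key may come from whichever map finishes last, so the merged, sorted stream handed to reduce cannot begin until every map's output is available.

```python
import heapq

# Sorted output of each map task as (key, value) pairs. The job's
# smallest key, "aardvark", happens to come from the slowest map.
map_outputs = [
    [("banana", 1), ("cherry", 1)],    # map 0, finishes early
    [("cherry", 1), ("date", 1)],      # map 1, finishes early
    [("aardvark", 1), ("banana", 1)],  # map 2, the straggler
]

# The reduce-side merge of all sorted map outputs. Its very first
# record comes from map 2, so the application's reduce function
# cannot be invoked before map 2 has finished.
merged = list(heapq.merge(*map_outputs))
print(merged[0])  # ('aardvark', 1) -- produced by the slowest map
```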

- The overall job execution could be shortened if the reduce tasks could start processing as soon as some map results are available, rather than waiting for all the map tasks to complete.

But that would violate the framework's specification that the input to reduce is completely sorted.

- Therefore it is impossible for the reduce phase of Job1 to stream its output to a file while the map phase of Job2 starts reading the same file. Job2 can only start after ALL REDUCE TASKS of Job1 are complete, which makes pipelining between jobs impossible.

It is currently not supported, but the framework could be extended to let the client add input splits after the job has started. That would remove the hard synchronization between jobs.

- This means the partitioning function has to be chosen carefully to ensure the workload of the reduce tasks is balanced. (Maybe not a big deal.)

Yes, the partitioner must balance the workload between the reduces.
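As an illustration (plain Python, not the Hadoop Partitioner API): hash-style partitioning in the spirit of Hadoop's default balances well when there are many distinct keys, but it cannot help when one key dominates the input, since every record for a given key must land on the same reduce.

```python
from collections import Counter

def partition(key, num_reduces):
    # Hash partitioning in the style of Hadoop's default HashPartitioner:
    # mask off the sign bit, then take the remainder.
    return (hash(key) & 0x7FFFFFFF) % num_reduces

num_reduces = 4

# Many distinct keys: each reduce gets roughly 1/4 of the records.
keys = [f"key-{i}" for i in range(10_000)]
load = Counter(partition(k, num_reduces) for k in keys)
print(load)

# One dominant key: all of its records go to a single reduce, no
# matter how the partitioner is written.
skewed = ["the"] * 10_000
skewed_load = Counter(partition(k, num_reduces) for k in skewed)
print(skewed_load)  # one partition holds everything
```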

- Are there any thoughts on running a pool of reduce tasks on the same key and having them combine their results later?

That is called the combiner. It is invoked multiple times as the data is merged together; see the word count example. If the reducer performs data reduction, using combiners is very important for performance.
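A toy word count in plain Python (not the Hadoop API) showing the effect: the combiner pre-sums each map task's output, so fewer (word, count) records need to be shuffled to the reduces, while the final totals are unchanged.

```python
from collections import Counter
from itertools import chain

def map_task(line):
    # Map: emit (word, 1) for every word in the line.
    return [(word, 1) for word in line.split()]

def combine(pairs):
    # Combiner: same shape as the reducer -- sum counts that share a
    # key, but only within a single map task's output.
    summed = Counter()
    for word, count in pairs:
        summed[word] += count
    return list(summed.items())

def reduce_all(pairs):
    # Reducer: sum counts across all map outputs.
    totals = Counter()
    for word, count in pairs:
        totals[word] += count
    return dict(totals)

lines = ["the cat and the hat", "the dog and the cat"]
raw = [map_task(line) for line in lines]

without_combiner = list(chain.from_iterable(raw))
with_combiner = list(chain.from_iterable(combine(m) for m in raw))

# The combiner shrinks what must be shuffled to the reduces...
print(len(without_combiner), len(with_combiner))  # 10 8
# ...without changing the final answer.
print(reduce_all(without_combiner) == reduce_all(with_combiner))  # True
```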

-- Owen
