Hi,

While exploring how Hadoop fits our usage scenarios, four recurring issues keep popping up. I don't know whether they are real issues or just our misunderstanding of Hadoop. Can any expert shed some light here?
Disk I/O overhead
==================
- The output of a map task is written to local disk and later copied up to the reduce task. While this enables a simple recovery strategy when a map task fails, it incurs additional disk I/O overhead.
- For example, in the popular Hadoop example that approximates "Pi", there is no real input data; the map tasks in this example could just feed their output directly to the reduce task. So I am wondering whether there is an option to bypass the step of writing the map result to local disk.

Pipelining between the Map & Reduce phases is not possible
===========================================================
- In the current setting, it sounds like no reduce task will be started before all map tasks have completed. If there are a few slow-running map tasks, the whole job is delayed.
- The overall job execution could be shortened if the reduce tasks could start processing as soon as some map results are available, rather than waiting for all the map tasks to complete.

Pipelining between jobs
========================
- In many cases, we've found that the parallel computation involves not just a single map/reduce job, but multiple inter-dependent map/reduce jobs working together in some coordinated fashion.
- Again, I haven't seen any mechanism for two MapReduce jobs to interact with each other directly. Job1 must write its output to HDFS for Job2 to pick up (the only chaining pattern we know of is sketched below). On the other hand, once the map phase of Job2 has started, all of its input HDFS files have to be frozen; in other words, Job1 cannot append more records to those files.
- Therefore it is impossible for the reduce phase of Job1 to stream its output to a file while the map phase of Job2 reads that same file. Job2 can only start after ALL reduce tasks of Job1 have completed, which makes pipelining between jobs impossible.

No parallelism within the reduce task for a single key
=======================================================
- Parallelism happens in the map phase and across the reduce phase (on different keys), but there is no parallelism within the reduce processing of one particular key.
- This means the partitioning function has to be chosen carefully to keep the workload of the reduce processes balanced (maybe not a big deal).
- Are there any thoughts on running a pool of reduce tasks on the same key and having them combine their results later? The second sketch below shows the kind of workaround we have in mind.
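To make the chaining point concrete, the only pattern we are aware of today is: run Job1 to completion, let its output land on HDFS, then submit Job2 over that output. Below is a minimal driver sketch using the org.apache.hadoop.mapred API with the stock identity map/reduce classes; the paths are made up for illustration.

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.IdentityMapper;
import org.apache.hadoop.mapred.lib.IdentityReducer;

public class ChainedJobs {
  public static void main(String[] args) throws Exception {
    // --- Job1: identity pass over some input, output to a temp dir ---
    JobConf job1 = new JobConf(ChainedJobs.class);
    job1.setJobName("job1");
    job1.setMapperClass(IdentityMapper.class);
    job1.setReducerClass(IdentityReducer.class);
    job1.setOutputKeyClass(LongWritable.class);
    job1.setOutputValueClass(Text.class);
    FileInputFormat.setInputPaths(job1, new Path("/user/ricky/input"));
    FileOutputFormat.setOutputPath(job1, new Path("/user/ricky/job1-out"));
    JobClient.runJob(job1);  // blocks until EVERY task of Job1 is done

    // --- Job2: can only be submitted once Job1's output is frozen ---
    JobConf job2 = new JobConf(ChainedJobs.class);
    job2.setJobName("job2");
    job2.setMapperClass(IdentityMapper.class);
    job2.setReducerClass(IdentityReducer.class);
    job2.setOutputKeyClass(LongWritable.class);
    job2.setOutputValueClass(Text.class);
    FileInputFormat.setInputPaths(job2, new Path("/user/ricky/job1-out"));
    FileOutputFormat.setOutputPath(job2, new Path("/user/ricky/output"));
    JobClient.runJob(job2);
  }
}

Since JobClient.runJob() blocks until the whole job (including every reduce task) has finished, there is no point at which Job2 could begin consuming partial output from Job1.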
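And to make the last question concrete, the workaround we have been toying with is to "salt" a hot key so that its values spread over a small pool of reduce tasks, with a second job merging the partial results afterwards. This is purely our own sketch, not an existing Hadoop facility; the FANOUT constant and the "key#salt" format are invented for illustration.

import java.io.IOException;
import java.util.Random;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// A hot key K is rewritten to K#0 .. K#(FANOUT-1), so the default hash
// partitioner spreads its values over up to FANOUT reduce tasks.
public class SaltingMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, LongWritable> {

  private static final int FANOUT = 4;   // size of the reduce "pool" per key
  private final Random random = new Random();

  public void map(LongWritable offset, Text line,
                  OutputCollector<Text, LongWritable> output,
                  Reporter reporter) throws IOException {
    String key = line.toString();        // pretend each input line is one key
    int salt = random.nextInt(FANOUT);
    output.collect(new Text(key + "#" + salt), new LongWritable(1));
  }
}

The first job would aggregate per salted key; a trivial second job then maps each "K#i" back to "K" and combines the FANOUT partial aggregates into the final result for K. The downside is exactly the extra job (and extra HDFS round trip) described above, which is why we are asking whether something built-in exists.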
Rgds,
Ricky