Hi Huy,

On Thu, Jun 25, 2009 at 6:02 PM, Huy Phan <dac...@gmail.com> wrote:
> I'm wondering if there's any performance killer in this approach. I posted
> the question to the IRC channel and someone told me that there may be a
> bottleneck.
There may be communication errors that block your MapReduce job while it posts its output data, so I think it's better to do this after the job is done.

> I wonder if there is any way to spawn a process directly from Hadoop after
> all the MapReduce tasks finish?

How do you submit your jobs? You can make job submission block by calling job.waitForCompletion(true) in your main driver class. Then the two steps run synchronously: the driver only continues (and can spawn your follow-up process) after the job has finished.

--
Zhong Wang
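P.S. A minimal sketch of the pattern above. The Hadoop-specific part is shown in comments; runJob() is a hypothetical stand-in for job.waitForCompletion(true), and the echo command stands in for whatever post-processing you want to spawn:

```java
import java.io.IOException;

public class PostJobLauncher {
    public static void main(String[] args)
            throws IOException, InterruptedException {
        // In a real driver you would build an
        // org.apache.hadoop.mapreduce.Job and call
        // boolean success = job.waitForCompletion(true);
        // which blocks until all map and reduce tasks finish.
        boolean success = runJob();

        if (success) {
            // Only reached after the job is done, so it is safe
            // to spawn the follow-up process here.
            Process p = new ProcessBuilder("echo", "post-processing output")
                    .inheritIO()
                    .start();
            int exitCode = p.waitFor();
            System.out.println("post-process exit code: " + exitCode);
        }
    }

    // Hypothetical placeholder for job.waitForCompletion(true).
    private static boolean runJob() {
        return true;
    }
}
```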