Aside from the job refusing to report completion after everything reaches
100%, I also noticed that the mapper seems too slow. It takes the same amount
of time for 4 machines to read and write through the 30 GB file as it does
with /bin/cat on one machine. Do you guys have any suggestions regarding
these two problems?

On Wed, Dec 10, 2008 at 4:37 PM, hc busy <[email protected]> wrote:

> Guys, I've just configured a hadoop cluster for the first time, and I'm
> running a null map-reduction over the streaming interface (/bin/cat for
> both mapper and reducer). I noticed that the mapper and reducer reach
> 100% in the web ui within a reasonable amount of time, but the job does not
> complete. On the command line it displays
>
> ...INFO streaming.StreamJob: map 100% reduce 100%
>
> In the web ui, it shows the map completion graph at 100%, but does not
> display a reduce completion graph. The four machines are well equipped to
> handle the size of the data (30 GB). Looking at the task tracker log on each
> of the machines, I noticed that the percentage is ticking up very, very slowly:
>
> 2008-12-10 16:18:55,265 INFO org.apache.hadoop.mapred.TaskTracker: task_200812101532_0001_r_000002_0 46.684883% Records R/W=149326846/149326834 > reduce
> 2008-12-10 16:18:57,055 INFO org.apache.hadoop.mapred.TaskTracker: task_200812101532_0001_r_000006_0 47.566963% Records R/W=151739348/151739342 > reduce
> 2008-12-10 16:18:58,268 INFO org.apache.hadoop.mapred.TaskTracker: task_200812101532_0001_r_000002_0 46.826576% Records R/W=149326846/149326834 > reduce
> 2008-12-10 16:19:00,058 INFO org.apache.hadoop.mapred.TaskTracker: task_200812101532_0001_r_000006_0 47.741756% Records R/W=153377016/153376990 > reduce
> 2008-12-10 16:19:01,271 INFO org.apache.hadoop.mapred.TaskTracker: task_200812101532_0001_r_000002_0 46.9636% Records R/W=149326846/149326834 > reduce
> 2008-12-10 16:19:03,061 INFO org.apache.hadoop.mapred.TaskTracker: task_200812101532_0001_r_000006_0 47.94259% Records R/W=153377016/153376990 > reduce
> 2008-12-10 16:19:04,274 INFO org.apache.hadoop.mapred.TaskTracker: task_200812101532_0001_r_000002_0 47.110992% Records R/W=150960648/150960644 > reduce
>
> It continues like this for hours and hours. What buffer am I setting too
> small, or what could possibly make it go so slow? I've worked on hadoop
> clusters before and they always performed great on similarly sized or
> larger data sets, so I suspect it's just a configuration setting somewhere
> that is making it do this.
>
> thanks in advance.
>
>
>
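
To put a number on "very slowly", the TaskTracker progress lines quoted above can be turned into a rough rate and time-to-finish estimate per reduce task. What follows is only a minimal sketch, not part of Hadoop: the function names parse_progress and eta_seconds are hypothetical, the regular expression assumes the exact line layout shown in the quote with each log entry on a single line, and the estimate is a plain linear extrapolation from the first and last sample, which can be quite noisy over a window of a few seconds.

```python
import re
from datetime import datetime

# Matches one single-line TaskTracker progress entry in the layout quoted above:
#   <date> <time>,<millis> INFO ...TaskTracker: <task id> <percent>% ...
LINE_RE = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),(\d{3}) INFO \S+TaskTracker: "
    r"(task_\S+) ([\d.]+)%"
)


def parse_progress(lines):
    """Return {task_id: [(timestamp, percent), ...]} in input order."""
    samples = {}
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue  # skip anything that is not a progress entry
        stamp = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
        stamp = stamp.replace(microsecond=int(m.group(2)) * 1000)
        samples.setdefault(m.group(3), []).append((stamp, float(m.group(4))))
    return samples


def eta_seconds(points):
    """Linearly extrapolate to 100% from the first and last sample.

    Returns None when no time has passed or no progress was made.
    """
    (t0, p0), (t1, p1) = points[0], points[-1]
    elapsed = (t1 - t0).total_seconds()
    if elapsed <= 0 or p1 <= p0:
        return None
    rate = (p1 - p0) / elapsed  # percent per second
    return (100.0 - p1) / rate


if __name__ == "__main__":
    # Two entries for the same task, copied from the quoted log.
    log = [
        "2008-12-10 16:18:55,265 INFO org.apache.hadoop.mapred.TaskTracker: "
        "task_200812101532_0001_r_000002_0 46.684883% "
        "Records R/W=149326846/149326834 > reduce",
        "2008-12-10 16:19:04,274 INFO org.apache.hadoop.mapred.TaskTracker: "
        "task_200812101532_0001_r_000002_0 47.110992% "
        "Records R/W=150960648/150960644 > reduce",
    ]
    for task, points in parse_progress(log).items():
        print(task, eta_seconds(points))
```

Feeding it a longer stretch of the log (several minutes rather than a few seconds) should give a steadier estimate, and comparing the per-task rates across the four machines may show whether one node is lagging.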
