Hadoop folks might be interested to know that we've used Hadoop to render some
maps of the Internet address space.

Aggregated maps are at <http://www.isi.edu/ant/address>;
we've rendered these both with and without Hadoop.

The more interesting map, the one that required Hadoop, is at
<http://www.isi.edu/ant/address/whole_internet>.
This map is to scale, so pixels and IP addresses are one-to-one,
and the printed result (at 600 dpi) is more than 9 feet tall.
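
For anyone who wants to check the scale arithmetic, here is the
back-of-the-envelope version in Python (it assumes a square layout; the
one-pixel-per-address and 600 dpi figures are the ones stated above):

    # Rough size check: 2^32 IPv4 addresses, one pixel each, square layout.
    import math

    side_pixels = math.isqrt(2 ** 32)   # 65536 pixels on a side
    dpi = 600
    side_inches = side_pixels / dpi     # ~109.2 inches
    side_feet = side_inches / 12        # ~9.1 feet, i.e. "more than 9 feet"
    print(side_pixels, round(side_inches, 1), round(side_feet, 1))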

Rendering this took 19 hours on our 52-core Hadoop cluster.
(Printing then took another 36 hours on our single printer :-( )
We were using Hadoop streaming.
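
For readers who haven't used streaming: the mapper and reducer are just
ordinary programs that read lines on stdin and write tab-separated
key/value lines on stdout, with Hadoop sorting by key in between.  Below
is a minimal Python sketch of that contract; the "one tile per /8" keying
and the per-tile count are only illustrations, not our actual rendering
code.

    # --- mapper (illustrative) ---
    #!/usr/bin/env python
    # One IPv4 address per input line; emit "key<TAB>value" so Hadoop
    # groups addresses by tile for the reducer.
    import sys

    for line in sys.stdin:
        addr = line.strip()
        if not addr:
            continue
        tile = addr.split(".")[0]              # crude tile id: the /8 prefix
        sys.stdout.write("%s\t%s\n" % (tile, addr))

    # --- reducer (illustrative) ---
    #!/usr/bin/env python
    # Input arrives sorted by key, so each tile can be handled (here,
    # just counted) as its key changes.
    import sys

    current, count = None, 0
    for line in sys.stdin:
        key, _, value = line.rstrip("\n").partition("\t")
        if key != current:
            if current is not None:
                sys.stdout.write("%s\t%d\n" % (current, count))
            current, count = key, 0
        count += 1
    if current is not None:
        sys.stdout.write("%s\t%d\n" % (current, count))

These get handed to the streaming jar as the -mapper and -reducer
programs in the usual way.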


Our use of Hadoop was not trouble-free: we had trouble with both reduce
tasks hanging and mappers running out of memory (details below).  If
others have seen these kinds of problems, please let us know.

The full job completed only 502 of the 503 reduces, so there are
a few holes in the picture that shouldn't be there.  When I checked on
the status, I saw two instances of the reducer running, both apparently
hung at ~87% completion.  But looking at the logs, I see things like
this:


    2007-09-29 16:59:34,628 INFO org.apache.hadoop.streaming.PipeMapRed: R/W/S=36181401/0/0 in:3173=36181401/11400 [rec/s] out:0=0/11400 [rec/s]
    2007-09-29 16:59:34,629 INFO org.apache.hadoop.streaming.PipeMapRed: R/W/S=36181501/0/0 in:3173=36181501/11400 [rec/s] out:0=0/11400 [rec/s]
    2007-09-29 16:59:34,629 INFO org.apache.hadoop.streaming.PipeMapRed: R/W/S=36181601/0/0 in:3173=36181601/11400 [rec/s] out:0=0/11400 [rec/s]
    2007-09-29 16:59:36,768 INFO org.apache.hadoop.streaming.PipeMapRed: mapRedFinished
    2007-09-29 16:59:36,784 INFO org.apache.hadoop.streaming.PipeMapRed: MROutputThread done
    2007-09-29 16:59:36,858 INFO org.apache.hadoop.streaming.PipeMapRed: mapRedFinished
    2007-09-29 16:59:37,193 WARN org.apache.hadoop.mapred.TaskTracker: Error running child
    java.io.IOException: subprocess exited successfully
    R/W/S=36181627/0/0 in:3172=36181627/11403 [rec/s] out:0=0/11403 [rec/s]
    minRecWrittenToEnableSkip_=9223372036854775807 LOGNAME=null
    HOST=null
    USER=hadoop
    HADOOP_USER=null
    last Hadoop input: |null|
    last tool output: |null|
    Date: Sat Sep 29 16:59:36 PDT 2007
    Broken pipe
            at org.apache.hadoop.streaming.PipeReducer.reduce(PipeReducer.java:105)
            at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:324)
            at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1800)

I'm confused by the conflicting messages: the in:/out: counts look like it
at least read lots of data.
The "streaming.PipeMapRed: mapRedFinished" should be a positive message,
right?  Then I get "Error running child" (bad), "subprocess exited
successfully" (good), and "Broken pipe" (bad).  Which is it?
And if a task gets stuck, why doesn't Hadoop just time it out and restart it?

Originally two reduces were hung, but manually killing one caused
Hadoop to restart it, and it then completed.

When I re-ran the whole job, I ended up with more stuck reduces (~10 of
the 503).

Our mapper problem is that our custom inputreader was causing the map
tasks to run out of memory.  We don't think we leak memory, but we're
trying to debug it.  We'll post details later.

   -John
