Howdy,
I'm new to Hadoop. I've got a network of 8 machines with ~1.8TB of
storage. My first Hadoop test run is to count the URLs in a set of
crawled pages (~1.6M pages taking up about 70GB of space). When I
run my app (or just the Grep example) on the data set, the map
task gets to 100%, then I get an IOException. When I review the
tasktracker logs, there's an OutOfMemoryError listed:
("INFO org.apache.hadoop.mapred.TaskRunner: task_0001_m_000258_0
java.lang.OutOfMemoryError: Java heap space")
I've tried upping mapred.child.java.opts, but that doesn't seem to
make a difference.
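For reference, this is roughly how I set it in my config file (the
property name is what the docs list; the -Xmx value is just one of
the heap sizes I tried):

```xml
<!-- hadoop-site.xml: heap for the child JVMs that run map/reduce tasks -->
<property>
  <name>mapred.child.java.opts</name>
  <!-- example value only; I also tried larger settings -->
  <value>-Xmx512m</value>
</property>
```

I restarted the tasktrackers after changing it, but the maps still die
with the same error.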
Any suggestions on what I can do?
Thanks,
David