I'm trying to write a daemon that periodically wakes up and runs map/reduce
jobs, but I've had little luck. I've tried different approaches (including
using Cascading) and I keep arriving at the exception below:
java.lang.OutOfMemoryError: Java heap space
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:359)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:185)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:157)
java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1062)
    at com.txing.mapred.watcher.DirWatcherImpl2.runMapReduce(DirWatcherImpl2.java:29)
    at com.txing.mapred.watcher.DirWatcher.run(DirWatcher.java:52)
    at java.lang.Thread.run(Thread.java:637)
I've tried setting the mapred.child.java.opts property to larger and larger
values, but it makes no difference.
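One thing I can check on my side: the trace goes through LocalJobRunner, which as far as I understand runs the map task inside the calling JVM rather than in a child process. If that is the case, mapred.child.java.opts would not affect it, and the heap that matters is the daemon's own (its -Xmx setting). Below is a minimal sketch for logging that heap, using only the JDK. The class name HeapCheck and the 100 MiB figure (my understanding of the default io.sort.mb sort buffer) are assumptions for illustration, not values taken from my job.

```java
// Hypothetical helper: reports the heap limit of the JVM it runs in.
final class HeapCheck {

    // Maximum heap the current JVM will try to use, in MiB.
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    // Rough check: is the heap limit larger than a buffer of the given size?
    // This ignores everything else living on the heap, so it is only a hint.
    static boolean canHold(long heapMiB, long bufferMiB) {
        return heapMiB > bufferMiB;
    }

    public static void main(String[] args) {
        long heap = maxHeapMiB();
        long assumedSortBufferMiB = 100; // assumption: default io.sort.mb
        System.out.println("max heap (MiB): " + heap);
        System.out.println("larger than assumed sort buffer: "
                + canHold(heap, assumedSortBufferMiB));
    }
}
```

Calling HeapCheck.maxHeapMiB() from the daemon thread just before submitting the job would show whether the limit is smaller than expected.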
Furthermore, I get warnings like this:
WARN | No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String). | JobClient.java:637 | org.apache.hadoop.mapred.JobClient | Thread-0 |
WARN | job_local_1 | LocalJobRunner.java:234 | org.apache.hadoop.mapred.LocalJobRunner | Thread-15 |
Do I have to submit jar files to Hadoop? Can't I daemonize this?
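To make "daemonize" concrete, here is a minimal sketch of the kind of loop I mean, using only java.util.concurrent. It is not my actual DirWatcher code: the class PeriodicJobDaemon and its methods are hypothetical names, and the Runnable stands in for whatever submits the map/reduce job.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: wakes up at a fixed delay and runs a job each time.
final class PeriodicJobDaemon {
    private final ScheduledExecutorService scheduler;
    private final AtomicInteger attempts = new AtomicInteger();

    PeriodicJobDaemon() {
        // Single background thread, marked as a daemon so it does not
        // keep the JVM alive on its own.
        scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "job-daemon");
            t.setDaemon(true);
            return t;
        });
    }

    // Runs the job immediately, then again 'delay' after each run finishes.
    void start(Runnable job, long delay, TimeUnit unit) {
        scheduler.scheduleWithFixedDelay(() -> {
            try {
                job.run();
            } catch (Throwable t) {
                // An uncaught exception would cancel all future runs,
                // so failures are reported and the loop keeps going.
                System.err.println("job failed: " + t);
            } finally {
                attempts.incrementAndGet();
            }
        }, 0, delay, unit);
    }

    // Number of job attempts made so far, successful or not.
    int attempts() {
        return attempts.get();
    }

    void stop() {
        scheduler.shutdownNow();
    }
}
```

In this sketch the job would be started with something like start(job, 5, TimeUnit.MINUTES), where job wraps the JobClient.runJob call. The catch block matters because scheduleWithFixedDelay suppresses later executions once a run throws.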
Thanks,
Shahab