Hi all,
I'm a bit confused by the way logging works on Hadoop.
In short, my question is: where does the log output from my Nutch plugins
end up when running on Hadoop?
I'm running Nutch 0.9 on Hadoop 0.12.2.
When I run my code on a single machine, I can see that the log output ends
up in ${hadoop.log.dir}/${hadoop.log.file}, as defined in the
log4j.properties file (Nutch and my plugins use commons-logging).
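For reference, the relevant appender definition in my log4j.properties is
roughly the stock one (quoting from memory, so the details may differ
slightly):

    log4j.rootLogger=INFO,DRFA
    # daily rolling file appender writing to ${hadoop.log.dir}/${hadoop.log.file}
    log4j.appender.DRFA=org.apache.log4j.DailyRollingFileAppender
    log4j.appender.DRFA.File=${hadoop.log.dir}/${hadoop.log.file}
    log4j.appender.DRFA.DatePattern=.yyyy-MM-dd
    log4j.appender.DRFA.layout=org.apache.log4j.PatternLayout
    log4j.appender.DRFA.layout.ConversionPattern=%d{ISO8601} %-5p %c{2} - %m%n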
But when I run on Hadoop in distributed mode, I can't find any log file
that contains entries generated by the (map|reduce) tasks. I do, however,
find log files with entries from the Hadoop classes themselves (the
tasktracker, the jobtracker, etc.).
I first thought it had something to do with HADOOP-406
(http://issues.apache.org/jira/browse/HADOOP-406), which is about
parameters passed to the parent JVM (like 'hadoop.log.file') not being
passed on to the child JVMs. But even when I specify my log file
explicitly in log4j.properties (without using variables like these), I
still see no log entries other than those from the Hadoop classes.
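Concretely, by 'explicitly' I mean replacing the variables with a
hard-coded path, along these lines (the path itself is just an example):

    # hard-coded file instead of ${hadoop.log.dir}/${hadoop.log.file},
    # to rule out any problem with variables not reaching the child JVM
    log4j.appender.DRFA.File=/tmp/nutch-task.log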
Any clues?
Mathijs