Hi Hadoop developers,
I'm confused about how logging works within map and reduce tasks.
Since tasks are launched in a new JVM, the Java system properties
"hadoop.log.dir" and "hadoop.log.file" are not passed to the new JVM.
This prevents the child process from logging properly; in fact you get:
java.io.FileNotFoundException: / (Is a directory)
        at java.io.FileOutputStream.openAppend(Native Method)
        at java.io.FileOutputStream.<init>(FileOutputStream.java:177)
        at java.io.FileOutputStream.<init>(FileOutputStream.java:102)
        at org.apache.log4j.FileAppender.setFile(FileAppender.java:289)
        at org.apache.log4j.RollingFileAppender.setFile(RollingFileAppender.java:165)
        at org.apache.log4j.FileAppender.activateOptions(FileAppender.java:163)
        at org.apache.log4j.config.PropertySetter.activate(PropertySetter.java:256)
        at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:132)
        at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:96)
        at org.apache.log4j.PropertyConfigurator.parseAppender(PropertyConfigurator.java:654)
        at org.apache.log4j.PropertyConfigurator.parseCategory(PropertyConfigurator.java:612)
        at org.apache.log4j.PropertyConfigurator.configureRootCategory(PropertyConfigurator.j
2006-07-25 15:59:07,553 INFO mapred.TaskTracker (TaskTracker.java:main(993)) - Child
        at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:415)
        at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:441)
        at org.apache.log4j.helpers.OptionConverter.selectAndConfigure(OptionConverter.java:4
        at org.apache.log4j.LogManager.<clinit>(LogManager.java:122)
        at org.apache.log4j.Logger.getLogger(Logger.java:104)
        at org.apache.commons.logging.impl.Log4JLogger.getLogger(Log4JLogger.java:229)
        at org.apache.commons.logging.impl.Log4JLogger.<init>(Log4JLogger.java:65)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImp
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAcc
        at java.lang.reflect.Constructor.newInstance(Constructor.java:494)
        at org.apache.commons.logging.impl.LogFactoryImpl.newInstance(LogFactoryImpl.java:529
        at org.apache.commons.logging.impl.LogFactoryImpl.getInstance(LogFactoryImpl.java:235
        at org.apache.commons.logging.LogFactory.getLog(LogFactory.java:370)
        at org.apache.hadoop.mapred.TaskTracker.<clinit>(TaskTracker.java:44)
        at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:993)
We see several ways to solve this problem. First, retrieve the
properties "hadoop.log.dir" and "hadoop.log.file" from the parent JVM
and pass them to the child JVM within the args parameter.
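A minimal sketch of the first option (the class and method names here are mine, not Hadoop's): copy the parent JVM's log properties into -D flags that get appended to the child's command line, with fallback defaults so the child never tries to append to "/".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for the task runner: turn the parent JVM's log
// properties into -D arguments for the child JVM.
public class ChildJvmOpts {
    public static List<String> logOpts() {
        List<String> opts = new ArrayList<String>();
        // Use a fallback when the parent did not set the property, so the
        // child's log4j FileAppender never receives an empty/bad path.
        String logDir = System.getProperty("hadoop.log.dir", "/tmp/hadoop/logs");
        String logFile = System.getProperty("hadoop.log.file", "task.log");
        opts.add("-Dhadoop.log.dir=" + logDir);
        opts.add("-Dhadoop.log.file=" + logFile);
        return opts;
    }
}
```

The returned list would be spliced into the argument vector the task runner already builds for the child process.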
Second, access the environment variables
"$HADOOP_LOG_DIR" and "$HADOOP_LOGFILE" using System.getenv() (Java 1.5).
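The second option could look roughly like this (again a sketch with names of my choosing; it assumes the shell scripts export these variables, and the defaults are placeholders):

```java
// Hypothetical child-side bootstrap: read the log locations from the
// environment and set the system properties log4j expects.
public class EnvLogConfig {
    // Read an environment variable, falling back to a default when unset.
    // System.getenv(String) is usable again as of Java 1.5.
    static String envOrDefault(String name, String dflt) {
        String v = System.getenv(name);
        return v != null ? v : dflt;
    }

    public static void configure() {
        System.setProperty("hadoop.log.dir",
            envOrDefault("HADOOP_LOG_DIR", "/tmp/hadoop/logs"));
        System.setProperty("hadoop.log.file",
            envOrDefault("HADOOPLOGFILE".equals("") ? "" : "HADOOP_LOGFILE", "task.log"));
    }
}
```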
Third, there is a more general solution: TaskRunner would
resolve any environment variable it finds in
"mapred.child.java.opts" by looking up its value with System.getenv().
E.g.:
unix:
export MAX_MEMORY=200m
hadoop-site.xml:
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx${MAX_MEMORY}</value>
</property>
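The substitution step for that third option could be sketched as follows (the class name is mine; TaskRunner would call something like this on the configured opts string, passing System.getenv()):

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical expander: replace ${VAR} tokens in a JVM-opts string with
// values taken from an environment map.
public class OptsExpander {
    private static final Pattern VAR = Pattern.compile("\\$\\{(\\w+)\\}");

    public static String expand(String opts, Map<String, String> env) {
        Matcher m = VAR.matcher(opts);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String val = env.get(m.group(1));
            // Leave the token untouched if the variable is not set,
            // so a typo is visible instead of silently becoming "".
            m.appendReplacement(out,
                Matcher.quoteReplacement(val != null ? val : m.group(0)));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

With MAX_MEMORY exported as in the example above, expanding "-Xmx${MAX_MEMORY}" would yield "-Xmx200m".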
Stefan