Calling hadoop cli inside mapreduce job leads to errors
-------------------------------------------------------

                 Key: MAPREDUCE-3112
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3112
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: contrib/streaming
    Affects Versions: 0.20.205.0
         Environment: Java, Linux
            Reporter: Eric Yang
            Assignee: Eric Yang
             Fix For: 0.20.205.0


When running a streaming job with a mapper that invokes the Hadoop CLI:

{noformat}
bin/hadoop --config /etc/hadoop/ jar \
  contrib/streaming/hadoop-streaming-0.20.205.0.jar \
  -mapper "hadoop --config /etc/hadoop/ dfs -help" \
  -reducer NONE -input "/tmp/input.txt" -output NONE
{noformat}

Task log shows:

{noformat}
Exception in thread "main" java.lang.ExceptionInInitializerError
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:57)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at org.apache.hadoop.fs.FsShell.main(FsShell.java:1895)
Caused by: org.apache.commons.logging.LogConfigurationException: User-specified log class 'org.apache.commons.logging.impl.Log4JLogger' cannot be found or is not useable.
        at org.apache.commons.logging.impl.LogFactoryImpl.discoverLogImplementation(LogFactoryImpl.java:874)
        at org.apache.commons.logging.impl.LogFactoryImpl.newInstance(LogFactoryImpl.java:604)
        at org.apache.commons.logging.impl.LogFactoryImpl.getInstance(LogFactoryImpl.java:336)
        at org.apache.commons.logging.impl.LogFactoryImpl.getInstance(LogFactoryImpl.java:310)
        at org.apache.commons.logging.LogFactory.getLog(LogFactory.java:685)
        at org.apache.hadoop.conf.Configuration.<clinit>(Configuration.java:142)
        ... 3 more
java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
        at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:311)
        at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:545)
        at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:132)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
        at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:261)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
        at org.apache.hadoop.mapred.Child.main(Child.java:255)
{noformat}

Upon inspection, there are two problems in the inherited environment which prevent the logger initialization from working properly.  In hadoop-env.sh, HADOOP_OPTS is inherited from the parent process.  This configuration was requested by users as a way to override the Hadoop environment in the configuration template:

{noformat}
export HADOOP_OPTS="-Djava.net.preferIPv4Stack=true $HADOOP_OPTS"
{noformat}
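
Because this line appends rather than replaces the inherited value, any Java system properties that the parent process (the TaskTracker) placed in HADOOP_OPTS are carried into every bin/hadoop invocation made from within a task.  A minimal sketch of the propagation, using an illustrative parent property rather than the exact strings involved:

{noformat}
# Parent (TaskTracker) environment -- illustrative value only:
#   HADOOP_OPTS="-Dsome.parent.property=value"
#
# The child task sources hadoop-env.sh, which appends the inherited value:
export HADOOP_OPTS="-Djava.net.preferIPv4Stack=true $HADOOP_OPTS"
# Resulting HADOOP_OPTS seen by bin/hadoop inside the task:
#   "-Djava.net.preferIPv4Stack=true -Dsome.parent.property=value"
{noformat}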

First, -Dhadoop.log.dir=$HADOOP_LOG_DIR/task_tracker_user is injected into HADOOP_OPTS in the TaskTracker environment.  Hence, the running task inherits the wrong logging directory, to which the end user might not have sufficient access to write.  Second, $HADOOP_ROOT_LOGGER is overridden to -Dhadoop.root.logger=INFO,TLA by the task controller; therefore the bin/hadoop script invoked by the task attempts to use hadoop.root.logger=INFO,TLA but fails to initialize the logger.
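
For illustration only (this is not the fix committed for this issue), a streaming mapper could sidestep the inherited settings by resetting them before invoking the CLI, since the bin/hadoop script picks up HADOOP_ROOT_LOGGER and HADOOP_OPTS from its environment; the values below are assumptions:

{noformat}
# Hypothetical workaround sketch: clear the inherited options and point the
# root logger back at the console appender for the nested CLI call.
-mapper "env HADOOP_OPTS= HADOOP_ROOT_LOGGER=INFO,console hadoop --config /etc/hadoop/ dfs -help"
{noformat}

The underlying fix, however, would be to keep the TaskTracker-specific settings out of the environment inherited by child tasks, so that a nested bin/hadoop call starts from a clean logging configuration.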
