[
https://issues.apache.org/jira/browse/MAPREDUCE-3112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13116551#comment-13116551
]
Eric Yang commented on MAPREDUCE-3112:
--------------------------------------
In previous releases of Hadoop we did not have this problem, because HADOOP_OPTS
was always reconstructed from scratch in the invoking process, and
hadoop.log.dir was set up by the parent process to ensure output was redirected
to the desired location. The current behavior was introduced at HCatalog's
request, to provide a way to override HADOOP_OPTS. HCatalog's request could
instead be satisfied by moving the override to HADOOP_USER_OPTS and making
HADOOP_USER_OPTS a prefix of HADOOP_OPTS.
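As a sketch of the idea only (not the actual patch; the exact composition in
hadoop-env.sh would still need to be worked out):
{noformat}
# Sketch: user overrides come from HADOOP_USER_OPTS and are prepended, while
# HADOOP_OPTS itself is no longer inherited from the calling process, so values
# injected by a parent (e.g. -Dhadoop.log.dir) do not leak into child hadoop
# commands.
export HADOOP_OPTS="$HADOOP_USER_OPTS -Djava.net.preferIPv4Stack=true"
{noformat}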
In streaming jobs, we should unset the HADOOP_ROOT_LOGGER environment variable
so that any hadoop command invoked inside the job logs to the console, where the
task attempt redirects its output to TaskLogAppender.
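For illustration only (the real fix would clear the variable in the streaming
framework rather than in user scripts), a mapper that shells out to the hadoop
CLI behaves as intended once the override is gone:
{noformat}
# Sketch: drop the task-attempt logger override so bin/hadoop falls back to
# console logging; the task attempt captures stdout/stderr and routes it to
# TaskLogAppender.
unset HADOOP_ROOT_LOGGER
hadoop --config /etc/hadoop/ dfs -help
{noformat}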
> Calling hadoop cli inside mapreduce job leads to errors
> -------------------------------------------------------
>
> Key: MAPREDUCE-3112
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3112
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: contrib/streaming
> Affects Versions: 0.20.205.0
> Environment: Java, Linux
> Reporter: Eric Yang
> Assignee: Eric Yang
> Fix For: 0.20.205.0
>
>
> When running a streaming job whose mapper invokes the hadoop CLI:
> bin/hadoop --config /etc/hadoop/ jar contrib/streaming/hadoop-streaming-0.20.205.0.jar \
>   -mapper "hadoop --config /etc/hadoop/ dfs -help" -reducer NONE \
>   -input "/tmp/input.txt" -output NONE
> Task log shows:
> {noformat}
> Exception in thread "main" java.lang.ExceptionInInitializerError
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:57)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
> at org.apache.hadoop.fs.FsShell.main(FsShell.java:1895)
> Caused by: org.apache.commons.logging.LogConfigurationException: User-specified log class 'org.apache.commons.logging.impl.Log4JLogger' cannot be found or is not useable.
> at org.apache.commons.logging.impl.LogFactoryImpl.discoverLogImplementation(LogFactoryImpl.java:874)
> at org.apache.commons.logging.impl.LogFactoryImpl.newInstance(LogFactoryImpl.java:604)
> at org.apache.commons.logging.impl.LogFactoryImpl.getInstance(LogFactoryImpl.java:336)
> at org.apache.commons.logging.impl.LogFactoryImpl.getInstance(LogFactoryImpl.java:310)
> at org.apache.commons.logging.LogFactory.getLog(LogFactory.java:685)
> at org.apache.hadoop.conf.Configuration.<clinit>(Configuration.java:142)
> ... 3 more
> java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
> at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:311)
> at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:545)
> at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:132)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
> at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
> at org.apache.hadoop.mapred.Child$4.run(Child.java:261)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
> at org.apache.hadoop.mapred.Child.main(Child.java:255)
> {noformat}
> Upon inspection, there are two problems in the inherited environment that
> prevent logger initialization from working properly. First, in hadoop-env.sh,
> HADOOP_OPTS is inherited from the parent process. This behavior was requested
> by users as a way to override the Hadoop environment in the configuration
> template:
> {noformat}
> export HADOOP_OPTS="-Djava.net.preferIPv4Stack=true $HADOOP_OPTS"
> {noformat}
> -Dhadoop.log.dir=$HADOOP_LOG_DIR/task_tracker_user is injected into
> HADOOP_OPTS in the tasktracker environment, so the running task inherits the
> wrong logging directory, which the end user may not have sufficient access to
> write to. Second, $HADOOP_ROOT_LOGGER is overridden to
> -Dhadoop.root.logger=INFO,TLA by the task controller; therefore the bin/hadoop
> script attempts to use hadoop.root.logger=INFO,TLA but fails to initialize.
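> To make the failure mode concrete, the environment a child hadoop process
> inherits looks roughly like the following (values are illustrative, not copied
> from an actual task environment):
> {noformat}
> # Injected by the tasktracker / task controller into the task environment:
> HADOOP_OPTS="-Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=$HADOOP_LOG_DIR/task_tracker_user"
> HADOOP_ROOT_LOGGER="INFO,TLA"
> # bin/hadoop inherits both, points logging at a directory the task user cannot
> # write to, and tries to initialize the INFO,TLA root logger, which ends in
> # the Log4JLogger error shown above.
> {noformat}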
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira