[ https://issues.apache.org/jira/browse/MAPREDUCE-5850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chris Nauroth updated MAPREDUCE-5850:
-------------------------------------
Attachment: MAPREDUCE-5850.1.patch
This problem occurs only on Windows, not on Linux. Here is a description of
how it happens.
In {{MRJobConfig}}, the default value of {{mapreduce.admin.user.env}} is
defined to set the PATH environment variable on Windows so that tasks will be
able to find and load hadoop.dll.
{code}
public final String DEFAULT_MAPRED_ADMIN_USER_ENV =
    Shell.WINDOWS ?
        "PATH=%PATH%;%HADOOP_COMMON_HOME%\\bin" :
        "LD_LIBRARY_PATH=$HADOOP_COMMON_HOME/lib/native";
{code}
{{TaskAttemptImpl#createCommonContainerLaunchContext}} sets up the base
environment. As part of that, it applies the value of
{{mapreduce.admin.user.env}}. This is the point where the behavior diverges
from Linux. On Linux, the common context won't have a PATH, because the
default only sets LD_LIBRARY_PATH. On Windows, the common context will have a
PATH.
{code}
// Add the env variables passed by the admin
MRApps.setEnvFromInputString(
    environment,
    conf.get(
        MRJobConfig.MAPRED_ADMIN_USER_ENV,
        MRJobConfig.DEFAULT_MAPRED_ADMIN_USER_ENV), conf
);
{code}
Then, at task launch time, we end up setting PATH again via a call to
{{TaskAttemptImpl#createContainerLaunchContext}} ->
{{MapReduceChildJVM#setVMEnv}} -> {{MRApps#setEnvFromInputString}} ->
{{Apps#setEnvFromInputString}}. This uses {{Apps#addToEnvironment}} to set the
new value in the environment, and the logic of this method appends to existing
values:
{code}
@Public
@Unstable
public static void addToEnvironment(
    Map<String, String> environment,
    String variable, String value, String classPathSeparator) {
  String val = environment.get(variable);
  if (val == null) {
    val = value;
  } else {
    val = val + classPathSeparator + value;
  }
  environment.put(StringInterner.weakIntern(variable),
      StringInterner.weakIntern(val));
}
{code}
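To make the duplication concrete, here is a minimal standalone sketch of the append behavior. It is an illustration only, not Hadoop code: {{PathDuplicationSketch}} and {{simulate}} are hypothetical names, the local {{addToEnvironment}} is a simplified stand-in that drops {{StringInterner}}, and the %VAR% expansion that {{Apps#setEnvFromInputString}} performs is deliberately left out, so the raw string is appended as-is.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PathDuplicationSketch {

  // Simplified stand-in for Apps#addToEnvironment (assumption: no string
  // interning, no variable expansion). Appends to any existing value.
  static void addToEnvironment(Map<String, String> environment,
      String variable, String value, String separator) {
    String val = environment.get(variable);
    if (val == null) {
      val = value;
    } else {
      val = val + separator + value;
    }
    environment.put(variable, val);
  }

  // Applies the same PATH setting twice, mimicking the common-context setup
  // followed by the task-launch setup described above.
  static String simulate() {
    Map<String, String> environment = new LinkedHashMap<>();
    String adminPath = "%PATH%;%HADOOP_COMMON_HOME%\\bin";
    addToEnvironment(environment, "PATH", adminPath, ";"); // common context
    addToEnvironment(environment, "PATH", adminPath, ";"); // task launch
    return environment.get("PATH");
  }

  public static void main(String[] args) {
    // The same entries appear twice in the resulting value.
    System.out.println(simulate());
  }
}
```

Because the second call finds an existing PATH, it appends rather than replaces, which is why the entries show up twice on Windows and not on Linux, where the first call never sets PATH.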
I haven't been able to come up with a clean fix for this. We can't change the
default value of {{mapreduce.admin.user.env}}, because tasks are dependent on
it to find the native code (an absolute must on Windows). We can't drop the
appending behavior, because there are valid use cases dependent on it. Adding
a special case for Windows + PATH seems hacky. Does anyone else have ideas?
Since the duplication is ultimately harmless, we might consider simply
relaxing the assertion in {{TestMiniMRChildTask}}. I'm attaching a patch that
does that. With the patch, the test passes on Mac and Windows.
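For illustration, one way a relaxed check could look is sketched below. This is not the content of the attached patch, and it is not taken from {{TestMiniMRChildTask}}: {{RelaxedPathCheckSketch}} and {{containsAllEntries}} are hypothetical names, and the idea of comparing PATH entries as a set is an assumption about what "relaxing" might mean here.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Pattern;

public class RelaxedPathCheckSketch {

  // Hypothetical helper: returns true if every entry of the expected PATH
  // appears somewhere in the actual PATH. Duplicates and ordering in the
  // actual value are ignored.
  static boolean containsAllEntries(String actual, String expected,
      String separator) {
    String regex = Pattern.quote(separator);
    Set<String> actualEntries =
        new HashSet<>(Arrays.asList(actual.split(regex)));
    return actualEntries.containsAll(Arrays.asList(expected.split(regex)));
  }

  public static void main(String[] args) {
    String expected = "C:\\tools;C:\\hadoop\\bin";
    String duplicated = expected + ";" + expected;
    // A strict equality check would reject the duplicated value; this one
    // accepts it.
    System.out.println(containsAllEntries(duplicated, expected, ";"));
  }
}
```

A check along these lines tolerates the harmless duplication while still failing if an expected entry is missing from PATH.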
> PATH environment variable contains duplicate values in map and reduce tasks
> on Windows.
> ---------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-5850
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5850
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: client
> Affects Versions: 3.0.0, 2.4.0
> Reporter: Chris Nauroth
> Assignee: Chris Nauroth
> Priority: Minor
> Attachments: MAPREDUCE-5850.1.patch
>
>
> The value of the PATH environment variable gets appended twice before
> execution of a container for a map or reduce task. This is ultimately
> harmless at runtime, but it does cause a failure in {{TestMiniMRChildTask}}
> when running on Windows.