[
https://issues.apache.org/jira/browse/MAPREDUCE-5965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Wilfred Spiegelenburg updated MAPREDUCE-5965:
---------------------------------------------
Attachment: MAPREDUCE-5965.3.patch
Updated the patch using the new name and made it an integer as [~djp] proposed.
The documentation and the usage message printed by StreamJob have been updated
to show the new option and its values.
To answer the question about 20000: it is long enough to leave all values
except the problematic one untouched.
Three values are documented (a short sketch of the behaviour follows the list):
-1: do not truncate (default)
0: copy only the key and not the value (a side effect of using substring)
20000: a safe value which should prevent the "error=7" issue
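A minimal sketch of the intended truncation behaviour, assuming a substring-based
cut-off as described above (the helper name is illustrative, not necessarily what
the patch uses):
{code}
// Illustrative only: truncate a job conf value before it is exported as an
// environment variable for the streaming child process.
// limit follows the documented settings: -1 = no truncation (default),
// 0 = keep only the key (empty value), 20000 = safe upper bound.
static String truncateValue(String value, int limit) {
  if (limit < 0 || value == null || value.length() <= limit) {
    return value;                    // -1 or short enough: leave untouched
  }
  return value.substring(0, limit);  // 0 yields an empty value (key only)
}
{code}
With the default of -1 nothing changes; with 20000 only oversized values such as
mapreduce_input_fileinputformat_inputdir are cut down, which is why it leaves all
but the problem value alone.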
> Hadoop streaming throws error if list of input files is high. Error is:
> "error=7, Argument list too long at if number of input file is high"
> --------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-5965
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5965
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Reporter: Arup Malakar
> Assignee: Wilfred Spiegelenburg
> Attachments: MAPREDUCE-5965.1.patch, MAPREDUCE-5965.2.patch,
> MAPREDUCE-5965.3.patch, MAPREDUCE-5965.patch
>
>
> Hadoop streaming exposes all the key/value pairs in the job conf as environment
> variables when it forks a process for the streaming code to run. Unfortunately
> the variable mapreduce_input_fileinputformat_inputdir contains the list of
> input files, and Linux has a limit on the combined size of environment
> variables and arguments.
> Depending on how long the list of files and their full paths is, this value can
> be huge. And given that these variables are not even used, it stops the user
> from running a hadoop job with a large number of input files, even though the
> job could otherwise run.
> Linux returns E2BIG (error code 7) when the size exceeds this limit, and Java
> translates that into "error=7, Argument list too long". More:
> http://man7.org/linux/man-pages/man2/execve.2.html
> I suggest skipping a variable if it is longer than a certain length. The
> downside is that user code which requires that environment variable would then
> fail, so a config variable to skip long variables should also be introduced,
> set to false by default. That way the user has to specifically set it to true
> to enable this feature.
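> A rough sketch of what that could look like (the method and parameter names
> below are made up for illustration; they are not an existing Hadoop API):
> {code}
> // Illustrative only: when building the environment for the forked streaming
> // process, optionally skip any conf value longer than a chosen threshold.
> // Assumes org.apache.hadoop.conf.Configuration and java.util.Map are imported.
> static void addJobConfToEnvironment(Configuration conf, Map<String, String> env,
>                                     boolean skipLongValues, int maxLength) {
>   for (Map.Entry<String, String> entry : conf) {
>     String value = entry.getValue();
>     if (skipLongValues && value != null && value.length() > maxLength) {
>       continue; // e.g. mapreduce.input.fileinputformat.inputdir with many files
>     }
>     // conf keys become env vars with '.' replaced by '_'
>     env.put(entry.getKey().replace('.', '_'), value);
>   }
> }
> {code}
> With skipLongValues left at false nothing changes; only when the user opts in
> are oversized values dropped before the child process is started.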
> Here is the exception:
> {code}
> Error: java.lang.RuntimeException: Error in configuring object
>     at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
>     at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
>     at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:426)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
>     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
> Caused by: java.lang.reflect.InvocationTargetException
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:606)
>     at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
>     ... 9 more
> Caused by: java.lang.RuntimeException: Error in configuring object
>     at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
>     at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
>     at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
>     at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38)
>     ... 14 more
> Caused by: java.lang.reflect.InvocationTargetException
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:606)
>     at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
>     ... 17 more
> Caused by: java.lang.RuntimeException: configuration exception
>     at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:222)
>     at org.apache.hadoop.streaming.PipeMapper.configure(PipeMapper.java:66)
>     ... 22 more
> Caused by: java.io.IOException: Cannot run program "/data/hadoop/hadoop-yarn/cache/yarn/nm-local-dir/usercache/oo-analytics/appcache/application_1403599726264_13177/container_1403599726264_13177_01_000006/./rbenv_runner.sh": error=7, Argument list too long
>     at java.lang.ProcessBuilder.start(ProcessBuilder.java:1041)
>     at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:209)
>     ... 23 more
> Caused by: java.io.IOException: error=7, Argument list too long
>     at java.lang.UNIXProcess.forkAndExec(Native Method)
>     at java.lang.UNIXProcess.<init>(UNIXProcess.java:135)
>     at java.lang.ProcessImpl.start(ProcessImpl.java:130)
>     at java.lang.ProcessBuilder.start(ProcessBuilder.java:1022)
>     ... 24 more
> Container killed by the ApplicationMaster. Container killed on request. Exit code is 143
> Container exited with a non-zero exit code 143
> {code}
> Hive does a similar trick: HIVE-2372. I have a patch for this and will submit
> it shortly.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)