[
https://issues.apache.org/jira/browse/MAPREDUCE-5965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572891#comment-14572891
]
Hudson commented on MAPREDUCE-5965:
-----------------------------------
SUCCESS: Integrated in Hadoop-Hdfs-trunk-Java8 #207 (See
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/207/])
MAPREDUCE-5965. Hadoop streaming throws "error=7, Argument list too long" when
the list of input files is long. (wilfreds via rkanter) (rkanter: rev cc70df98e74142331043a611a3bd8a53ff6a2242)
* hadoop-tools/hadoop-streaming/src/main/java/org/apache/hadoop/streaming/StreamJob.java
* hadoop-tools/hadoop-streaming/src/site/markdown/HadoopStreaming.md.vm
* hadoop-mapreduce-project/CHANGES.txt
* hadoop-tools/hadoop-streaming/src/main/java/org/apache/hadoop/streaming/PipeMapRed.java
> Hadoop streaming throws "error=7, Argument list too long" when the list of
> input files is long
> ------------------------------------------------------------------------------
>
> Key: MAPREDUCE-5965
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5965
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Reporter: Arup Malakar
> Assignee: Wilfred Spiegelenburg
> Fix For: 2.8.0
>
> Attachments: MAPREDUCE-5965.1.patch, MAPREDUCE-5965.2.patch,
> MAPREDUCE-5965.3.patch, MAPREDUCE-5965.patch
>
>
> Hadoop streaming exposes every key/value pair in the job conf as an environment
> variable when it forks the process that runs the streaming code. Unfortunately,
> the variable mapreduce_input_fileinputformat_inputdir contains the full list of
> input paths, and Linux limits the combined size of the environment variables
> and arguments passed to a new process.
> Depending on how many input files there are and how long their full paths are,
> this variable can become very large. Since most of these variables are never
> used by the streaming program, the limit stops users from running jobs with a
> large number of input files even though the jobs could otherwise run fine.
> When the combined size exceeds the limit, Linux fails the exec with E2BIG
> (error code 7), which Java reports as "error=7, Argument list too long". More:
> http://man7.org/linux/man-pages/man2/execve.2.html
> I suggest skipping any variable whose value is longer than a certain length;
> if user code actually needs a skipped variable, the job would then fail. A new
> config option should guard this behaviour and default to false, so a user has
> to explicitly set it to true to enable the feature.
> Here is the exception:
> {code}
> Error: java.lang.RuntimeException: Error in configuring object
>   at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
>   at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
>   at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
>   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:426)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
>   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
> Caused by: java.lang.reflect.InvocationTargetException
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
>   ... 9 more
> Caused by: java.lang.RuntimeException: Error in configuring object
>   at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
>   at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
>   at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
>   at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38)
>   ... 14 more
> Caused by: java.lang.reflect.InvocationTargetException
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
>   ... 17 more
> Caused by: java.lang.RuntimeException: configuration exception
>   at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:222)
>   at org.apache.hadoop.streaming.PipeMapper.configure(PipeMapper.java:66)
>   ... 22 more
> Caused by: java.io.IOException: Cannot run program "/data/hadoop/hadoop-yarn/cache/yarn/nm-local-dir/usercache/oo-analytics/appcache/application_1403599726264_13177/container_1403599726264_13177_01_000006/./rbenv_runner.sh": error=7, Argument list too long
>   at java.lang.ProcessBuilder.start(ProcessBuilder.java:1041)
>   at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:209)
>   ... 23 more
> Caused by: java.io.IOException: error=7, Argument list too long
>   at java.lang.UNIXProcess.forkAndExec(Native Method)
>   at java.lang.UNIXProcess.<init>(UNIXProcess.java:135)
>   at java.lang.ProcessImpl.start(ProcessImpl.java:130)
>   at java.lang.ProcessBuilder.start(ProcessBuilder.java:1022)
>   ... 24 more
> Container killed by the ApplicationMaster. Container killed on request. Exit code is 143
> Container exited with a non-zero exit code 143
> {code}
> Hive does a similar trick: see HIVE-2372. I have a patch for this and will
> submit it soon.
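As an illustration of the approach described in the report above (skipping oversized job conf values instead of exporting them into the child environment), here is a minimal sketch. It is not the committed patch: the class name, the hard-coded size limit, and the helper method are hypothetical, and a real implementation would make the limit configurable and leave the behaviour disabled by default.

{code}
import java.util.HashMap;
import java.util.Map;

// Sketch only: skip over-long job conf values before exporting them as
// environment variables, so the exec of the streaming command stays under
// the kernel's argument + environment size limit (E2BIG, "error=7").
public class EnvSizeGuard {

    // Hypothetical cut-off; a real implementation would read it from the
    // job conf and keep the feature off unless explicitly enabled.
    private static final int MAX_ENV_VALUE_LENGTH = 20 * 1024;

    // Copies job conf entries into the child environment, skipping values
    // that exceed the cut-off. Keys are exported with '.' replaced by '_',
    // matching the mapreduce_input_fileinputformat_inputdir form above.
    public static void exportSafely(Map<String, String> jobConf,
                                    Map<String, String> childEnv) {
        for (Map.Entry<String, String> e : jobConf.entrySet()) {
            String value = e.getValue();
            if (value != null && value.length() > MAX_ENV_VALUE_LENGTH) {
                // Values such as mapreduce.input.fileinputformat.inputdir can
                // grow with the number of input files; drop them rather than
                // letting the fork/exec fail with E2BIG.
                continue;
            }
            childEnv.put(e.getKey().replace('.', '_'), value);
        }
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> conf = new HashMap<>();
        conf.put("mapreduce.job.name", "streaming-demo");

        // Run a child process with the filtered environment.
        ProcessBuilder pb = new ProcessBuilder("printenv");
        exportSafely(conf, pb.environment());
        pb.inheritIO().start().waitFor();
    }
}
{code}

The committed change touches StreamJob.java and PipeMapRed.java and updates HadoopStreaming.md.vm (all listed in the commit above); the sketch is only meant to show why limiting what gets exported avoids the E2BIG failure.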
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)