[ https://issues.apache.org/jira/browse/MAPREDUCE-5912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14029597#comment-14029597 ]
Chris Nauroth commented on MAPREDUCE-5912:
------------------------------------------
+1 for this patch.
[~rusanu], [~curino] and [~chris.douglas], my understanding is that
MAPREDUCE-5196 accidentally introduced this bug, but this part of the change is
not strictly necessary for the goals of MAPREDUCE-5196. Based on that, I'm in
favor of committing this patch to revert just the part of MAPREDUCE-5196 that
caused the bug. The alternative patch on the {{Path}} class posted in
HADOOP-10663 has some other potential side effects, so I prefer doing a
localized fix here in MR. (I'll enter more details on HADOOP-10663.)
If we later want to revisit the idea of writing map outputs somewhere other
than the local file system, then I think we'd need a different patch.
We'd want to make sure that the map output's {{Path}} instance
contains an explicit scheme, so that the code here doesn't need to assume local
vs. default vs. something else.
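To illustrate the point about explicit schemes, here is a minimal sketch in plain Java. It is not Hadoop code: it uses only {{java.net.URI}} to model the general rule that a path with no scheme falls back to the default file system, while a path with an explicit scheme selects its own. The class name {{SchemeResolutionSketch}}, the helper {{effectiveScheme}}, the default file system URI {{hdfs://namenode:8020}} and the example paths are all hypothetical and chosen only for illustration.

```java
import java.net.URI;

public class SchemeResolutionSketch {

    /**
     * Hypothetical helper (not a Hadoop API): returns the scheme that would
     * select the file system for a path. A path without a scheme falls back
     * to the scheme of the configured default file system.
     */
    public static String effectiveScheme(URI path, URI defaultFs) {
        return path.getScheme() != null ? path.getScheme() : defaultFs.getScheme();
    }

    public static void main(String[] args) {
        // Assumed default file system, standing in for a cluster configured with HDFS.
        URI defaultFs = URI.create("hdfs://namenode:8020");

        // No scheme: the default file system is chosen, even for a local file.
        URI unqualified = URI.create("/tmp/output/file.out");
        System.out.println(effectiveScheme(unqualified, defaultFs)); // prints "hdfs"

        // Explicit scheme: the caller no longer has to guess local vs. default.
        URI qualified = URI.create("file:///tmp/output/file.out");
        System.out.println(effectiveScheme(qualified, defaultFs)); // prints "file"
    }
}
```

With an explicit scheme on the map output path, the lookup would not depend on which file system happens to be configured as the default.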
Can you let me know if you agree with committing this and not committing
HADOOP-10663? I'll hold off on committing until I hear from one of you.
> Task.calculateOutputSize does not handle Windows files after MAPREDUCE-5196
> ---------------------------------------------------------------------------
>
> Key: MAPREDUCE-5912
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5912
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Affects Versions: 3.0.0
> Reporter: Remus Rusanu
> Assignee: Remus Rusanu
> Fix For: 3.0.0
>
> Attachments: MAPREDUCE-5912.1.patch
>
>
> {code}
> @@ -1098,8 +1120,8 @@ private long calculateOutputSize() throws IOException {
>      if (isMapTask() && conf.getNumReduceTasks() > 0) {
>        try {
>          Path mapOutput = mapOutputFile.getOutputFile();
> -        FileSystem localFS = FileSystem.getLocal(conf);
> -        return localFS.getFileStatus(mapOutput).getLen();
> +        FileSystem fs = mapOutput.getFileSystem(conf);
> +        return fs.getFileStatus(mapOutput).getLen();
>        } catch (IOException e) {
>          LOG.warn ("Could not find output size " , e);
>        }
> {code}
> The change above, introduced by MAPREDUCE-5196, causes Windows local output files to be routed through HDFS:
> {code}
> 2014-06-02 00:14:53,891 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.lang.IllegalArgumentException: Pathname /c:/Hadoop/Data/Hadoop/local/usercache/HadoopUser/appcache/application_1401693085139_0001/output/attempt_1401693085139_0001_m_000000_0/file.out from c:/Hadoop/Data/Hadoop/local/usercache/HadoopUser/appcache/application_1401693085139_0001/output/attempt_1401693085139_0001_m_000000_0/file.out is not a valid DFS filename.
>     at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:187)
>     at org.apache.hadoop.hdfs.DistributedFileSystem.access$000(DistributedFileSystem.java:101)
>     at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1024)
>     at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1020)
>     at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>     at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1020)
>     at org.apache.hadoop.mapred.Task.calculateOutputSize(Task.java:1124)
>     at org.apache.hadoop.mapred.Task.sendLastUpdate(Task.java:1102)
>     at org.apache.hadoop.mapred.Task.done(Task.java:1048)
> {code}