[
https://issues.apache.org/jira/browse/HADOOP-7611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14263802#comment-14263802
]
Akshay Rai commented on HADOOP-7611:
------------------------------------
[~ozawa], sorry for the late reply. I am not clear on the point you raised.
From what I can gather, the output directory is different from the directory
where the temporary/intermediate files are stored. The temporary files are
stored in a location specified by "mapred.local.dir" or "io.seqfile.local.dir",
and the output is stored in the location specified by the user (outFile, a
parameter to the sort method).
However, the problem here is that the code creates the temporary files in HDFS
whenever it has the required permissions, for the reason explained in the
description.
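To make the qualification step concrete, here is a minimal sketch in plain
Java. It does not use Hadoop; the qualify helper is a hypothetical stand-in
that only mimics the idea of giving a scheme-less path the scheme and
authority of a file system, and the NameNode address namenode:8020 is an
assumption made up for the example.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class QualifySketch {
    // Hypothetical stand-in for path qualification: a path that has no scheme
    // takes the scheme and authority of the given file system URI.
    static URI qualify(URI fsUri, String path) {
        URI p = URI.create(path);
        if (p.getScheme() != null) {
            return p; // already qualified, leave it untouched
        }
        try {
            return new URI(fsUri.getScheme(), fsUri.getAuthority(),
                           p.getPath(), null, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // Scheme-less path of the kind configured in mapred.local.dir.
        String localTmp = "/mnt/mnt1/mapred/local";
        URI hdfs = URI.create("hdfs://namenode:8020/"); // assumed address
        URI local = URI.create("file:///");

        // Qualified against the HDFS URI, the temp path points into HDFS.
        System.out.println(qualify(hdfs, localTmp));
        // Qualified against the local file system URI, it stays local.
        System.out.println(qualify(local, localTmp));
    }
}
```

In Hadoop itself, a LocalFileSystem can be obtained with
FileSystem.getLocal(conf), so qualifying the temp path against that object
rather than against fs is one way the suggestion in the description could be
implemented. This is only a sketch of the idea, not a tested patch.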
> SequenceFile.Sorter creates local temp files on HDFS
> ----------------------------------------------------
>
> Key: HADOOP-7611
> URL: https://issues.apache.org/jira/browse/HADOOP-7611
> Project: Hadoop Common
> Issue Type: Bug
> Components: io
> Affects Versions: 0.20.2
> Environment: CentOS 5.6 64-bit, Oracle JDK 1.6.0_26 64-bit
> Reporter: Bryan Keller
>
> When using SequenceFile.Sorter to sort or merge sequence files that exist in
> HDFS, it attempts to create temp files in a directory structure specified by
> mapred.local.dir but on HDFS, not in the local file system. The problem code
> is in MergeQueue.merge(). Starting at line 2953:
> {code}
> Path outputFile = lDirAlloc.getLocalPathForWrite(
>     tmpFilename.toString(),
>     approxOutputSize, conf);
> LOG.debug("writing intermediate results to " + outputFile);
> Writer writer = cloneFileAttributes(
>     fs.makeQualified(segmentsToMerge.get(0).segmentPathName),
>     fs.makeQualified(outputFile),
>     null);
> {code}
> The outputFile here is a local path without a scheme, e.g.
> "/mnt/mnt1/mapred/local", specified by the mapred.local.dir property. If we
> are sorting files on HDFS, the fs object is a DistributedFileSystem. The call
> to fs.makeQualified(outputFile) prepends the fs object's scheme to the local
> temp path returned by lDirAlloc, e.g. hdfs://mnt/mnt1/mapred/local. This
> directory is then created (if the proper permissions are available) on HDFS.
> If the HDFS permissions are not available, the sort/merge fails even though
> the directories exist locally.
> The code should instead always use the local file system when retrieving a
> path from the mapred.local.dir property. The unit tests do not test this
> condition; they only test using the local file system for sort and merge.