[
https://issues.apache.org/jira/browse/MAPREDUCE-2078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910504#action_12910504
]
Amar Kamat commented on MAPREDUCE-2078:
---------------------------------------
The {{FileSystem.globStatus(Path)}} API enumerates all the paths matched by a
globbed path.
The current {{TraceBuilder}} code does the following:
{code}
for (int i = 2 + switchTop; i < args.length; ++i) {
  Path thisPath = new Path(args[i]);
  FileSystem fs = thisPath.getFileSystem(conf);
  // getFileStatus() treats the glob characters literally
  if (fs.getFileStatus(thisPath).isDirectory()) {
    FileStatus[] statuses = fs.listStatus(thisPath);
    for (FileStatus s : statuses) {
      // process the file
      ..
    }
  }
}
{code}
This needs to be changed to first flatten the globbed paths passed as input. So
the suggested fix is:
{code}
for (int i = 2 + switchTop; i < args.length; ++i) { // iterate over the input
  Path thisPath = new Path(args[i]);
  // get the filesystem specific to the input passed
  FileSystem fs = thisPath.getFileSystem(conf);
  // flatten the globbed file path
  FileStatus[] realStatuses = fs.globStatus(thisPath);
  // iterate over all the files under the globbed input path
  for (FileStatus status : realStatuses) {
    // extract the actual (flat) path from the file status
    Path realPath = status.getPath();
    // now do what is done in the trunk
    if (fs.getFileStatus(realPath).isDirectory()) {
      FileStatus[] statuses = fs.listStatus(realPath);
      for (FileStatus s : statuses) {
        // process the file
        ..
      }
    }
  }
}
{code}
I ran {{TraceBuilder}} with this fix and now it works with globbed input paths.
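For anyone who wants to try the flatten-then-list pattern without a cluster, here is a minimal sketch that uses only {{java.nio.file}} from the JDK in place of the Hadoop {{FileSystem}} API. It is an illustration under stated assumptions, not the Rumen code: the class {{GlobFlattenSketch}} and the method {{filesUnderGlob}} are hypothetical names, and {{Files.newDirectoryStream(dir, glob)}} stands in for {{fs.globStatus(...)}} while matching only a single path level.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration class; not part of Hadoop or Rumen.
public class GlobFlattenSketch {

    // Expands a single-level glob under 'parent' into concrete paths, then
    // lists the children of every match that is a directory.
    public static List<Path> filesUnderGlob(Path parent, String glob) throws IOException {
        List<Path> files = new ArrayList<>();
        // Step 1: flatten the glob (stand-in for fs.globStatus(thisPath)).
        try (DirectoryStream<Path> matches = Files.newDirectoryStream(parent, glob)) {
            for (Path realPath : matches) {
                // Step 2: do what the trunk does, but on the concrete path.
                if (Files.isDirectory(realPath)) {
                    try (DirectoryStream<Path> children = Files.newDirectoryStream(realPath)) {
                        for (Path child : children) {
                            files.add(child); // "process the file"
                        }
                    }
                }
            }
        }
        // Directory streams have no guaranteed order, so sort for stable output.
        Collections.sort(files);
        return files;
    }

    public static void main(String[] args) throws IOException {
        // Build a small throwaway tree: two matching directories, one non-matching.
        Path root = Files.createTempDirectory("glob-sketch");
        Files.createFile(Files.createDirectories(root.resolve("day1")).resolve("job_a"));
        Files.createFile(Files.createDirectories(root.resolve("day2")).resolve("job_b"));
        Files.createFile(Files.createDirectories(root.resolve("other")).resolve("job_c"));

        for (Path p : filesUnderGlob(root, "day*")) {
            System.out.println(p);
        }
    }
}
```

One point that may be worth checking when applying the same idea with the Hadoop API: {{globStatus}} is documented to return null when the pattern contains no glob and the path does not exist, so a null check before iterating over the result could be a sensible addition.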
> TraceBuilder unable to generate the traces while giving the job history path
> by globing.
> ----------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-2078
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2078
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: tools/rumen
> Reporter: Vinay Kumar Thota
> Assignee: Amar Kamat
>
> I was trying to generate traces for MR job histories using
> TraceBuilder. However, it is unable to generate the traces when the
> job history path is given as a glob. It throws a file-not-found exception even
> though the job history path exists.
> I provided the job history path as follows:
> hdfs://<<clustername>>/dir1/dir2/dir3/*/*/*/*/*/*/
> Exception:
> java.io.FileNotFoundException: File does not exist: hdfs://<<clustername>>/dir1/dir2/dir3/*/*/*/*/*/*
> at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:525)
> at org.apache.hadoop.tools.rumen.TraceBuilder$MyOptions.<init>(TraceBuilder.java:88)
> at org.apache.hadoop.tools.rumen.TraceBuilder.run(TraceBuilder.java:183)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
> at org.apache.hadoop.tools.rumen.TraceBuilder.main(TraceBuilder.java:121)
> It's truncating the last slash in the path.