[
https://issues.apache.org/jira/browse/MAPREDUCE-5902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14007959#comment-14007959
]
jay vyas commented on MAPREDUCE-5902:
-------------------------------------
After further investigation, it appears that files with {{%}} escape characters
in their names aren't picked up by the JobHistoryServer. I'd like one of the
JobHistoryServer authors to confirm or deny whether job names are indeed
allowed to include {{"%"}} signs, e.g. {{name%-myName}}.
Has anyone else seen this before? I'd be somewhat surprised if I were the only
person who has run into it. I can't imagine it's a configuration error of
any sort.
The files below appear to be "stuck" in mr-history "purgatory": they are not
reported as completed jobs by a REST request to the JobHistoryServer API
({{curl http://10.1.4.138:19888/ws/v1/history/mapreduce/jobs | python -mjson.tool}}),
nor are they ever moved to {{/mr-history/done/}}.
{noformat}
/mr-history/tmp/tom/job_1400794299637_0010-1400808860349-tom-ParallelALSFactorizationJob%2DItemRatingVectorsMappe-1400808889684-1-1-SUCCEEDED-default.jhist
/mr-history/tmp/tom/job_1400794299637_0011-1400808893300-tom-ParallelALSFactorizationJob%2DTransposeMapper%2DReduce-1400808924396-1-1-SUCCEEDED-default.jhist
/mr-history/tmp/tom/job_1400794299637_0012-1400808926898-tom-ParallelALSFactorizationJob%2DAverageRatingMapper%2DRe-1400808951099-1-1-SUCCEEDED-default.jhist
/mr-history/tmp/tom/job_1400794299637_0017-1400814057680-tom-ParallelALSFactorizationJob%2DItemRatingVectorsMappe-1400814090466-1-1-SUCCEEDED-default.jhist
/mr-history/tmp/tom/job_1400873461827_0016-1400874599994-tom-select+count%28*%29+from+bps_cleaned%28Stage%2D1%29-1400874621636-1-1-SUCCEEDED-default.jhist
/mr-history/tmp/tom/job_1400873461827_0023-1400894507822-tom-name%252dname-1400894528285-1-1-SUCCEEDED-default.jhist
{noformat}
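To make the escapes in the listing above easier to read, here is a minimal sketch that decodes the job-name field of three of those file names. It assumes the field is plain URL-style percent-encoding and uses only {{java.net.URLDecoder}} from the JDK. The class and method names are hypothetical, and this is not the JobHistoryServer's own parsing code.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

class JhistNameDecode {
    // Hypothetical helper: decodes one percent-encoded job-name field,
    // assuming standard URL-style encoding in UTF-8.
    static String decode(String field) {
        try {
            return URLDecoder.decode(field, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a conforming JVM.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Job-name fields copied from the stuck file names above.
        String[] fields = {
            "ParallelALSFactorizationJob%2DItemRatingVectorsMappe",
            "select+count%28*%29+from+bps_cleaned%28Stage%2D1%29",
            "name%252dname"
        };
        for (String f : fields) {
            System.out.println(f + " -> " + decode(f));
        }
    }
}
```

Under that assumption, {{%2D}} decodes to {{-}}, and {{name%252dname}} decodes to {{name%2dname}}, which still contains a literal {{%}} after one decoding pass.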
> JobHistoryServer (HistoryFileManager) needs more debug logs.
> ------------------------------------------------------------
>
> Key: MAPREDUCE-5902
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5902
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: jobhistoryserver
> Reporter: jay vyas
> Original Estimate: 1h
> Remaining Estimate: 1h
>
> With the JobHistoryServer, it appears that it's sometimes possible to skip
> over certain history files. I haven't been able to determine why yet, but
> I've found that some long-named .jhist files aren't getting collected into
> the done/ directory.
> After tracing through the actual source and turning on DEBUG-level logging,
> it became clear that the snippet below is an important workhorse
> (scanDirectoryForIntermediateFiles and scanDirectoryForHistoryFiles
> ultimately boil down to scanDirectory()).
> It would be extremely useful, then, to have a couple of guarded logs at this
> level of the code, so that we can see, in the log folders, why files are
> being filtered out, i.e. whether it is due to filtering or visibility.
> {noformat}
> private static List<FileStatus> scanDirectory(Path path, FileContext fc,
>     PathFilter pathFilter) throws IOException {
>   path = fc.makeQualified(path);
>   List<FileStatus> jhStatusList = new ArrayList<FileStatus>();
>   RemoteIterator<FileStatus> fileStatusIter = fc.listStatus(path);
>   while (fileStatusIter.hasNext()) {
>     FileStatus fileStatus = fileStatusIter.next();
>     Path filePath = fileStatus.getPath();
>     if (fileStatus.isFile() && pathFilter.accept(filePath)) {
>       jhStatusList.add(fileStatus);
>     }
>   }
>   return jhStatusList;
> }
> {noformat}
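As a rough illustration of the guarded logging requested above, here is a minimal sketch of the same accept/reject loop with a debug line stating why each entry was skipped. To stay self-contained it is a local-filesystem stand-in: it uses {{java.nio.file}} and {{java.util.logging}} from the JDK rather than Hadoop's FileContext, PathFilter and logger. The class name, message wording and {{Predicate}} parameter are all assumptions, not a proposed patch.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.logging.Level;
import java.util.logging.Logger;

class GuardedScanSketch {
    private static final Logger LOG = Logger.getLogger("GuardedScanSketch");

    // Stand-in for scanDirectory(): keeps regular files accepted by the
    // filter and logs, at FINE level only, the reason any entry is skipped.
    static List<Path> scanDirectory(Path dir, Predicate<Path> filter)
            throws IOException {
        List<Path> accepted = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path p : entries) {
                boolean isFile = Files.isRegularFile(p);
                // As in the original, the filter is consulted only for files.
                if (isFile && filter.test(p)) {
                    accepted.add(p);
                } else if (LOG.isLoggable(Level.FINE)) {
                    // Guard avoids building the message when debug is off.
                    LOG.fine("Skipping " + p + ": "
                        + (isFile ? "rejected by path filter"
                                  : "not a regular file"));
                }
            }
        }
        return accepted;
    }
}
```

With a filter such as {{p -> p.getFileName().toString().endsWith(".jhist")}}, a skipped entry would produce one log line naming the path and which of the two checks it failed.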
--
This message was sent by Atlassian JIRA
(v6.2#6252)