[
https://issues.apache.org/jira/browse/MAPREDUCE-5902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14007441#comment-14007441
]
jay vyas commented on MAPREDUCE-5902:
-------------------------------------
FYI, a concrete example: these paths, whose job names seem to have been
truncated at some point (e.g. {{ItemRatingVectorsMappe}} is clearly missing a
trailing "r"), are not getting picked up by the JobHistoryServer.
{noformat}
└── tom
    ├── job_1400794299637_0010-1400808860349-tom-ParallelALSFactorizationJob%2DItemRatingVectorsMappe-1400808889684-1-1-SUCCEEDED-default.jhist
    ├── job_1400794299637_0010_conf.xml
    ├── job_1400794299637_0010.summary
    ├── job_1400794299637_0011-1400808893300-tom-ParallelALSFactorizationJob%2DTransposeMapper%2DReduce-1400808924396-1-1-SUCCEEDED-default.jhist
    ├── job_1400794299637_0011_conf.xml
    ├── job_1400794299637_0011.summary
    ├── job_1400794299637_0012-1400808926898-tom-ParallelALSFactorizationJob%2DAverageRatingMapper%2DRe-1400808951099-1-1-SUCCEEDED-default.jhist
    ├── job_1400794299637_0012_conf.xml
    └── job_1400794299637_0012.summary
{noformat}
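To make the truncation easier to see, here is a small self-contained sketch (plain Java, no Hadoop classes) that pulls the job-name field out of the three .jhist names listed above and URL-decodes it. The class and method names are made up for illustration, and the assumption that the job name is the fourth dash-separated field is read off these file names only, not taken from the Hadoop source.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class JhistNameCheck {
    // The three history file names from the listing above.
    static final String[] FILE_NAMES = {
        "job_1400794299637_0010-1400808860349-tom-ParallelALSFactorizationJob%2DItemRatingVectorsMappe-1400808889684-1-1-SUCCEEDED-default.jhist",
        "job_1400794299637_0011-1400808893300-tom-ParallelALSFactorizationJob%2DTransposeMapper%2DReduce-1400808924396-1-1-SUCCEEDED-default.jhist",
        "job_1400794299637_0012-1400808926898-tom-ParallelALSFactorizationJob%2DAverageRatingMapper%2DRe-1400808951099-1-1-SUCCEEDED-default.jhist"
    };

    // Assumption: fields are separated by '-', and the job name is field 3
    // (0-based), with any '-' inside the name escaped as %2D.
    static String decodedJobName(String fileName) {
        String[] fields = fileName.split("-");
        try {
            return URLDecoder.decode(fields[3], "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a compliant JVM.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        for (String fileName : FILE_NAMES) {
            String jobName = decodedJobName(fileName);
            System.out.println(jobName.length() + "  " + jobName);
        }
    }
}
```

All three decoded names come out at exactly 50 characters, which looks like a fixed-length trim applied when the file name is built. That is only an inference from these three names; I have not confirmed where the trim happens.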
> JobHistoryServer (HistoryFileManager) needs more debug logs.
> ------------------------------------------------------------
>
> Key: MAPREDUCE-5902
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5902
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: jobhistoryserver
> Reporter: jay vyas
> Original Estimate: 1h
> Remaining Estimate: 1h
>
> With the JobHistoryServer, it appears that it is sometimes possible to skip
> over certain history files. I haven't been able to determine why yet, but
> I've found that some long-named .jhist files aren't getting collected into
> the done/ directory.
> After tracing through the actual source and turning on DEBUG-level logging,
> it became clear that this snippet is an important workhorse
> (scanDirectoryForIntermediateFiles and scanDirectoryForHistoryFiles
> ultimately boil down to scanDirectory()).
> It would be extremely useful, then, to have a couple of guarded logs at this
> level of the code, so that we can see in the logs why files are
> being filtered out, i.e. whether it is due to filtering or visibility.
> {noformat}
> private static List<FileStatus> scanDirectory(Path path, FileContext fc,
>     PathFilter pathFilter) throws IOException {
>   path = fc.makeQualified(path);
>   List<FileStatus> jhStatusList = new ArrayList<FileStatus>();
>   RemoteIterator<FileStatus> fileStatusIter = fc.listStatus(path);
>   while (fileStatusIter.hasNext()) {
>     FileStatus fileStatus = fileStatusIter.next();
>     Path filePath = fileStatus.getPath();
>     if (fileStatus.isFile() && pathFilter.accept(filePath)) {
>       jhStatusList.add(fileStatus);
>     }
>   }
>   return jhStatusList;
> }
> {noformat}
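As a rough illustration of the kind of guarded logging being asked for, here is a self-contained sketch of the same loop. It is not a patch against HistoryFileManager: it uses java.nio.file and java.util.logging as stand-ins for FileContext, PathFilter and the Hadoop logger, and the class name, messages and log level are all hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.logging.Level;
import java.util.logging.Logger;

public class ScanDirectorySketch {
    private static final Logger LOG =
        Logger.getLogger(ScanDirectorySketch.class.getName());

    // Same shape as scanDirectory(): list a directory, keep regular files
    // accepted by the filter, and say why everything else was skipped.
    static List<Path> scanDirectory(Path dir, Predicate<Path> pathFilter)
            throws IOException {
        List<Path> accepted = new ArrayList<>();
        int listed = 0;
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path entry : entries) {
                listed++;
                boolean isFile = Files.isRegularFile(entry);
                // Only consult the filter for files, as the original does.
                if (isFile && pathFilter.test(entry)) {
                    accepted.add(entry);
                } else if (LOG.isLoggable(Level.FINE)) {
                    // Guarded, so the message is only built at debug level.
                    LOG.fine("Skipping " + entry + ": "
                        + (isFile ? "rejected by path filter" : "not a regular file"));
                }
            }
        }
        if (LOG.isLoggable(Level.FINE)) {
            // A file missing from this count was never visible to the scan.
            LOG.fine("Scanned " + dir + ": listed=" + listed
                + ", accepted=" + accepted.size());
        }
        return accepted;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("jhs-scan");
        Files.createFile(dir.resolve("job_1.jhist"));
        Files.createFile(dir.resolve("job_1_conf.xml"));
        Files.createDirectory(dir.resolve("nested.jhist"));
        List<Path> found = scanDirectory(dir,
            p -> p.getFileName().toString().endsWith(".jhist"));
        System.out.println(found);
    }
}
```

With per-entry messages plus the listed/accepted totals, a missing .jhist file can be classified from the log alone: either it shows up as skipped (filtering) or it never appears in the listing at all (visibility).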