[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14007959#comment-14007959
 ] 

jay vyas commented on MAPREDUCE-5902:
-------------------------------------

After further investigation, it appears that files with {{%}} escape 
characters in their names aren't picked up by the JobHistoryServer.  I'd like 
one of the JobHistoryServer authors to confirm or deny whether job names are 
indeed allowed to include {{"%"}} signs, e.g. {{name%-myName}}.  

Has anyone else seen this before?  I'd be somewhat surprised if I were the 
only person who has run into it. I can't imagine it's a configuration error of 
any sort.

The files below appear to be stuck in mr-history "purgatory": they are not 
reported as completed jobs by a REST request to the JobHistoryServer API 
({{curl http://10.1.4.138:19888/ws/v1/history/mapreduce/jobs | python -mjson.tool}}), 
**nor** are they ever moved to {{/mr-history/done/}}.

{noformat}
/mr-history/tmp/tom/job_1400794299637_0010-1400808860349-tom-ParallelALSFactorizationJob%2DItemRatingVectorsMappe-1400808889684-1-1-SUCCEEDED-default.jhist
/mr-history/tmp/tom/job_1400794299637_0011-1400808893300-tom-ParallelALSFactorizationJob%2DTransposeMapper%2DReduce-1400808924396-1-1-SUCCEEDED-default.jhist
/mr-history/tmp/tom/job_1400794299637_0012-1400808926898-tom-ParallelALSFactorizationJob%2DAverageRatingMapper%2DRe-1400808951099-1-1-SUCCEEDED-default.jhist
/mr-history/tmp/tom/job_1400794299637_0017-1400814057680-tom-ParallelALSFactorizationJob%2DItemRatingVectorsMappe-1400814090466-1-1-SUCCEEDED-default.jhist
/mr-history/tmp/tom/job_1400873461827_0016-1400874599994-tom-select+count%28*%29+from+bps_cleaned%28Stage%2D1%29-1400874621636-1-1-SUCCEEDED-default.jhist
/mr-history/tmp/tom/job_1400873461827_0023-1400894507822-tom-name%252dname-1400894528285-1-1-SUCCEEDED-default.jhist
{noformat}
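
To make the escapes in the names above easier to read, here is a minimal sketch that percent-decodes the job-name field of three of those file names with the JDK's {{java.net.URLDecoder}}. The class name {{JobNameDecodeSketch}} and the helper {{decodeOnce}} are hypothetical, and I am only assuming that the history file names are encoded in a URLEncoder-compatible way; this is not taken from the JobHistoryServer source.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical helper class, for illustration only.
class JobNameDecodeSketch {

    // Decode exactly one level of percent-encoding ('+' becomes a space).
    static String decodeOnce(String encoded) {
        try {
            return URLDecoder.decode(encoded, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be supported by every JVM.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Job-name fields copied from the stuck .jhist file names.
        String[] fields = {
            "ParallelALSFactorizationJob%2DTransposeMapper%2DReduce",
            "select+count%28*%29+from+bps_cleaned%28Stage%2D1%29",
            "name%252dname"
        };
        for (String field : fields) {
            System.out.println(field + " -> " + decodeOnce(field));
        }
    }
}
```

Note that the last name needs two decoding passes to get back to {{name-name}}: one pass only turns {{%25}} into {{%}}, leaving {{name%2dname}}.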

> JobHistoryServer (HistoryFileManager) needs more debug logs.
> ------------------------------------------------------------
>
>                 Key: MAPREDUCE-5902
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5902
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobhistoryserver
>            Reporter: jay vyas
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> With the JobHistoryServer, it appears that it's sometimes possible to skip 
> over certain history files.  I haven't been able to determine why yet, but 
> I've found that some long-named .jhist files aren't getting collected into 
> the done/ directory.
> After tracing through the actual source and turning on DEBUG-level logging, 
> it became clear that this snippet is an important workhorse 
> (scanDirectoryForIntermediateFiles and scanDirectoryForHistoryFiles 
> ultimately boil down to scanDirectory()).  
> It would be extremely useful, then, to have a couple of guarded logs at this 
> level of the code, so that we can see in the logs why files are 
> being filtered out, i.e. whether it is due to filtering or visibility.
> {noformat}
>   private static List<FileStatus> scanDirectory(Path path, FileContext fc,
>       PathFilter pathFilter) throws IOException {
>     path = fc.makeQualified(path);
>     List<FileStatus> jhStatusList = new ArrayList<FileStatus>();
>     RemoteIterator<FileStatus> fileStatusIter = fc.listStatus(path);
>     while (fileStatusIter.hasNext()) {
>       FileStatus fileStatus = fileStatusIter.next();
>       Path filePath = fileStatus.getPath();
>       if (fileStatus.isFile() && pathFilter.accept(filePath)) {
>         jhStatusList.add(fileStatus);
>       }
>     }
>     return jhStatusList;
>   }
> {noformat}
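
As a rough illustration of the guarded logs requested in the description, here is a self-contained sketch of the same scan-and-filter loop. It is a stand-in, not a patch: it uses {{java.nio.file}} and {{java.util.logging}} instead of Hadoop's {{FileContext}}, {{PathFilter}} and the server's own logger, and the class name {{ScanDirectorySketch}} and the log messages are hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical stdlib analogue of scanDirectory(), for illustration only.
class ScanDirectorySketch {
    private static final Logger LOG =
        Logger.getLogger(ScanDirectorySketch.class.getName());

    // Same accept rule as the quoted code: regular file AND filter accepts it.
    static List<Path> scanDirectory(Path dir, Predicate<Path> filter)
            throws IOException {
        List<Path> accepted = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path entry : entries) {
                if (!Files.isRegularFile(entry)) {
                    // Guarded: the message string is built only when enabled.
                    if (LOG.isLoggable(Level.FINE)) {
                        LOG.fine("Skipping " + entry + ": not a regular file");
                    }
                } else if (!filter.test(entry)) {
                    if (LOG.isLoggable(Level.FINE)) {
                        LOG.fine("Skipping " + entry + ": rejected by filter");
                    }
                } else {
                    accepted.add(entry);
                }
            }
        }
        return accepted;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("jhist-scan");
        Files.createFile(dir.resolve("job_1-SUCCEEDED-default.jhist"));
        Files.createFile(dir.resolve("job_1_conf.xml"));
        Files.createDirectory(dir.resolve("subdir"));
        List<Path> found = scanDirectory(
            dir, p -> p.getFileName().toString().endsWith(".jhist"));
        System.out.println("accepted: " + found);
    }
}
```

Splitting the single {{&&}} condition into two branches is what makes it possible to log which of the two checks rejected a given file.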



--
This message was sent by Atlassian JIRA
(v6.2#6252)
