[jira] [Commented] (MAPREDUCE-5902) JobHistoryServer (HistoryFileManager) needs more debug logs.

jay vyas (JIRA) Fri, 23 May 2014 10:43:20 -0700

    [ https://issues.apache.org/jira/browse/MAPREDUCE-5902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14007441#comment-14007441 ]


jay vyas commented on MAPREDUCE-5902:
-------------------------------------

FYI, a concrete example: the paths below, whose job names appear to have been 
truncated at some point (e.g. {{ItemRatingVectorsMappe}} is clearly missing its 
final "r"), are not getting picked up by the JobHistoryServer.

{noformat}
└── tom
    ├── job_1400794299637_0010-1400808860349-tom-ParallelALSFactorizationJob%2DItemRatingVectorsMappe-1400808889684-1-1-SUCCEEDED-default.jhist
    ├── job_1400794299637_0010_conf.xml
    ├── job_1400794299637_0010.summary
    ├── job_1400794299637_0011-1400808893300-tom-ParallelALSFactorizationJob%2DTransposeMapper%2DReduce-1400808924396-1-1-SUCCEEDED-default.jhist
    ├── job_1400794299637_0011_conf.xml
    ├── job_1400794299637_0011.summary
    ├── job_1400794299637_0012-1400808926898-tom-ParallelALSFactorizationJob%2DAverageRatingMapper%2DRe-1400808951099-1-1-SUCCEEDED-default.jhist
    ├── job_1400794299637_0012_conf.xml
    └── job_1400794299637_0012.summary
{noformat}
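
To make the truncation easier to see, here is a minimal sketch that splits one of these .jhist names on "-" and decodes the "%2D" escapes. It is not the JobHistoryServer's own parsing code: the class name {{HistoryFileNameSketch}}, its helper methods, and the assumption that the job name is the fourth dash-delimited field (index 3) are all mine, based only on how the file names above look.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical helper for illustration only; not part of Hadoop.
public class HistoryFileNameSketch {

    // Strip the ".jhist" suffix (if present) and split on the "-" delimiter.
    // Literal dashes inside a field show up escaped as "%2D" in the names above,
    // so a plain split does not cut through the job name.
    static String[] splitFields(String fileName) {
        String suffix = ".jhist";
        String base = fileName.endsWith(suffix)
            ? fileName.substring(0, fileName.length() - suffix.length())
            : fileName;
        return base.split("-");
    }

    // Undo the percent-escaping of a single field.
    static String decode(String field) {
        try {
            return URLDecoder.decode(field, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    public static void main(String[] args) {
        String[] names = {
            "job_1400794299637_0010-1400808860349-tom-ParallelALSFactorizationJob%2DItemRatingVectorsMappe-1400808889684-1-1-SUCCEEDED-default.jhist",
            "job_1400794299637_0011-1400808893300-tom-ParallelALSFactorizationJob%2DTransposeMapper%2DReduce-1400808924396-1-1-SUCCEEDED-default.jhist",
            "job_1400794299637_0012-1400808926898-tom-ParallelALSFactorizationJob%2DAverageRatingMapper%2DRe-1400808951099-1-1-SUCCEEDED-default.jhist"
        };
        for (String name : names) {
            String[] fields = splitFields(name);
            // Assumption: index 3 holds the job name.
            String jobName = decode(fields[3]);
            System.out.println(fields.length + " fields, job name (" + jobName.length() + " chars): " + jobName);
        }
    }
}
```

Run against the three names above, the decoded job-name field comes out at 50 characters each time, which is what you would expect if the name is being cut to a fixed length before the file is written. Whether that cut is what keeps the JobHistoryServer from picking the files up is still an open question.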

> JobHistoryServer (HistoryFileManager) needs more debug logs.
> ------------------------------------------------------------
>
>                 Key: MAPREDUCE-5902
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5902
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobhistoryserver
>            Reporter: jay vyas
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> With the JobHistoryServer, it appears that certain history files are 
> sometimes skipped.  I haven't been able to determine why yet, but 
> I've found that some long-named .jhist files aren't getting collected into 
> the done/ directory.
> After tracing through the actual source and turning on DEBUG-level logging, 
> it became clear that the snippet below is an important workhorse 
> (scanDirectoryForIntermediateFiles and scanDirectoryForHistoryFiles 
> ultimately boil down to scanDirectory()).  
> It would be extremely useful, then, to have a couple of guarded logs at this 
> level of the code, so that we can see, in the log folders, why files are 
> being filtered out, i.e. whether it is due to filtering or visibility.
> {noformat}
>   private static List<FileStatus> scanDirectory(Path path, FileContext fc,
>       PathFilter pathFilter) throws IOException {
>     path = fc.makeQualified(path);
>     List<FileStatus> jhStatusList = new ArrayList<FileStatus>();
>     RemoteIterator<FileStatus> fileStatusIter = fc.listStatus(path);
>     while (fileStatusIter.hasNext()) {
>       FileStatus fileStatus = fileStatusIter.next();
>       Path filePath = fileStatus.getPath();
>       if (fileStatus.isFile() && pathFilter.accept(filePath)) {
>         jhStatusList.add(fileStatus);
>       }
>     }
>     return jhStatusList;
>   }
> {noformat}
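
As a rough illustration of what "guarded logs" at this level could look like, here is a self-contained sketch. It is not a patch against HistoryFileManager: to keep it runnable without Hadoop on the classpath it swaps in {{java.nio.file}} for {{FileContext}}, a {{Predicate<Path>}} for {{PathFilter}}, and {{java.util.logging}} for the server's logger. The class name {{ScanDirectorySketch}} and the log messages are hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.logging.Level;
import java.util.logging.Logger;

// Standard-library stand-in for the scanDirectory() loop quoted above.
public class ScanDirectorySketch {
    private static final Logger LOG =
        Logger.getLogger(ScanDirectorySketch.class.getName());

    static List<Path> scanDirectory(Path dir, Predicate<Path> filter)
            throws IOException {
        List<Path> accepted = new ArrayList<>();
        // Check the level once so message strings are only built when needed.
        boolean debug = LOG.isLoggable(Level.FINE);
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path entry : entries) {
                if (!Files.isRegularFile(entry)) {
                    if (debug) {
                        LOG.fine("Skipping " + entry + ": not a regular file");
                    }
                } else if (!filter.test(entry)) {
                    if (debug) {
                        LOG.fine("Skipping " + entry + ": rejected by filter");
                    }
                } else {
                    if (debug) {
                        LOG.fine("Accepting " + entry);
                    }
                    accepted.add(entry);
                }
            }
        }
        return accepted;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("jhs-sketch");
        Files.createFile(dir.resolve("job_1_0001.jhist"));
        Files.createFile(dir.resolve("job_1_0001_conf.xml"));
        Files.createDirectory(dir.resolve("subdir.jhist"));
        List<Path> found = scanDirectory(
            dir, p -> p.getFileName().toString().endsWith(".jhist"));
        System.out.println(found.size() + " history file(s) accepted");
    }
}
```

Splitting the combined {{isFile() && accept()}} test into separate branches is the point of the sketch: each skipped entry gets logged with the specific reason it was dropped, so a missing .jhist file can be traced to either the file-type check or the filter.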



--
This message was sent by Atlassian JIRA
(v6.2#6252)
