[ 
https://issues.apache.org/jira/browse/HADOOP-1524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12511173
 ] 

Owen O'Malley commented on HADOOP-1524:
---------------------------------------

I believe you are missing the point. The older splits are deleted to limit the 
size of the task logs, so you can't use their lengths to compute offsets: the 
files are no longer there. 

I'd propose an approach where the index file looks like:

file|offset

dropping the length, so that the index entry can be written as soon as the new 
split is started instead of when it is finished. I believe this will preserve 
the current functionality and fix the problem.
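
As a rough illustration of the file|offset idea (a sketch only, not the actual TaskLog code): the helper names start_split, read_index and demo are hypothetical, and the one-entry-per-line layout of split.idx is an assumption. It shows an entry being appended when a split is opened, and a reader that still knows the logical offset of the surviving splits after an older split has been deleted.

```python
import os
import tempfile

INDEX_NAME = "split.idx"  # file name from the issue; the line layout is assumed


def start_split(log_dir, split_name, offset):
    """Append a 'file|offset' entry as soon as a new split is opened."""
    with open(os.path.join(log_dir, INDEX_NAME), "a") as idx:
        idx.write(f"{split_name}|{offset}\n")


def read_index(log_dir):
    """Return (split_name, logical_offset) for every split still on disk."""
    entries = []
    with open(os.path.join(log_dir, INDEX_NAME)) as idx:
        for line in idx:
            name, offset = line.rstrip("\n").rsplit("|", 1)
            # Older splits may have been deleted to cap the log size;
            # their entries are skipped, but later offsets stay valid.
            if os.path.exists(os.path.join(log_dir, name)):
                entries.append((name, int(offset)))
    return entries


def demo():
    """Write three splits, delete the oldest, and read the index back."""
    splits = [("part-00000", b"a" * 10),
              ("part-00001", b"b" * 7),
              ("part-00002", b"c" * 3)]
    with tempfile.TemporaryDirectory() as log_dir:
        offset = 0
        for name, data in splits:
            start_split(log_dir, name, offset)  # index is updated first
            with open(os.path.join(log_dir, name), "wb") as part:
                part.write(data)
            offset += len(data)
        os.remove(os.path.join(log_dir, "part-00000"))
        return read_index(log_dir)
```

Because the offset is recorded when a split starts, the reader never needs the length of a deleted split; the length of the newest split can still be taken from the filesystem.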

> Task Logs userlogs don't show up for a while 
> ---------------------------------------------
>
>                 Key: HADOOP-1524
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1524
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.13.0
>            Reporter: Michael Bieniosek
>         Attachments: eliminate-split-idx.patch
>
>
> When I start a task and go to the task logs, nothing shows up for a while.  
> An examination of TaskLog.Writer and TaskLog.Reader reveals:
> 1. The TaskLog.Reader relies on the presence of a split.idx to identify the 
> parts of the logs to display.
> 2. The TaskLog.Writer only updates the split.idx file when it moves on to the 
> next log.
> As a result, updates to the log only get pushed when an entire file is done.
> Why is there a split.idx file?  It seems that since files are called 
> part-00000, part-00001, etc., the TaskLog.Reader can just look at all files 
> and arrange them by alphabetical order.  The split.idx file also contains 
> file length, but this data is already stored by the filesystem.
> If nobody has objections, I'd like to write a patch to eliminate the 
> split.idx file.

