[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod K V updated MAPREDUCE-1100:
---------------------------------

    Attachment: MAPREDUCE-1100-20091102.txt

Attaching a first patch.

Introducing the following configuration items:
 - Job Configuration:
    -- {{JobContext.MAP_USERLOG_LIMIT}} : Per-task limit on how large each log 
file may grow. Used by {{killRunningTasksOverLimit()}} to kill tasks 
that log excessively.
    -- {{JobContext.REDUCE_USERLOG_LIMIT}} : Same as above for reduces.
    -- {{JobContext.MAP_USERLOG_RETAIN_SIZE}} : Per-task setting for how much 
of the tail of each log file is retained. Each task-log file is truncated to 
this size after the task finishes. Used by 
{{truncateLogsOfFinishedTasks()}}.
    -- {{JobContext.REDUCE_USERLOG_RETAIN_SIZE}} : Same as above for reduces.

 - TT configuration
    -- {{TTConfig.TT_USERLOG_RETAIN_HOURS}} : TT configuration of how long the 
logs of each finished task are retained on this TT. Used by 
{{retireOldLogs()}} to clean up very old logs.
    -- {{TTConfig.TT_USERLOG_CUMULATIVE_LIMIT}} : TT configuration limiting the 
total usage of log files across all tasks. If the total usage grows beyond this 
limit, {{removeOldFilesToControlCumulativeUsage()}} removes old log files 
irrespective of their age w.r.t {{TTConfig.TT_USERLOG_RETAIN_HOURS}}.
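
How the two TT settings interact can be sketched in plain Java. This is only an illustration, not the code from the attached patch: the {{LogPruningSketch}} class, the {{LogDir}} type and the {{selectForRemoval()}} method are hypothetical names, and the sketch assumes that each finished task's logs are tracked with a size and an age in hours.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch; none of these names come from the attached patch.
class LogPruningSketch {

  /** Logs of one finished task: a name, their total size, and their age. */
  static final class LogDir {
    final String name;
    final long sizeBytes;
    final long ageHours;

    LogDir(String name, long sizeBytes, long ageHours) {
      this.name = name;
      this.sizeBytes = sizeBytes;
      this.ageHours = ageHours;
    }
  }

  /**
   * Returns the names of the log directories to delete, oldest first.
   * Pass 1 retires everything older than retainHours.
   * Pass 2 keeps removing the oldest survivors until the total size
   * fits under cumulativeLimitBytes, regardless of retainHours.
   */
  static List<String> selectForRemoval(List<LogDir> dirs, long retainHours,
                                       long cumulativeLimitBytes) {
    List<LogDir> oldestFirst = new ArrayList<>(dirs);
    oldestFirst.sort(
        Comparator.comparingLong((LogDir d) -> d.ageHours).reversed());

    List<String> toRemove = new ArrayList<>();
    List<LogDir> survivors = new ArrayList<>();
    long total = 0;
    for (LogDir d : oldestFirst) {
      if (d.ageHours > retainHours) {
        toRemove.add(d.name);      // pass 1: past the retention window
      } else {
        survivors.add(d);
        total += d.sizeBytes;
      }
    }
    for (LogDir d : survivors) {   // survivors are still ordered oldest first
      if (total <= cumulativeLimitBytes) {
        break;
      }
      toRemove.add(d.name);        // pass 2: over the cumulative limit
      total -= d.sizeBytes;
    }
    return toRemove;
  }

  public static void main(String[] args) {
    List<LogDir> dirs = new ArrayList<>();
    dirs.add(new LogDir("attempt_a", 400, 30)); // older than 24 hours
    dirs.add(new LogDir("attempt_b", 500, 10));
    dirs.add(new LogDir("attempt_c", 300, 5));
    dirs.add(new LogDir("attempt_d", 200, 1));
    // Retain 24 hours, allow 600 bytes in total.
    System.out.println(selectForRemoval(dirs, 24, 600));
    // prints [attempt_a, attempt_b]
  }
}
```

In the example, attempt_a is removed because of its age, while attempt_b is removed only because the remaining logs still exceed the cumulative limit.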

Moved the clean-up of task-logs from the child into TaskLogsMonitor, which does 
the following:
{code}
while (true) {

  retireOldLogs(); // remove very old logs

  // truncate finished tasks' logs. Also set non-writable permissions.
  truncateLogsOfFinishedTasks();

  killRunningTasksOverLimit(); // kill tasks going over per-task per-file limit

  // remove very old logs if total usage is alarming, irrespective of retain.hours
  removeOldFilesToControlCumulativeUsage();
}
{code}
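
The truncation step can be illustrated with a small, self-contained Java sketch. It is not the implementation from the attached patch: {{TailTruncationSketch}} and {{truncateToTail()}} are hypothetical names, and the sketch assumes that the retained size is small enough to buffer in memory.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch; not the implementation from the attached patch.
class TailTruncationSketch {

  /**
   * Shrinks the file so that only its last retainBytes bytes remain.
   * Files already at or below retainBytes are left untouched.
   */
  static void truncateToTail(String path, int retainBytes) throws IOException {
    if (retainBytes < 0) {
      throw new IllegalArgumentException("retainBytes must be >= 0");
    }
    try (RandomAccessFile file = new RandomAccessFile(path, "rw")) {
      long length = file.length();
      if (length <= retainBytes) {
        return;
      }
      byte[] tail = new byte[retainBytes];
      file.seek(length - retainBytes);  // jump to the start of the tail
      file.readFully(tail);
      file.seek(0);
      file.write(tail);                 // move the tail to the front
      file.setLength(retainBytes);      // drop everything after it
    }
  }

  public static void main(String[] args) throws IOException {
    Path log = Files.createTempFile("syslog", ".txt");
    Files.write(log, "line1\nline2\nline3\n".getBytes(StandardCharsets.UTF_8));
    truncateToTail(log.toString(), 6);
    // Only the last 6 bytes, "line3\n", are left in the file.
    System.out.print(new String(Files.readAllBytes(log), StandardCharsets.UTF_8));
    Files.delete(log);
  }
}
```

A byte-based cut like this can land in the middle of a line or a multi-byte character, so the first retained line may be partial.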

> User's task-logs filling up local disks on the TaskTrackers
> -----------------------------------------------------------
>
>                 Key: MAPREDUCE-1100
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1100
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: tasktracker
>    Affects Versions: 0.21.0
>            Reporter: Vinod K V
>            Assignee: Vinod K V
>         Attachments: MAPREDUCE-1100-20091102.txt
>
>
> Some users' jobs are filling up TT disks with excessive logging. 
> mapreduce.task.userlog.limit.kb is not enabled on the cluster. Disks are 
> getting filled up before task-log cleanup via 
> mapred.task.userlog.retain.hours can kick in.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
