[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmed Hussein updated MAPREDUCE-7272:
-------------------------------------
    Description: 
{{TaskAttemptListenerImpl.statusUpdate()}} causes a bloating in log files. One 
every call, the listener uses {{LOG.info()}} to printout the progress of the 
{{TaskAttempt}}.
{code:java}
taskAttemptStatus.progress = taskStatus.getProgress();
    LOG.info("Progress of TaskAttempt " + taskAttemptID + " is : "
        + taskStatus.getProgress());
{code}
 
{code:bash}
2020-04-07 10:20:50,708 INFO [IPC Server handler 17 on 43926] 
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt 
attempt_1586003420099_716645_m_007783_0 is : 0.40713295
2020-04-07 10:20:50,717 INFO [IPC Server handler 7 on 43926] 
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt 
attempt_1586003420099_716645_m_020681_0 is : 0.55573714
2020-04-07 10:20:50,717 INFO [IPC Server handler 26 on 43926] 
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt 
attempt_1586003420099_716645_m_024371_0 is : 0.54190344
2020-04-07 10:20:50,738 INFO [IPC Server handler 15 on 43926] 
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt 
attempt_1586003420099_716645_m_033182_0 is : 0.50264555
2020-04-07 10:20:50,748 INFO [IPC Server handler 3 on 43926] 
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt 
attempt_1586003420099_716645_m_022375_0 is : 0.5495565
{code}
After discussing this issue with [~nroberts], [~ebadger], and [~epayne], we 
thought that while it is helpful to have a log print of task progress, it is 
still excessive to log the progress in every update.
 This Jira is to suppress the excessive logging from TaskAttemptListener 
without affecting the frequency of progress updates. 
 There are two flags:
 * {{-Dmapreduce.task.progress.min.delta.threshold=0.10}}: means that the task 
progress will be logged every 10% of delta progress. Default is 5%.
 * {{-Dmapreduce.task.progress.wait.delta.time.threshold=3}}: means that if the 
listener will log the progress every 3 minutes. This is helpful for long 
running tasks that take long time to achieve the delta threshold. Default is 1 
minute.

The listener will long whichever of {{delta.threshold}} and {{wait.delta.time}} 
is reached first. 
   Enabling {{LOG.DEBUG}} for  {{TaskAttemptListenerImpl}} will override those 
two flags and log the task progress on every update.

  was:
{{TaskAttemptListenerImpl.statusUpdate()}} causes a bloating in log files. One 
every call, the listener uses {{LOG.info()}} to printout the progress of the 
{{TaskAttempt}}.
{code:java}
taskAttemptStatus.progress = taskStatus.getProgress();
    LOG.info("Progress of TaskAttempt " + taskAttemptID + " is : "
        + taskStatus.getProgress());
{code}
 
{code:bash}
2020-04-07 10:20:50,708 INFO [IPC Server handler 17 on 43926] 
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt 
attempt_1586003420099_716645_m_007783_0 is : 0.40713295
2020-04-07 10:20:50,717 INFO [IPC Server handler 7 on 43926] 
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt 
attempt_1586003420099_716645_m_020681_0 is : 0.55573714
2020-04-07 10:20:50,717 INFO [IPC Server handler 26 on 43926] 
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt 
attempt_1586003420099_716645_m_024371_0 is : 0.54190344
2020-04-07 10:20:50,738 INFO [IPC Server handler 15 on 43926] 
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt 
attempt_1586003420099_716645_m_033182_0 is : 0.50264555
2020-04-07 10:20:50,748 INFO [IPC Server handler 3 on 43926] 
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt 
attempt_1586003420099_716645_m_022375_0 is : 0.5495565
{code}
After discussing this issue with [~nroberts], [~ebadger], and [~epayne], we 
thought that while it is helpful to have a log print of task progress, it is 
still excessive to log the progress in every update.
 This Jira is to suppress the excessive logging from TaskAttemptListener 
without affecting the frequency of progress updates. 
 There are two flags:
 * {{-Dmapreduce.task.progress.min.delta.threshold=0.10}}: means that the task 
progress will be logged every 10% of delta progress. Default is 5%.
 * {{-Dmapreduce.task.progress.wait.delta.time.threshold=3}}: means that if the 
listener will log the progress every 3 minutes. This is helpful for long 
running tasks that take long time to achieve the delta threshold.

The listener will long whichever of {{delta.threshold}} and {{wait.delta.time}} 
is reached first. 
  Enabling {{LOG.DEBUG}} for  {{TaskAttemptListenerImpl}} will override those 
two flags and log the task progress on every update.


> TaskAttemptListenerImpl excessive log messages
> ----------------------------------------------
>
>                 Key: MAPREDUCE-7272
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7272
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Ahmed Hussein
>            Assignee: Ahmed Hussein
>            Priority: Major
>         Attachments: MAPREDUCE-7272.001.patch
>
>
> {{TaskAttemptListenerImpl.statusUpdate()}} causes a bloating in log files. 
> One every call, the listener uses {{LOG.info()}} to printout the progress of 
> the {{TaskAttempt}}.
> {code:java}
> taskAttemptStatus.progress = taskStatus.getProgress();
>     LOG.info("Progress of TaskAttempt " + taskAttemptID + " is : "
>         + taskStatus.getProgress());
> {code}
>  
> {code:bash}
> 2020-04-07 10:20:50,708 INFO [IPC Server handler 17 on 43926] 
> org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt 
> attempt_1586003420099_716645_m_007783_0 is : 0.40713295
> 2020-04-07 10:20:50,717 INFO [IPC Server handler 7 on 43926] 
> org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt 
> attempt_1586003420099_716645_m_020681_0 is : 0.55573714
> 2020-04-07 10:20:50,717 INFO [IPC Server handler 26 on 43926] 
> org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt 
> attempt_1586003420099_716645_m_024371_0 is : 0.54190344
> 2020-04-07 10:20:50,738 INFO [IPC Server handler 15 on 43926] 
> org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt 
> attempt_1586003420099_716645_m_033182_0 is : 0.50264555
> 2020-04-07 10:20:50,748 INFO [IPC Server handler 3 on 43926] 
> org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt 
> attempt_1586003420099_716645_m_022375_0 is : 0.5495565
> {code}
> After discussing this issue with [~nroberts], [~ebadger], and [~epayne], we 
> thought that while it is helpful to have a log print of task progress, it is 
> still excessive to log the progress in every update.
>  This Jira is to suppress the excessive logging from TaskAttemptListener 
> without affecting the frequency of progress updates. 
>  There are two flags:
>  * {{-Dmapreduce.task.progress.min.delta.threshold=0.10}}: means that the 
> task progress will be logged every 10% of delta progress. Default is 5%.
>  * {{-Dmapreduce.task.progress.wait.delta.time.threshold=3}}: means that if 
> the listener will log the progress every 3 minutes. This is helpful for long 
> running tasks that take long time to achieve the delta threshold. Default is 
> 1 minute.
> The listener will long whichever of {{delta.threshold}} and 
> {{wait.delta.time}} is reached first. 
>    Enabling {{LOG.DEBUG}} for  {{TaskAttemptListenerImpl}} will override 
> those two flags and log the task progress on every update.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org

Reply via email to