[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated MAPREDUCE-6981:
-------------------------------------
    Attachment: clientlog
                yarnlog

> Map Progress is misleading for Distcp job
> -----------------------------------------
>
>                 Key: MAPREDUCE-6981
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6981
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: distcp
>    Affects Versions: 2.7.3
>            Reporter: Prabhu Joseph
>            Priority: Minor
>         Attachments: clientlog, yarnlog
>
>
> The progress displayed by the client when running a Distcp job is misleading. 
> Map progress reaches 100% before the map tasks finish. The issue can be 
> reproduced by running Distcp with multiple huge files. 
> JobImpl reports progress 1.0 when either the task finishes or the task 
> progress is 1.0. The Distcp MapTask gets its progress from 
> SequenceFileRecordReader, which appears to update the progress after reading 
> the list of files and does not account for the time taken to copy the files 
> to the destination.
> {code}
> 17/10/11 13:33:29 INFO mapreduce.Job:  map 100% reduce 0%
> 17/10/11 13:34:47 INFO mapreduce.Job: Job job_1506610341926_0016 completed 
> successfully
> {code}
> The client displays map progress of 100% at 17/10/11 13:33:29, whereas the 
> last map task finishes at 2017-10-11 13:34:45:
> {code}
> 2017-10-11 13:34:45,159 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: 
> task_1506610341926_0016_m_000002 Task Transitioned from RUNNING to SUCCEEDED
> {code}
> Attaching the client and application logs.
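
The mismatch described above can be illustrated with a minimal, self-contained sketch. It is not Hadoop or DistCp code: the class and method names are hypothetical, and it assumes, as the report suggests, that reported progress follows how much of the file listing has been read rather than how many bytes have been copied. It compares that record-based figure with a byte-weighted one for a mapper that has been handed its last listing entry but has copied only the first of two equally large files.

```java
// Hypothetical illustration only; none of these names exist in Hadoop or DistCp.
final class CopyProgressSketch {

    // Progress as a listing-driven reader would report it:
    // the fraction of listing entries already handed to the mapper.
    static double recordProgress(int entriesRead, int totalEntries) {
        return totalEntries == 0 ? 1.0 : (double) entriesRead / totalEntries;
    }

    // Progress weighted by the bytes actually copied so far.
    static double byteProgress(long bytesCopied, long[] fileSizes) {
        long total = 0;
        for (long size : fileSizes) {
            total += size;
        }
        return total == 0 ? 1.0 : Math.min(1.0, (double) bytesCopied / total);
    }

    public static void main(String[] args) {
        // Assumed scenario: two 10 GiB files assigned to one mapper.
        long[] sizes = {10L << 30, 10L << 30};
        int entriesRead = 2;          // the last listing entry has been read
        long bytesCopied = 10L << 30; // only the first file is copied so far

        System.out.printf("record-based: %.0f%%%n",
                100 * recordProgress(entriesRead, sizes.length));
        System.out.printf("byte-based:   %.0f%%%n",
                100 * byteProgress(bytesCopied, sizes));
    }
}
```

In this assumed scenario the record-based figure is already 100% while the byte-weighted figure is 50%, which matches the gap between the "map 100%" line and the later task completion in the logs above.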



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

