[
https://issues.apache.org/jira/browse/MAPREDUCE-6981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Prabhu Joseph updated MAPREDUCE-6981:
-------------------------------------
Attachment: clientlog, yarnlog
> Map Progress is misleading for Distcp job
> -----------------------------------------
>
> Key: MAPREDUCE-6981
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6981
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: distcp
> Affects Versions: 2.7.3
> Reporter: Prabhu Joseph
> Priority: Minor
> Attachments: clientlog, yarnlog
>
>
> The progress displayed by the client when running a Distcp job is misleading.
> The map progress reaches 100% before the map tasks finish. The issue can be
> reproduced by running Distcp with multiple huge files.
> JobImpl returns progress 1.0 when either the task finishes or the task
> progress is 1.0. The MapTask of Distcp gets its progress from
> SequenceFileRecordReader, which appears to update the progress after reading
> the list of files and does not account for the time taken to copy the files
> to the destination.
> {code}
> 17/10/11 13:33:29 INFO mapreduce.Job: map 100% reduce 0%
> 17/10/11 13:34:47 INFO mapreduce.Job: Job job_1506610341926_0016 completed successfully
> {code}
> The map progress of 100% is displayed at 17/10/11 13:33:29, whereas the last
> map task finishes at 2017-10-11 13:34:45:
> {code}
> 2017-10-11 13:34:45,159 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1506610341926_0016_m_000002 Task Transitioned from RUNNING to SUCCEEDED
> {code}
> Attaching the client and application logs.
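
The mismatch described in the report can be illustrated with a toy model. This is not Hadoop code: the names ListingEntry, record_based_progress and byte_based_progress are hypothetical, and the sketch assumes, as the description suggests, that the reported map progress follows how many copy-listing records have been read rather than how many bytes have been copied.

```python
from dataclasses import dataclass


@dataclass
class ListingEntry:
    """One record in a hypothetical copy listing: a path and its size in bytes."""
    path: str
    size_bytes: int


def record_based_progress(records_read: int, total_records: int) -> float:
    """Progress as a record reader sees it: fraction of listing records consumed."""
    if total_records == 0:
        return 1.0
    return min(1.0, records_read / total_records)


def byte_based_progress(bytes_copied: int, total_bytes: int) -> float:
    """Progress weighted by the bytes actually written to the destination."""
    if total_bytes == 0:
        return 1.0
    return min(1.0, bytes_copied / total_bytes)


# Hypothetical split: one map task is assigned a small file and a huge file.
listing = [
    ListingEntry("/src/small.dat", 10),
    ListingEntry("/src/huge.dat", 990),
]
total_bytes = sum(entry.size_bytes for entry in listing)

# Assumed state: the task has finished the small file, has read the last
# listing record, and has copied 490 bytes of the huge file so far.
records_read = 2
bytes_copied = 10 + 490

reader_view = record_based_progress(records_read, len(listing))
copy_view = byte_based_progress(bytes_copied, total_bytes)

print(f"record-based progress: {reader_view:.0%}")
print(f"byte-based progress:   {copy_view:.0%}")
```

In this model the record-based figure is already at its maximum while half of the data is still to be copied, which matches the gap between the "map 100%" line and the final task transition in the logs above.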