[jira] Updated: (HADOOP-2048) DISTCP mapper should report progress more often

Owen O'Malley (JIRA) Mon, 22 Oct 2007 14:27:44 -0700

[https://issues.apache.org/jira/browse/HADOOP-2048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]


Owen O'Malley updated HADOOP-2048:
----------------------------------

    Status: Open  (was: Patch Available)

A few issues:
  1. Please use more descriptive (longer) variable names.
  2. Failures shouldn't be stored; they should always be logged at the INFO level.
  3. I'd replace the bfailed flag with a failureCount and have the final 
exception report the number of failures.
  4. Don't bother putting a time limit on the status reporting. The framework 
already throttles it to once a second.
  5. Just use the status message to record the number of bytes copied, files 
copied, and failures; individual failures would be overwritten too quickly to be 
read. You just want the user to know that there is something to look at in the 
logs.
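The review points above can be sketched roughly as follows. This is a hypothetical illustration, not the attached patch: `CopyStatusSketch`, `copyFile`, and `statusMessage` are invented names, `StatusReporter` is a stand-in for the two calls this sketch needs from Hadoop's `org.apache.hadoop.mapred.Reporter` (`progress()` and `setStatus(String)`), and `java.util.logging` stands in for whatever logger the mapper actually uses. It counts failures instead of storing them, logs each one at INFO, reports progress after every buffer without a local timer (relying on the framework throttling mentioned in point 4), and keeps only aggregate counts in the status message.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.logging.Logger;

/** Hypothetical sketch of a per-file copy loop following the review points. */
class CopyStatusSketch {
  /** Hypothetical stand-in for the two Reporter calls this sketch needs. */
  interface StatusReporter {
    void progress();
    void setStatus(String status);
  }

  // java.util.logging is a stand-in for the mapper's real logger.
  private static final Logger LOG = Logger.getLogger(CopyStatusSketch.class.getName());

  private long bytesCopied = 0;
  private int filesCopied = 0;
  private int failureCount = 0; // replaces a boolean "failed" flag

  /** Copies one file; a failure is logged at INFO and counted, not stored. */
  void copyFile(String name, InputStream in, OutputStream out, StatusReporter reporter) {
    byte[] buffer = new byte[4096];
    try {
      int bytesRead;
      while ((bytesRead = in.read(buffer)) != -1) {
        out.write(buffer, 0, bytesRead);
        bytesCopied += bytesRead;
        // No local time limit: per the review, the framework throttles updates.
        reporter.progress();
      }
      filesCopied++;
    } catch (IOException e) {
      failureCount++;
      LOG.info("Failed to copy " + name + ": " + e);
    }
    // Only aggregate counts go in the status; details live in the logs.
    reporter.setStatus(statusMessage());
  }

  String statusMessage() {
    return String.format("%d bytes copied, %d files copied, %d failures",
        bytesCopied, filesCopied, failureCount);
  }

  /** Called once at the end; the final exception records the failure count. */
  void close() throws IOException {
    if (failureCount > 0) {
      throw new IOException(failureCount + " failures during copy");
    }
  }
}
```

With this shape, a single slow file still produces a progress call per buffer, while a failed file shows up only as an incremented count in the status line and a detailed entry in the task log.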

> DISTCP mapper should report progress more often
> -----------------------------------------------
>
>                 Key: HADOOP-2048
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2048
>             Project: Hadoop
>          Issue Type: Bug
>            Reporter: Runping Qi
>            Assignee: Chris Douglas
>            Priority: Blocker
>             Fix For: 0.15.0
>
>         Attachments: 2048-2.patch, 2048-3.patch, 2048.patch
>
>
> When I ran DISTCP to copy files from one dfs to another, I noticed that some 
> mappers got killed for failing to report status for 606 seconds. 
> The mappers only make a progress report once every 32MB copied. A better 
> way to ensure progress is reported is to use a time interval since the last 
> report.
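A rough calculation shows why reporting once per 32MB can miss the deadline. Assuming the task timeout was the then-default 600 s (`mapred.task.timeout` = 600000 ms, consistent with the 606 seconds above) and taking 32MB as 32 × 2^20 bytes, a mapper that reports only after each 32MB chunk is killed whenever its sustained copy rate falls below

$$
\frac{32 \times 2^{20}\ \text{bytes}}{600\ \text{s}} = \frac{33{,}554{,}432\ \text{bytes}}{600\ \text{s}} \approx 55{,}924\ \text{bytes/s} \approx 54.6\ \text{KiB/s}
$$

so a single slow source or destination is enough to trigger the kill, regardless of how much data the job has copied overall.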
