[jira] [Commented] (HADOOP-13489) DistCp may incorrectly return success status when the underlying Job failed

Mingliang Liu (JIRA) Fri, 12 Aug 2016 17:56:59 -0700

    [ 
https://issues.apache.org/jira/browse/HADOOP-13489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15419745#comment-15419745
 ]


Mingliang Liu commented on HADOOP-13489:
----------------------------------------

Thanks for providing more details about the test.

There is no sign that {{DistCp}} per se is doing wrongly. Moreover, there is no 
obvious proof that downstream applications are exploring {{DistCp}} correctly. 
This generally makes the proposal and patch no applicable.

I'm a bit surprised by the fact DistCp was extended and its {{execute()}} 
method was overridden. A broader question is that, is {{DistCp}} designed for 
this? Regardless of this (as I'm not a DistCp/HBase expert), I think the bigger 
problem here is that {{BackupDistCp}} is flaky in that it swallows all the 
exceptions as follows. This breaks the {{DistCp#run()}} contract with 
{{DistCp#execute()}}.
{code:title=MapReduceBackupCopyService.java$BackupDistCp}
  class BackupDistCp extends DistCp {
    ...
    @Override
    public Job execute() throws Exception {
      try {
      ...
      } catch (Throwable t) {
        LOG.debug("distcp " + job.getJobID() + " encountered error", t);
      } finally {
      ...
{code}
The {{execute()}} method is invoked by the {{DistCp#run()}} method, which 
(again) checks the status of the job by exception handling. I can't imagine 
{{DistCP#run()}} is able to do anything if all the exceptions were already 
swallowed. This is the root cause of the mis-usage problem in HBase. As a 
result, we couldn't expect "Exception encountered " in the log.

I don't think HDFS is obliged to do anything here. Will close this JIRA if no 
objections in 24 hours.

> DistCp may incorrectly return success status when the underlying Job failed
> ---------------------------------------------------------------------------
>
>                 Key: HADOOP-13489
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13489
>             Project: Hadoop Common
>          Issue Type: Bug
>            Reporter: Ted Yu
>            Assignee: Ted Yu
>         Attachments: HADOOP-13489.v1.patch, HADOOP-13489.v2.patch, 
> HADOOP-13489.v3.patch, MapReduceBackupCopyService.java, 
> TestIncrementalBackup-output.txt, testIncrementalBackup-8-12.txt
>
>
> I was troubleshooting HBASE-14450 where at the end of BackupdistCp#execute(), 
> distcp job was marked unsuccessful (BackupdistCp is a wrapper of DistCp).
> Yet in IncrementalTableBackupProcedure#incrementalCopy(), the return value 
> from copyService.copy() was 0.
> Here is related code from DistCp:
> {code}
>     try {
>       execute();
>     } catch (InvalidInputException e) {
>       LOG.error("Invalid input: ", e);
>       return DistCpConstants.INVALID_ARGUMENT;
>     } catch (DuplicateFileException e) {
>       LOG.error("Duplicate files in input path: ", e);
>       return DistCpConstants.DUPLICATE_INPUT;
>     } catch (AclsNotSupportedException e) {
>       LOG.error("ACLs not supported on at least one file system: ", e);
>       return DistCpConstants.ACLS_NOT_SUPPORTED;
>     } catch (XAttrsNotSupportedException e) {
>       LOG.error("XAttrs not supported on at least one file system: ", e);
>       return DistCpConstants.XATTRS_NOT_SUPPORTED;
>     } catch (Exception e) {
>       LOG.error("Exception encountered ", e);
>       return DistCpConstants.UNKNOWN_ERROR;
>     }
>     return DistCpConstants.SUCCESS;
> {code}
> We don't check whether the Job returned by execute() was successful.
> Even if the Job fails, DistCpConstants.SUCCESS is returned.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Commented] (HADOOP-13489) DistCp may incorrectly return success status when the underlying Job failed

Reply via email to