[jira] [Commented] (HADOOP-17763) DistCp job fails when AM is killed

Bilwa S T (Jira) Sun, 27 Jun 2021 04:17:05 -0700


    [ 
https://issues.apache.org/jira/browse/HADOOP-17763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17370203#comment-17370203
 ]


Bilwa S T commented on HADOOP-17763:
------------------------------------

Sorry for the confusion. It doesn't delete the entire staging directory. It 
deletes only the meta folder that DistCp creates inside the staging directory, 
i.e. the folder whose path is set in DistCpConstants.CONF_LABEL_META_FOLDER.
{code:java}
private Path createMetaFolderPath() throws Exception {
    Configuration configuration = getConf();
    Path stagingDir = JobSubmissionFiles.getStagingDir(
            new Cluster(configuration), configuration);
    Path metaFolderPath = new Path(stagingDir, PREFIX + String.valueOf(rand.nextInt()));
    if (LOG.isDebugEnabled())
      LOG.debug("Meta folder location: " + metaFolderPath);
    configuration.set(DistCpConstants.CONF_LABEL_META_FOLDER, metaFolderPath.toString());
    return metaFolderPath;
}
{code}

This is not the same for all MapReduce jobs. For other MR jobs, only the 
output directory gets deleted on AM restart. I will attach a patch.
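
To illustrate the failure mode described above, here is a minimal, self-contained 
sketch that simulates it with plain java.nio on a local temp directory. It does not 
use any Hadoop API and is not the attached patch. The class name, the cleanup() 
method, the isLastAttempt flag and the folder name "_distcp-12345" are all 
hypothetical, chosen only for illustration. It assumes that the meta folder is removed 
by a cleanup step at the end of an AM attempt, and shows one possible guard: skip 
that cleanup unless the attempt is the last one.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class MetaFolderCleanupSketch {

    // Recursively delete a directory tree, children before parents.
    public static void deleteRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(dir)) {
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }

    // Hypothetical cleanup hook: removes the meta folder only when no
    // further AM attempt will need it. Not a real Hadoop method.
    public static void cleanup(Path metaFolder, boolean isLastAttempt) throws IOException {
        if (isLastAttempt) {
            deleteRecursively(metaFolder);
        }
    }

    // Simulates a first AM attempt whose cleanup runs before it is killed,
    // then checks whether a second attempt would still find the file listing.
    public static boolean secondAttemptFindsListing(boolean guardCleanup) throws IOException {
        Path stagingDir = Files.createTempDirectory("staging");
        Path metaFolder = Files.createDirectory(stagingDir.resolve("_distcp-12345"));
        Path listing = Files.createFile(metaFolder.resolve("fileList.seq"));

        // Without the guard, cleanup behaves as if every attempt were the last.
        cleanup(metaFolder, !guardCleanup);

        boolean found = Files.exists(listing);
        deleteRecursively(stagingDir); // remove the simulated staging directory
        return found;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("unguarded cleanup, listing found: " + secondAttemptFindsListing(false));
        System.out.println("guarded cleanup, listing found:   " + secondAttemptFindsListing(true));
    }
}
```

In this simulation the unguarded run loses fileList.seq while the staging directory 
itself is left in place, which matches the FileNotFoundException in the issue 
description. The guarded run keeps the listing available for the next attempt.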


> DistCp job fails when AM is killed
> ----------------------------------
>
>                 Key: HADOOP-17763
>                 URL: https://issues.apache.org/jira/browse/HADOOP-17763
>             Project: Hadoop Common
>          Issue Type: Bug
>            Reporter: Bilwa S T
>            Assignee: Bilwa S T
>            Priority: Major
>         Attachments: HADOOP-17763.001.patch
>
>
> The job fails because its tasks fail with the exception below:
> {code:java}
> 2021-06-11 18:48:47,047 | ERROR | IPC Server handler 0 on 27101 | Task: attempt_1623387358383_0006_m_000000_1000 - exited : java.io.FileNotFoundException: File does not exist: hdfs://hacluster/staging-dir/dsperf/.staging/_distcp-646531269/fileList.seq
>  at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1637)
>  at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1630)
>  at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>  at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1645)
>  at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1863)
>  at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1886)
>  at org.apache.hadoop.mapreduce.lib.input.SequenceFileRecordReader.initialize(SequenceFileRecordReader.java:54)
>  at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:560)
>  at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:798)
>  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:347)
>  at org.apache.hadoop.mapred.YarnChild$1.run(YarnChild.java:183)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1761)
>  at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:177)
>  | TaskAttemptListenerImpl.java:304
> {code}



