[ 
https://issues.apache.org/jira/browse/HADOOP-16932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17073825#comment-17073825
 ] 

Steve Loughran commented on HADOOP-16932:
-----------------------------------------

Caused by HADOOP-8143. The default -p option went from "none" to "blocksize", so 
distcp now always makes the getFileStatus call. On an incremental backup a 
HEAD call has already been made earlier; if only a few seconds pass between the 
two operations (i.e. a small file), the 404 is still cached and the second 
getFileStatus fails.

The fix will be to strip the blocksize and checksum options from the 
preservation flag list first, and only then decide whether to issue the 
getFileStatus call, based on the remaining list not being empty.
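
As a rough illustration of that idea, here is a minimal, self-contained sketch. It is not the actual DistCp patch: the class name, the `Attr` enum, and both method names are hypothetical stand-ins, and it assumes that blocksize and checksum are the only flags that need no status lookup on the destination after the copy.

```java
import java.util.EnumSet;

public class PreserveFlagFilter {

    /** Hypothetical stand-in for DistCp's preservation flags; not the real Hadoop enum. */
    public enum Attr { REPLICATION, BLOCKSIZE, USER, GROUP, PERMISSION, CHECKSUMTYPE, TIMES }

    /** Returns a copy of the requested flags with BLOCKSIZE and CHECKSUMTYPE removed. */
    public static EnumSet<Attr> stripCopyTimeOnly(EnumSet<Attr> requested) {
        // copyOf(EnumSet) clones the set, so the caller's set is left untouched
        EnumSet<Attr> remaining = EnumSet.copyOf(requested);
        remaining.remove(Attr.BLOCKSIZE);
        remaining.remove(Attr.CHECKSUMTYPE);
        return remaining;
    }

    /** True only if some flag other than blocksize/checksum was requested. */
    public static boolean needsTargetStatus(EnumSet<Attr> requested) {
        return !stripCopyTimeOnly(requested).isEmpty();
    }

    public static void main(String[] args) {
        // Default "-p blocksize" only: no status call on the destination is needed
        System.out.println(needsTargetStatus(EnumSet.of(Attr.BLOCKSIZE)));
        // Blocksize plus user: the status call is still required
        System.out.println(needsTargetStatus(EnumSet.of(Attr.BLOCKSIZE, Attr.USER)));
    }
}
```

With a check of this shape, a copy run with only the default flags would skip the post-copy getFileStatus and so never hit the cached 404.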

> Distcp - Error: java.io.FileNotFoundException: No such file or directory
> ------------------------------------------------------------------------
>
>                 Key: HADOOP-16932
>                 URL: https://issues.apache.org/jira/browse/HADOOP-16932
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs/s3, tools/distcp
>    Affects Versions: 3.0.0, 3.1.2, 3.2.1
>         Environment: Hadoop CDH 6.3
>            Reporter: Thangamani Murugasamy
>            Priority: Minor
>
> Distcp to AWS S3 was working fine on CDH 5.16 with distcp 2.6.0, but after 
> upgrading to CDH 6.3, which comes with the distcp-3.0 JAR, it throws the error 
> below.
> The same error repeats with Hadoop-distcp-3.2.1.jar as well. Tried the 
> -direct option in 3.2.1; still the same error.
>  
> Error: java.io.FileNotFoundException: No such file or directory: s3a://XXXXXXXXXXXXX/part-00012-baa6a706-3816-4dfa-ba07-0fb56fd38178-c000.snappy.parquet
>  at org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:2255)
>  at org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:2149)
>  at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:2088)
>  at org.apache.hadoop.tools.util.DistCpUtils.preserve(DistCpUtils.java:203)
>  at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:220)
>  at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:48)
>  at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
>  at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:799)
>  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:347)
>  at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875)
>  at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)



