[ https://issues.apache.org/jira/browse/HADOOP-11487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15388286#comment-15388286 ]

Chris Nauroth commented on HADOOP-11487:
----------------------------------------

Hello [~$iddhe$h].  The stack trace indicates a failure in a 
{{FileSystem#getFileStatus}} call.  Metadata and listing calls against S3 are 
subject to eventual consistency, which is exactly the scenario the S3Guard 
project, tracked in HADOOP-13345, aims to address.  Note, however, that 
S3Guard targets the S3A file system, which is where ongoing development of 
Hadoop's S3 integration is happening.  (Your stack trace indicates you are 
currently using S3N.)
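
Until S3Guard is available, a client-side mitigation is to wrap the metadata 
call in a bounded retry.  A minimal sketch follows; the helper class and its 
retry parameters are illustrative, not part of any shipped Hadoop API:

{code:java}
import java.io.FileNotFoundException;
import java.io.IOException;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public final class ConsistencyRetry {
  /**
   * Retries listStatus until the path becomes visible or attempts run out.
   * A FileNotFoundException is treated as transient here, because a freshly
   * written S3 key may not yet be visible to listing calls.
   */
  public static FileStatus[] listStatusWithRetries(
      FileSystem fs, Path path, int maxAttempts, long sleepMillis)
      throws IOException, InterruptedException {
    FileNotFoundException last = null;
    for (int attempt = 0; attempt < maxAttempts; attempt++) {
      try {
        return fs.listStatus(path);
      } catch (FileNotFoundException e) {
        last = e;  // possibly an eventually-consistent miss; back off
        Thread.sleep(sleepMillis);
      }
    }
    throw last;
  }
}
{code}

The same wrapper applies to {{getFileStatus}}; S3Guard removes the need for 
this kind of caller-side retry by keeping an authoritative metadata store.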

> FileNotFound on distcp to s3n/s3a due to creation inconsistency 
> ----------------------------------------------------------------
>
>                 Key: HADOOP-11487
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11487
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs, fs/s3
>    Affects Versions: 2.7.2
>            Reporter: Paulo Motta
>
> I'm trying to copy a large number of files from HDFS to S3 via distcp, and 
> I'm getting the following exception:
> {code:java}
> 2015-01-16 20:53:18,187 ERROR [main] org.apache.hadoop.tools.mapred.CopyMapper: Failure in copying hdfs://10.165.35.216/hdfsFolder/file.gz to s3n://s3-bucket/file.gz
> java.io.FileNotFoundException: No such file or directory 's3n://s3-bucket/file.gz'
>       at org.apache.hadoop.fs.s3native.NativeS3FileSystem.getFileStatus(NativeS3FileSystem.java:445)
>       at org.apache.hadoop.tools.util.DistCpUtils.preserve(DistCpUtils.java:187)
>       at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:233)
>       at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:45)
>       at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
>       at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>       at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
>       at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:422)
>       at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>       at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
> 2015-01-16 20:53:18,276 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.io.FileNotFoundException: No such file or directory 's3n://s3-bucket/file.gz'
>       at org.apache.hadoop.fs.s3native.NativeS3FileSystem.getFileStatus(NativeS3FileSystem.java:445)
>       at org.apache.hadoop.tools.util.DistCpUtils.preserve(DistCpUtils.java:187)
>       at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:233)
>       at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:45)
>       at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
>       at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>       at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
>       at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:422)
>       at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>       at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
> {code}
> However, when I then run hadoop fs -ls s3n://s3-bucket/file.gz, the file is 
> there, so the job failure is most likely caused by Amazon's S3 eventual 
> consistency. In my opinion, NativeS3FileSystem.getFileStatus should honor 
> the fs.s3.maxRetries property and retry, in order to avoid transient 
> failures like this.
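
For illustration, here is a hypothetical caller-side sketch of the retry the 
reporter suggests, reading fs.s3.maxRetries and its companion 
fs.s3.sleepTimeSeconds (assuming the stock defaults of 4 and 10).  The class 
and method names are invented; NativeS3FileSystem does not currently retry 
getFileStatus this way:

{code:java}
import java.io.FileNotFoundException;
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public final class RetryingStatusLookup {
  // Hypothetical version of the reporter's suggestion: honor
  // fs.s3.maxRetries around getFileStatus instead of failing on the
  // first FileNotFoundException.
  public static FileStatus getFileStatusWithRetries(FileSystem fs, Path path)
      throws IOException, InterruptedException {
    Configuration conf = fs.getConf();
    int maxRetries = conf.getInt("fs.s3.maxRetries", 4);
    long sleepMillis = conf.getLong("fs.s3.sleepTimeSeconds", 10) * 1000L;
    FileNotFoundException last = null;
    for (int attempt = 0; attempt <= maxRetries; attempt++) {
      try {
        return fs.getFileStatus(path);
      } catch (FileNotFoundException e) {
        last = e;  // possibly an eventually-consistent miss; back off
        Thread.sleep(sleepMillis);
      }
    }
    throw last;
  }
}
{code}

A retry like this only papers over the inconsistency window; the S3Guard 
approach discussed above addresses the root cause for S3A.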


