[
https://issues.apache.org/jira/browse/HADOOP-16900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17050575#comment-17050575
]
Andrew Olson edited comment on HADOOP-16900 at 3/3/20 9:39 PM:
---------------------------------------------------------------
[[email protected]] [~gabor.bota] I think we should fail fast here without
writing anything. The fail-fast part apparently already happens, as seen in
distcp task failures like this:
{noformat}
19/12/26 21:45:54 INFO mapreduce.Job: Task Id : attempt_1576854935249_175694_m_000003_0, Status : FAILED
Error: java.io.IOException: File copy failed: hdfs://cluster/path/to/file.avro --> s3a://bucket/path/to/file.avro
    at org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:312)
    at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:270)
    at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:52)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:793)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.io.IOException: Couldn't run retriable-command: Copying hdfs://cluster/path/to/file.avro to s3a://bucket/path/to/file.avro
    at org.apache.hadoop.tools.util.RetriableCommand.execute(RetriableCommand.java:101)
    at org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:307)
    ... 10 more
Caused by: java.lang.IllegalArgumentException: partNumber must be between 1 and 10000 inclusive, but is 10001
    at com.google.common.base.Preconditions.checkArgument(Preconditions.java:115)
    at org.apache.hadoop.fs.s3a.S3AFileSystem$WriteOperationHelper.newUploadPartRequest(S3AFileSystem.java:3086)
    at org.apache.hadoop.fs.s3a.S3ABlockOutputStream$MultiPartUpload.uploadBlockAsync(S3ABlockOutputStream.java:492)
    at org.apache.hadoop.fs.s3a.S3ABlockOutputStream$MultiPartUpload.access$000(S3ABlockOutputStream.java:469)
    at org.apache.hadoop.fs.s3a.S3ABlockOutputStream.uploadCurrentBlock(S3ABlockOutputStream.java:307)
    at org.apache.hadoop.fs.s3a.S3ABlockOutputStream.write(S3ABlockOutputStream.java:289)
    at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:58)
    at java.io.DataOutputStream.write(DataOutputStream.java:107)
    at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122)
    at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.copyBytes(RetriableFileCopyCommand.java:299)
    at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.copyToFile(RetriableFileCopyCommand.java:216)
    at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.doCopy(RetriableFileCopyCommand.java:146)
    at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.doExecute(RetriableFileCopyCommand.java:116)
    at org.apache.hadoop.tools.util.RetriableCommand.execute(RetriableCommand.java:87)
    ... 11 more
{noformat}
However... in distcp there's an optimization where it skips files that already
exist in the destination (I think it matches a checksum of some number of the
initial bytes?), so when the task attempt was retried it found no work to do
for this large file, and the distcp job ultimately succeeded. I should probably
open a separate issue to have distcp compare file lengths before deciding that
a path already present in the target location can be safely skipped.
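For illustration, here's a minimal sketch of the extra guard I have in mind,
written against the public FileSystem API rather than the actual DistCp
internals (the class and method names below are made up for the example):
{noformat}
import java.io.IOException;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SkipCheckSketch {

  /**
   * Only skip the copy when the target exists AND its length matches the
   * source, so a truncated target (like the one above) is re-copied rather
   * than silently skipped.
   */
  static boolean canSkipCopy(FileSystem srcFS, Path source,
                             FileSystem dstFS, Path target) throws IOException {
    if (!dstFS.exists(target)) {
      return false;                        // nothing at the target yet
    }
    FileStatus srcStatus = srcFS.getFileStatus(source);
    FileStatus dstStatus = dstFS.getFileStatus(target);
    // Cheap length comparison first; the existing checksum comparison could
    // still run afterwards for files whose lengths match.
    return srcStatus.getLen() == dstStatus.getLen();
  }
}
{noformat}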
We ran into this because we had tuned fs.s3a.multipart.size down to 25M. That
effectively imposed a 250 GB limit on our S3A file writes due to the
10,000-part limit on the AWS side. Adjusting the multipart chunk size to a
larger value (100M) got us past this, since all our files are under 1 TB.
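For anyone hitting the same ceiling, the arithmetic is just the configured part
size times the 10,000-part cap. A tiny illustrative snippet (class name and
values are only for the example, not taken from our job):
{noformat}
import org.apache.hadoop.conf.Configuration;

public class MultipartCeiling {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.set("fs.s3a.multipart.size", "100M");  // was 25M when we hit the truncation

    // getLongBytes understands the K/M/G suffixes; multiply by the AWS hard
    // limit of 10,000 parts per multipart upload to get the size ceiling.
    long partSize = conf.getLongBytes("fs.s3a.multipart.size", 100 * 1024 * 1024);
    long maxObjectSize = partSize * 10_000L;
    System.out.println("Effective S3A object size ceiling: " + maxObjectSize + " bytes");
    // 100M -> 1,048,576,000,000 bytes (roughly 1 TB), comfortably above our largest files
  }
}
{noformat}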
Here's the evidence of the issue that we recorded, noting the unexpected
289789841890 vs. 262144000000 byte file size difference, with names changed to
protect the innocent.
{noformat}
$ hdfs dfs -ls /path/to/file.avro
-rwxrwxr-x 3 user group 289789841890 2016-12-20 06:37 /path/to/file.avro
$ hdfs dfs -conf conf.xml -ls s3a://bucket/path/to/file.avro
-rw-rw-rw- 1 user 262144000000 2019-12-26 21:45 s3a://bucket/path/to/file.avro
{noformat}
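Worth spelling out that the truncation point lines up exactly with the part cap
and our 25M part size:
{noformat}
25 MiB x 10,000 parts = 26,214,400 x 10,000 = 262,144,000,000 bytes  (the size of the S3A copy above)
289,789,841,890 - 262,144,000,000 = 27,645,841,890 bytes (~27.6 GB) silently dropped
{noformat}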
> Very large files can be truncated when written through S3AFileSystem
> --------------------------------------------------------------------
>
> Key: HADOOP-16900
> URL: https://issues.apache.org/jira/browse/HADOOP-16900
> Project: Hadoop Common
> Issue Type: Bug
> Components: fs/s3
> Affects Versions: 3.2.1
> Reporter: Andrew Olson
> Assignee: Steve Loughran
> Priority: Major
> Labels: s3
>
> If a written file size exceeds 10,000 * {{fs.s3a.multipart.size}}, a corrupt
> truncation of the S3 object will occur, as the maximum number of parts in a
> multipart upload is 10,000 as specified by the S3 API, and there is an
> apparent bug where this failure is not fatal and the multipart upload is
> allowed to be marked as completed.