[
https://issues.apache.org/jira/browse/HADOOP-13991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15823174#comment-15823174
]
Steve Loughran commented on HADOOP-13991:
-----------------------------------------
Musaddique, thank you for your post and the details of a fix.
I'm sorry to say we aren't going to take this. That's not because there's
anything wrong with it, but because we've stopped doing any work on s3n other
than emergency security fixes, putting all our effort into s3a. Leaving s3n
alone means that we have a reference S3 connector that is pretty much
guaranteed not to pick up any regressions, while in s3a we can do more
leading-edge work.
S3a does have retry logic: a lot of it is built into the Amazon S3 library
itself, with some extra handling for operations that the SDK doesn't retry
well (e.g. the final commit of a multipart upload).
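(As an illustration only, not the S3A internals: the SDK-level retry count is
surfaced as a Hadoop configuration property, so the built-in retrying can be
tuned without code changes. The value below is arbitrary; check your version's
documentation for the default.)
{code:title=Tuning SDK-level retries (illustrative)|borderStyle=solid}
import org.apache.hadoop.conf.Configuration;

public class S3aRetryTuning {
  public static void main(String[] args) {
    // fs.s3a.attempts.maximum is handed to the AWS SDK's retry policy:
    // how many times a failed request is retried before s3a gives up.
    Configuration conf = new Configuration();
    conf.setInt("fs.s3a.attempts.maximum", 10); // arbitrary example value
    System.out.println(conf.get("fs.s3a.attempts.maximum"));
  }
}
{code}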
# Please switch to s3a as soon as you can. If you are using Hadoop 2.7.3, it's
stable enough for use; a minimal usage example follows this list.
# And if you want to improve s3a, please get involved with that code. Ideally,
look at the work in HADOOP-11694 to see what to look forward to in Hadoop 2.8,
and HADOOP-13204 for the todo list where help is really welcome, and that
includes help with testing.
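To make the switch concrete, here is a minimal, illustrative snippet; the
bucket name, path, and credentials are placeholders, and in production you
would normally use a credential provider rather than inline keys:
{code:title=Minimal s3a usage (illustrative)|borderStyle=solid}
import java.net.URI;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class S3aExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.set("fs.s3a.access.key", "YOUR_ACCESS_KEY");   // placeholder
    conf.set("fs.s3a.secret.key", "YOUR_SECRET_KEY");   // placeholder
    // An s3a:// URI selects the S3A connector instead of the old s3n one.
    FileSystem fs = FileSystem.get(URI.create("s3a://mybucket/"), conf);
    try (FSDataOutputStream out = fs.create(new Path("/data/example.txt"))) {
      out.write("hello".getBytes(StandardCharsets.UTF_8));
    }
  }
}
{code}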
thanks,
> Retry management in NativeS3FileSystem to avoid file upload problem
> -------------------------------------------------------------------
>
> Key: HADOOP-13991
> URL: https://issues.apache.org/jira/browse/HADOOP-13991
> Project: Hadoop Common
> Issue Type: Improvement
> Components: fs/s3
> Affects Versions: 2.7.3
> Reporter: Musaddique Hossain
> Priority: Minor
>
> NativeS3FileSystem has no retry handling for uploads to S3.
> If the upload to the S3 bucket fails due to a socket timeout or any other
> network exception, the upload is abandoned and the temporary file is deleted:
> java.net.SocketException: Connection reset
> at java.net.SocketInputStream.read(SocketInputStream.java:196)
> at java.net.SocketInputStream.read(SocketInputStream.java:122)
> at org.jets3t.service.S3Service.putObject(S3Service.java:2265)
> at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.storeFile(Jets3tNativeFileSystemStore.java:122)
> at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.hadoop.fs.s3native.$Proxy8.storeFile(Unknown Source)
> at org.apache.hadoop.fs.s3native.NativeS3FileSystem$NativeS3FsOutputStream.close(NativeS3FileSystem.java:284)
> at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
> at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106)
> at org.apache.hadoop.io.compress.bzip2.CBZip2OutputStream.close(CBZip2OutputStream.java:737)
> at org.apache.hadoop.io.compress.BZip2Codec$BZip2CompressionOutputStream.close(BZip2Codec.java:336)
> at org.apache.flume.sink.hdfs.HDFSCompressedDataStream.close(HDFSCompressedDataStream.java:155)
> at org.apache.flume.sink.hdfs.BucketWriter$3.call(BucketWriter.java:312)
> at org.apache.flume.sink.hdfs.BucketWriter$3.call(BucketWriter.java:308)
> at org.apache.flume.sink.hdfs.BucketWriter$9$1.run(BucketWriter.java:679)
> at org.apache.flume.auth.SimpleAuthenticator.execute(SimpleAuthenticator.java:50)
> at org.apache.flume.sink.hdfs.BucketWriter$9.call(BucketWriter.java:676)
> This can be solved by wrapping the upload in retry logic.
> We have made the following modifications to NativeS3FileSystem to add retry
> management; this has been running in our production system without any
> upload failures:
> {code:title=NativeS3FileSystem.java|borderStyle=solid}
> @@ -36,6 +36,7 @@
>  import java.util.Map;
>  import java.util.Set;
>  import java.util.TreeSet;
> +import java.util.concurrent.Callable;
>  import java.util.concurrent.TimeUnit;
>  
>  import com.google.common.base.Preconditions;
> @@ -279,9 +280,19 @@
>        backupStream.close();
>        LOG.info("OutputStream for key '{}' closed. Now beginning upload", key);
> +      Callable<Void> task = new Callable<Void>() {
> +        private final byte[] md5Hash = digest == null ? null : digest.digest();
> +        public Void call() throws IOException {
> +          store.storeFile(key, backupFile, md5Hash);
> +          return null;
> +        }
> +      };
> +      RetriableTask<Void> r = new RetriableTask<Void>(task);
> +
>        try {
> -        byte[] md5Hash = digest == null ? null : digest.digest();
> -        store.storeFile(key, backupFile, md5Hash);
> +        r.call();
> +      } catch (Exception e) {
> +        throw new IOException(e);
>        } finally {
>          if (!backupFile.delete()) {
>            LOG.warn("Could not delete temporary s3n file: " + backupFile);
> {code}
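> The patch above references a RetriableTask class that is not part of the
> snippet. Purely as an illustration of what such a helper might look like
> (this is hypothetical code with a hard-coded attempt count and linear
> backoff, not the reporter's actual class and not part of Hadoop):
> {code:title=RetriableTask.java (illustrative sketch)|borderStyle=solid}
> import java.util.concurrent.Callable;
>
> // Hypothetical sketch: retries the wrapped Callable a fixed number of
> // times, sleeping between attempts. Only the class name is taken from
> // the patch above; the body is illustrative.
> public class RetriableTask<V> implements Callable<V> {
>   private static final int MAX_ATTEMPTS = 3;
>   private static final long SLEEP_MILLIS = 1000;
>
>   private final Callable<V> task;
>
>   public RetriableTask(Callable<V> task) {
>     this.task = task;
>   }
>
>   @Override
>   public V call() throws Exception {
>     Exception last = null;
>     for (int attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
>       try {
>         return task.call();
>       } catch (Exception e) {
>         last = e;
>         if (attempt < MAX_ATTEMPTS) {
>           Thread.sleep(SLEEP_MILLIS * attempt); // simple linear backoff
>         }
>       }
>     }
>     throw last; // every attempt failed
>   }
> }
> {code}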