[
https://issues.apache.org/jira/browse/HADOOP-13991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15823174#comment-15823174
]
Steve Loughran commented on HADOOP-13991:
-----------------------------------------
Musaddique, thank you for your post and the details of a fix.
I'm sorry to say we aren't going to take this. That's not because there's
anything wrong with it, but because we've stopped doing any work on s3n other
than emergency security fixes, putting all our effort into s3a. Leaving s3n
alone means that we have a reference S3 connector that is pretty much
guaranteed not to pick up any regressions, while in s3a we can do more
leading-edge work.
S3a does have retry logic: a lot of it is built into the Amazon S3 library
itself, with some extra handling for operations that the SDK doesn't retry
well (e.g. the final commit of a multipart upload).
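(As an illustration only, not the S3A internals: the SDK-level retry count is
surfaced as a Hadoop configuration property, so the built-in retrying can be
tuned without code changes. The value below is arbitrary; check your version's
documentation for the default.)
{code:title=Tuning SDK-level retries (illustrative)|borderStyle=solid}
import org.apache.hadoop.conf.Configuration;

public class S3aRetryTuning {
  public static void main(String[] args) {
    // fs.s3a.attempts.maximum is handed to the AWS SDK's retry policy:
    // how many times a failed request is retried before s3a gives up.
    Configuration conf = new Configuration();
    conf.setInt("fs.s3a.attempts.maximum", 10); // arbitrary example value
    System.out.println(conf.get("fs.s3a.attempts.maximum"));
  }
}
{code}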
# Please switch to s3a as soon as you can. If you are using Hadoop 2.7.3, it's
stable enough for use; a minimal usage example follows this list.
# And if you want to improve s3a, please get involved with that code. Ideally,
look at the work in HADOOP-11694 to see what to look forward to in Hadoop 2.8,
and HADOOP-13204 for the todo list where help is really welcome, and that
includes help with testing.
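To make the switch concrete, here is a minimal, illustrative snippet; the
bucket name, path, and credentials are placeholders, and in production you
would normally use a credential provider rather than inline keys:
{code:title=Minimal s3a usage (illustrative)|borderStyle=solid}
import java.net.URI;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class S3aExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.set("fs.s3a.access.key", "YOUR_ACCESS_KEY");   // placeholder
    conf.set("fs.s3a.secret.key", "YOUR_SECRET_KEY");   // placeholder
    // An s3a:// URI selects the S3A connector instead of the old s3n one.
    FileSystem fs = FileSystem.get(URI.create("s3a://mybucket/"), conf);
    try (FSDataOutputStream out = fs.create(new Path("/data/example.txt"))) {
      out.write("hello".getBytes(StandardCharsets.UTF_8));
    }
  }
}
{code}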
thanks,
> Retry management in NativeS3FileSystem to avoid file upload problem
> -------------------------------------------------------------------
>
> Key: HADOOP-13991
> URL: https://issues.apache.org/jira/browse/HADOOP-13991
> Project: Hadoop Common
> Issue Type: Improvement
> Components: fs/s3
> Affects Versions: 2.7.3
> Reporter: Musaddique Hossain
> Priority: Minor
>
> NativeS3FileSystem has no retry handling for uploads to S3.
> If the upload to the S3 bucket fails due to a socket timeout or any other
> network exception, the upload is abandoned and the temporary file is deleted:
> java.net.SocketException: Connection reset
> at java.net.SocketInputStream.read(SocketInputStream.java:196)
> at java.net.SocketInputStream.read(SocketInputStream.java:122)
> at org.jets3t.service.S3Service.putObject(S3Service.java:2265)
> at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.storeFile(Jets3tNativeFileSystemStore.java:122)
> at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.hadoop.fs.s3native.$Proxy8.storeFile(Unknown Source)
> at org.apache.hadoop.fs.s3native.NativeS3FileSystem$NativeS3FsOutputStream.close(NativeS3FileSystem.java:284)
> at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
> at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106)
> at org.apache.hadoop.io.compress.bzip2.CBZip2OutputStream.close(CBZip2OutputStream.java:737)
> at org.apache.hadoop.io.compress.BZip2Codec$BZip2CompressionOutputStream.close(BZip2Codec.java:336)
> at org.apache.flume.sink.hdfs.HDFSCompressedDataStream.close(HDFSCompressedDataStream.java:155)
> at org.apache.flume.sink.hdfs.BucketWriter$3.call(BucketWriter.java:312)
> at org.apache.flume.sink.hdfs.BucketWriter$3.call(BucketWriter.java:308)
> at org.apache.flume.sink.hdfs.BucketWriter$9$1.run(BucketWriter.java:679)
> at org.apache.flume.auth.SimpleAuthenticator.execute(SimpleAuthenticator.java:50)
> at org.apache.flume.sink.hdfs.BucketWriter$9.call(BucketWriter.java:676)
> This can be solved by wrapping the upload in retry logic.
> We have made the following modifications to NativeS3FileSystem to add retry
> management; this has been running in our production system without any
> upload failures:
> {code:title=NativeS3FileSystem.java|borderStyle=solid}
> @@ -36,6 +36,7 @@
>  import java.util.Map;
>  import java.util.Set;
>  import java.util.TreeSet;
> +import java.util.concurrent.Callable;
>  import java.util.concurrent.TimeUnit;
>  
>  import com.google.common.base.Preconditions;
> @@ -279,9 +280,19 @@
>        backupStream.close();
>        LOG.info("OutputStream for key '{}' closed. Now beginning upload", key);
> +      Callable<Void> task = new Callable<Void>() {
> +        private final byte[] md5Hash = digest == null ? null : digest.digest();
> +        public Void call() throws IOException {
> +          store.storeFile(key, backupFile, md5Hash);
> +          return null;
> +        }
> +      };
> +      RetriableTask<Void> r = new RetriableTask<Void>(task);
> +
>        try {
> -        byte[] md5Hash = digest == null ? null : digest.digest();
> -        store.storeFile(key, backupFile, md5Hash);
> +        r.call();
> +      } catch (Exception e) {
> +        throw new IOException(e);
>        } finally {
>          if (!backupFile.delete()) {
>            LOG.warn("Could not delete temporary s3n file: " + backupFile);
> {code}
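> The patch above references a RetriableTask class that is not part of the
> snippet. Purely as an illustration of what such a helper might look like
> (this is hypothetical code with a hard-coded attempt count and linear
> backoff, not the reporter's actual class and not part of Hadoop):
> {code:title=RetriableTask.java (illustrative sketch)|borderStyle=solid}
> import java.util.concurrent.Callable;
>
> // Hypothetical sketch: retries the wrapped Callable a fixed number of
> // times, sleeping between attempts. Only the class name is taken from
> // the patch above; the body is illustrative.
> public class RetriableTask<V> implements Callable<V> {
>   private static final int MAX_ATTEMPTS = 3;
>   private static final long SLEEP_MILLIS = 1000;
>
>   private final Callable<V> task;
>
>   public RetriableTask(Callable<V> task) {
>     this.task = task;
>   }
>
>   @Override
>   public V call() throws Exception {
>     Exception last = null;
>     for (int attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
>       try {
>         return task.call();
>       } catch (Exception e) {
>         last = e;
>         if (attempt < MAX_ATTEMPTS) {
>           Thread.sleep(SLEEP_MILLIS * attempt); // simple linear backoff
>         }
>       }
>     }
>     throw last; // every attempt failed
>   }
> }
> {code}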