[
https://issues.apache.org/jira/browse/HADOOP-11697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14354383#comment-14354383
]
Lei (Eddy) Xu commented on HADOOP-11697:
----------------------------------------
Hey, [~yzhangal]. Thanks a lot for your detailed reviews.
bq. The original setting in com.amazonaws.ClientConfiguration has the default
of 50 seconds:
We ran several tests that {{hadoop fs -put}} data ranging from 300MB to 4GB on an
EC2 instance with these default settings, and they consistently throw
{{SocketTimeoutException}}s:
{code}
2015-01-28 14:53:39,528 INFO [XXX] http.AmazonHttpClient
(AmazonHttpClient.java:executeHelper(448)) - Unable to execute HTTP request:
Read timed out
java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:129)
....
{code}
The main cause of this timeout is the rename operation on S3.
I also tested {{fs -put}} with a 4GB file, which took about 5.5 minutes:
{code}
real 333.32
user 94.26
sys 65.08
{code}
The s3a upload parameters are not optimized yet. Note that this timeout maps to
the AWS SDK's socket timeout:
{code}
/**
 * Sets the amount of time to wait (in milliseconds) for data to be
 * transferred over an established, open connection before the connection
 * times out and is closed. A value of 0 means infinity, and isn't
 * recommended.
 */
public void setSocketTimeout(int socketTimeout) {
{code}
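For illustration, here is a stdlib-only sketch of how a millisecond value read
from {{fs.s3a.connection.timeout}} would end up as the SDK's socket timeout. It
uses {{java.util.Properties}} in place of Hadoop's {{Configuration}} and a
hypothetical helper name; only the property name and the 50000 ms default come
from this issue.

```java
import java.util.Properties;

public class S3ATimeoutSketch {
    // Mirrors the 50 s default socket timeout in com.amazonaws.ClientConfiguration.
    static final int DEFAULT_SOCKET_TIMEOUT_MS = 50_000;

    // Hypothetical helper: reads fs.s3a.connection.timeout (milliseconds) from
    // a Properties object standing in for Hadoop's Configuration.
    static int socketTimeoutMs(Properties conf) {
        String v = conf.getProperty("fs.s3a.connection.timeout");
        return v == null ? DEFAULT_SOCKET_TIMEOUT_MS : Integer.parseInt(v.trim());
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        // Unset: falls back to the SDK-style default.
        System.out.println(socketTimeoutMs(conf));  // 50000

        // Raised to 30 minutes for multi-GB uploads, as the patch proposes.
        conf.setProperty("fs.s3a.connection.timeout",
                         String.valueOf(30 * 60 * 1000));
        System.out.println(socketTimeoutMs(conf));  // 1800000
    }
}
```

The value resolved here is what would ultimately be passed to
{{setSocketTimeout}} above.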
That's the reason I think 30 minutes is an _aggressively_ safe timeout for
uploading 10-30GB files. In the future, after HDFS-7240 is committed, we can
avoid renaming the {{\_COPYING_}} file in {{fs -put}}, and this timeout can be
much smaller.
bq. usually the config names for timeout would contain a section to indicate
the time unit.
It is a very nice suggestion. Shall I file a follow-up JIRA to address this?
Thanks!
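As a rough sketch of the unit suggestion above (the parser and its suffix
grammar are hypothetical illustrations, not an existing Hadoop API), a config
value could carry its own unit suffix instead of encoding the unit in the
property name:

```java
import java.util.concurrent.TimeUnit;

public class DurationSuffixSketch {
    // Hypothetical parser: "50s" -> 50000 ms, "30m" -> 1800000 ms,
    // and a bare number is treated as milliseconds for compatibility.
    static long toMillis(String value) {
        String v = value.trim();
        char unit = v.charAt(v.length() - 1);
        if (Character.isDigit(unit)) {
            return Long.parseLong(v);  // legacy: plain milliseconds
        }
        long n = Long.parseLong(v.substring(0, v.length() - 1));
        switch (unit) {
            case 's': return TimeUnit.SECONDS.toMillis(n);
            case 'm': return TimeUnit.MINUTES.toMillis(n);
            case 'h': return TimeUnit.HOURS.toMillis(n);
            default:  throw new IllegalArgumentException("unknown unit: " + unit);
        }
    }

    public static void main(String[] args) {
        System.out.println(toMillis("50s"));  // 50000
        System.out.println(toMillis("30m"));  // 1800000
    }
}
```

Treating a bare number as milliseconds keeps existing deployments working while
letting admins write coarser units going forward.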
> Use larger value for fs.s3a.connection.timeout.
> -----------------------------------------------
>
> Key: HADOOP-11697
> URL: https://issues.apache.org/jira/browse/HADOOP-11697
> Project: Hadoop Common
> Issue Type: Improvement
> Affects Versions: 2.6.0
> Reporter: Lei (Eddy) Xu
> Assignee: Lei (Eddy) Xu
> Priority: Minor
> Labels: s3
> Attachments: HADOOP-11697.001.patch, HDFS-7908.000.patch
>
>
> The default value of {{fs.s3a.connection.timeout}} is {{50000}} milliseconds.
> It causes many {{SocketTimeoutException}}s when uploading large files using
> {{hadoop fs -put}}.
> Also, the units for {{fs.s3a.connection.timeout}} and
> {{fs.s3a.connection.establish.timeout}} are milliseconds. For S3
> connections, I think it is not necessary to have sub-second timeout values.
> Thus I suggest changing the time unit to seconds, to ease sysadmins' jobs.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)