[ 
https://issues.apache.org/jira/browse/HADOOP-16823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-16823:
------------------------------------
    Description: 
Currently AWS S3 throttling is initially handled in the AWS SDK, only reaching 
the S3 client code after it has given up.

This means we don't always directly observe when throttling is taking place.

Proposed:

* disable throttling retries in the AWS client library
* add a quantile for S3 throttle events, as DDB already has
* keep separate counters for S3 and DDB throttle events, to classify issues better
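
The per-store counter split could look something like the following. This is a minimal, hypothetical sketch (the class and method names are illustrative, not the actual S3AInstrumentation API): one counter per backing store, so S3 503s and DDB capacity rejections can be told apart when diagnosing a throttled workload.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: keep separate throttle counters per backing store,
// so S3 throttle responses and DynamoDB capacity-exceeded events can be
// distinguished when classifying a throttling problem.
public class ThrottleTracker {
    public enum Source { S3, DDB }

    private final Map<Source, AtomicLong> counters = new EnumMap<>(Source.class);

    public ThrottleTracker() {
        for (Source s : Source.values()) {
            counters.put(s, new AtomicLong());
        }
    }

    /** Record one throttle event from the given service. */
    public void throttled(Source source) {
        counters.get(source).incrementAndGet();
    }

    /** Current count of throttle events for the given service. */
    public long count(Source source) {
        return counters.get(source).get();
    }
}
```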

Because we are taking over the AWS retries, we will need to expand the initial 
delay between retries and the number of retries we support before giving up.
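
An expanded retry schedule might be sketched as below: exponential backoff from a larger initial delay, capped, with more attempts than the SDK default, plus full jitter so a herd of throttled threads does not retry in lockstep. The constants are purely illustrative, not the shipped fs.s3a.* defaults.

```java
import java.util.concurrent.ThreadLocalRandom;

// Illustrative sketch of an expanded retry schedule once S3A owns the
// retries itself. The constants are examples only, not the actual
// fs.s3a.retry.* defaults.
public class ThrottleBackoff {
    static final long INITIAL_DELAY_MS = 500;   // expanded initial delay
    static final long MAX_DELAY_MS = 20_000;    // cap on any single sleep
    static final int MAX_RETRIES = 10;          // more attempts before giving up

    /** Delay before retry {@code attempt} (0-based): exponential, capped. */
    public static long delayMillis(int attempt) {
        long delay = INITIAL_DELAY_MS << Math.min(attempt, 30); // bounded shift
        return Math.min(delay, MAX_DELAY_MS);
    }

    /** Same delay with full jitter, to avoid synchronized retry storms. */
    public static long jitteredDelayMillis(int attempt) {
        return ThreadLocalRandom.current().nextLong(delayMillis(attempt) + 1);
    }
}
```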

Also: should we log throttling events? It could be useful, but there is a risk 
of overloading the logs, especially if many threads in the same process are 
triggering the problem.

Proposed: log at debug.
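
To bound the log volume when many threads are being throttled at once, the debug-level logging could also be sampled. A minimal sketch, assuming a hypothetical helper class (not the actual S3A code, which uses SLF4J rather than java.util.logging):

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.logging.Level;
import java.util.logging.Logger;

// Sketch: log throttle events at debug level, but only every Nth event,
// so many throttled threads in one process cannot flood the log.
public class ThrottleLogger {
    private static final Logger LOG =
        Logger.getLogger(ThrottleLogger.class.getName());
    static final long SAMPLE_EVERY = 100;

    private final AtomicLong events = new AtomicLong();

    /** Sampling decision: log the 1st, 101st, 201st... event. */
    public static boolean shouldLog(long eventNumber) {
        return eventNumber % SAMPLE_EVERY == 1;
    }

    public void onThrottle(String operation) {
        long n = events.incrementAndGet();
        if (shouldLog(n) && LOG.isLoggable(Level.FINE)) {
            LOG.fine("Throttled: " + operation + " (event #" + n + ")");
        }
    }
}
```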

Note: if S3 bucket logging is enabled, then throttling events will be recorded 
as 503 responses in those logs. If the Hadoop version contains the audit logging 
of HADOOP-17511, this can be used to identify which operations/jobs/users are 
triggering the problem.

> Large DeleteObject requests are their own Thundering Herd
> ---------------------------------------------------------
>
>                 Key: HADOOP-16823
>                 URL: https://issues.apache.org/jira/browse/HADOOP-16823
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.2.1
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>            Priority: Major
>             Fix For: 3.3.0
>
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
