[
https://issues.apache.org/jira/browse/HADOOP-14660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Thomas updated HADOOP-14660:
----------------------------
Attachment: HADOOP-14660-001.patch
Attaching HADOOP-14660-001.patch.
Client-side throttling works as follows:
When *fs.azure.selfthrottling* is *false* and *fs.azure.autothrottling* is
*true*, the feature is enabled; it is disabled by default. When enabled, it
listens to the SendRequestEvent and ErrorReceivingResponseEvent exposed by the
Azure Storage SDK. In SendRequestEvent it sleeps, if necessary, to reduce
errors caused by exceeding the account ingress/egress limits and to throttle
throughput. In ErrorReceivingResponseEvent, it inspects the HTTP
request/response and updates the metrics. The metrics it calculates are "bytes
successfully transferred", "bytes failed to transfer", "number of successful
operations", and "number of failed operations". It treats reads and writes
separately, so there are actually two groups of metrics, one for read (GetBlob)
and another for write (PutBlock, PutPage, and AppendBlock).
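The two metric groups described above can be sketched as follows. This is an
illustrative sketch only, not code from the patch: class, field, and method
names are hypothetical, and in the real feature these counters would be updated
from the Azure Storage SDK event listeners.

```java
import java.util.concurrent.atomic.AtomicLong;

// One group of counters per direction: reads (GetBlob) and writes
// (PutBlock, PutPage, AppendBlock). All names here are illustrative.
public class ThrottleMetricsSketch {
    static final class Metrics {
        final AtomicLong bytesSuccessful = new AtomicLong();
        final AtomicLong bytesFailed = new AtomicLong();
        final AtomicLong opsSuccessful = new AtomicLong();
        final AtomicLong opsFailed = new AtomicLong();
    }

    private final Metrics readMetrics = new Metrics();   // GetBlob
    private final Metrics writeMetrics = new Metrics();  // PutBlock / PutPage / AppendBlock

    /** Called when a request completes; a response-event listener would feed this. */
    public void record(boolean isRead, boolean succeeded, long bytes) {
        Metrics m = isRead ? readMetrics : writeMetrics;
        if (succeeded) {
            m.bytesSuccessful.addAndGet(bytes);
            m.opsSuccessful.incrementAndGet();
        } else {
            m.bytesFailed.addAndGet(bytes);
            m.opsFailed.incrementAndGet();
        }
    }

    public long readBytesFailed() { return readMetrics.bytesFailed.get(); }
    public long writeBytesSuccessful() { return writeMetrics.bytesSuccessful.get(); }
}
```

Keeping the read and write groups separate matters because the account ingress
and egress limits are independent, so the two directions need independent sleep
durations.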
There is a timer that fires every 10 seconds. The timer callback analyzes the
metrics during the last 10 seconds and updates the "sleep duration" used in
SendRequestEvent. (There are actually two "sleep durations", one for reads and
one for writes.) To update the "sleep duration", the timer callback first
calculates the error percentage:
Error Percentage = 100 * Bytes Failed / (Bytes Failed + Bytes Successful)
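As a quick sanity check on the formula (the numbers and the helper name below
are illustrative, not from the patch):

```java
public class ErrorPercentageExample {
    /** Error Percentage = 100 * Bytes Failed / (Bytes Failed + Bytes Successful). */
    public static double errorPercentage(long bytesFailed, long bytesSuccessful) {
        long total = bytesFailed + bytesSuccessful;
        return total == 0 ? 0.0 : 100.0 * bytesFailed / total;  // guard against 0/0
    }

    public static void main(String[] args) {
        // 100 MB failed out of 1000 MB attempted in the interval -> 10% error rate.
        System.out.println(errorPercentage(100, 900));  // prints 10.0
    }
}
```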
The sleep duration is then updated as follows:
if (Error Percentage < .1) {
    Sleep Duration = Sleep Duration * .975
} else if (Error Percentage < 1) {
    // Do nothing in an attempt to stabilize. Less than 1% errors is acceptable.
} else {
    Additional Delay = (Bytes Failed + Bytes Successful) * 10 Seconds / Bytes Successful - 10 Seconds
    Sleep Duration = Additional Delay / (Operations Failed + Operations Successful)
}
The above describes the algorithm in a nutshell, omitting special handling (to
avoid division by zero, etc.).
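The timer callback's update rule can be sketched in Java, directly from the
formulas above. This is not the patch's code: the class and method names are
hypothetical, and the explicit guards stand in for the "special handling" the
text omits.

```java
// Sketch of the 10-second timer callback's sleep-duration update.
public class ThrottleUpdateSketch {
    private static final double INTERVAL_MS = 10_000.0;  // timer period from the text

    /** Returns the new sleep duration (ms) given the last interval's metrics. */
    public static double updateSleepDuration(double sleepMs,
                                             long bytesFailed, long bytesSuccessful,
                                             long opsFailed, long opsSuccessful) {
        long totalBytes = bytesFailed + bytesSuccessful;
        long totalOps = opsFailed + opsSuccessful;
        if (totalBytes == 0 || totalOps == 0 || bytesSuccessful == 0) {
            return sleepMs;  // guard: no traffic (or nothing succeeded) this interval
        }
        double errorPercentage = 100.0 * bytesFailed / totalBytes;
        if (errorPercentage < 0.1) {
            return sleepMs * 0.975;  // errors are rare: gradually relax the throttle
        } else if (errorPercentage < 1) {
            return sleepMs;          // acceptable error rate: hold steady
        } else {
            // Extra delay needed so that only the successfully transferred bytes
            // fit in the interval, spread across every operation in the interval.
            double additionalDelayMs =
                totalBytes * INTERVAL_MS / bytesSuccessful - INTERVAL_MS;
            return additionalDelayMs / totalOps;
        }
    }

    public static void main(String[] args) {
        // 50% of bytes failed across 100 operations: each request sleeps ~100 ms.
        System.out.println(updateSleepDuration(0, 500, 500, 50, 50));  // prints 100.0
    }
}
```

One instance of this update would run for the read metrics and another for the
write metrics, since the two directions keep separate sleep durations.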
> wasb: improve throughput by 34% when account limit exceeded
> -----------------------------------------------------------
>
> Key: HADOOP-14660
> URL: https://issues.apache.org/jira/browse/HADOOP-14660
> Project: Hadoop Common
> Issue Type: Improvement
> Components: fs/azure
> Reporter: Thomas
> Assignee: Thomas
> Attachments: HADOOP-14660-001.patch
>
>
> Big data workloads frequently exceed the Azure Storage max ingress and egress
> limits
> (https://docs.microsoft.com/en-us/azure/azure-subscription-service-limits).
> For example, the max ingress limit for a GRS account in the United States is
> currently 10 Gbps. When the limit is exceeded, the Azure Storage service
> fails a percentage of incoming requests, and this causes the client to
> initiate the retry policy. The retry policy delays requests by sleeping, but
> the sleep duration is independent of the client throughput and account limit.
> This results in low throughput, due to the high number of failed requests
> and the thrashing caused by the retry policy.
> To fix this, we introduce a client-side throttle which minimizes failed
> requests and maximizes throughput. Tests have shown that this improves
> throughput by ~34% when the storage account max ingress and/or egress limits
> are exceeded.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]