[
https://issues.apache.org/jira/browse/HADOOP-14660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Thomas updated HADOOP-14660:
----------------------------
Attachment: HADOOP-14660-001.patch
Attaching HADOOP-14660-001.patch.
Client-side throttling works as follows:
When *fs.azure.selfthrottling* is *false* and *fs.azure.autothrottling* is
*true*, the feature is enabled; it is disabled by default. When enabled, it
listens to the SendRequestEvent and ErrorReceivingResponseEvent exposed by the
Azure Storage SDK. In SendRequestEvent it sleeps, if necessary, to reduce
errors caused by exceeding the account ingress/egress limits and to throttle
throughput. In ErrorReceivingResponseEvent, it inspects the HTTP
request/response and updates the metrics. The metrics it calculates are "bytes
successfully transferred", "bytes failed to transfer", "number of successful
operations", and "number of failed operations". It treats reads and writes
separately, so there are actually two groups of metrics, one for read (GetBlob)
and another for write (PutBlock, PutPage, and AppendBlock).
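The two metric groups described above can be sketched as follows. This is an
illustrative sketch only, not code from the patch: class, field, and method
names are hypothetical, and in the real feature these counters would be updated
from the Azure Storage SDK event listeners.

```java
import java.util.concurrent.atomic.AtomicLong;

// One group of counters per direction: reads (GetBlob) and writes
// (PutBlock, PutPage, AppendBlock). All names here are illustrative.
public class ThrottleMetricsSketch {
    static final class Metrics {
        final AtomicLong bytesSuccessful = new AtomicLong();
        final AtomicLong bytesFailed = new AtomicLong();
        final AtomicLong opsSuccessful = new AtomicLong();
        final AtomicLong opsFailed = new AtomicLong();
    }

    private final Metrics readMetrics = new Metrics();   // GetBlob
    private final Metrics writeMetrics = new Metrics();  // PutBlock / PutPage / AppendBlock

    /** Called when a request completes; a response-event listener would feed this. */
    public void record(boolean isRead, boolean succeeded, long bytes) {
        Metrics m = isRead ? readMetrics : writeMetrics;
        if (succeeded) {
            m.bytesSuccessful.addAndGet(bytes);
            m.opsSuccessful.incrementAndGet();
        } else {
            m.bytesFailed.addAndGet(bytes);
            m.opsFailed.incrementAndGet();
        }
    }

    public long readBytesFailed() { return readMetrics.bytesFailed.get(); }
    public long writeBytesSuccessful() { return writeMetrics.bytesSuccessful.get(); }
}
```

Keeping the read and write groups separate matters because the account ingress
and egress limits are independent, so the two directions need independent sleep
durations.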
There is a timer that fires every 10 seconds. The timer callback analyzes the
metrics during the last 10 seconds and updates the "sleep duration" used in
SendRequestEvent. (There are actually two "sleep durations", one for reads and
one for writes.) To update the "sleep duration", the timer callback first
calculates the error percentage:
Error Percentage = 100 * Bytes Failed / (Bytes Failed + Bytes Successful)
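As a quick sanity check on the formula (the numbers and the helper name below
are illustrative, not from the patch):

```java
public class ErrorPercentageExample {
    /** Error Percentage = 100 * Bytes Failed / (Bytes Failed + Bytes Successful). */
    public static double errorPercentage(long bytesFailed, long bytesSuccessful) {
        long total = bytesFailed + bytesSuccessful;
        return total == 0 ? 0.0 : 100.0 * bytesFailed / total;  // guard against 0/0
    }

    public static void main(String[] args) {
        // 100 MB failed out of 1000 MB attempted in the interval -> 10% error rate.
        System.out.println(errorPercentage(100, 900));  // prints 10.0
    }
}
```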
The sleep duration is then updated as follows:
if (Error Percentage < .1) {
    Sleep Duration = Sleep Duration * .975
} else if (Error Percentage < 1) {
    // Do nothing in an attempt to stabilize. Less than 1% errors is acceptable.
} else {
    Additional Delay = (Bytes Failed + Bytes Successful) * 10 Seconds / Bytes Successful - 10 Seconds
    Sleep Duration = Additional Delay / (Operations Failed + Operations Successful)
}
The above describes the algorithm in a nutshell, omitting special handling (to
avoid division by zero, etc.).
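The timer callback's update rule can be sketched in Java, directly from the
formulas above. This is not the patch's code: the class and method names are
hypothetical, and the explicit guards stand in for the "special handling" the
text omits.

```java
// Sketch of the 10-second timer callback's sleep-duration update.
public class ThrottleUpdateSketch {
    private static final double INTERVAL_MS = 10_000.0;  // timer period from the text

    /** Returns the new sleep duration (ms) given the last interval's metrics. */
    public static double updateSleepDuration(double sleepMs,
                                             long bytesFailed, long bytesSuccessful,
                                             long opsFailed, long opsSuccessful) {
        long totalBytes = bytesFailed + bytesSuccessful;
        long totalOps = opsFailed + opsSuccessful;
        if (totalBytes == 0 || totalOps == 0 || bytesSuccessful == 0) {
            return sleepMs;  // guard: no traffic (or nothing succeeded) this interval
        }
        double errorPercentage = 100.0 * bytesFailed / totalBytes;
        if (errorPercentage < 0.1) {
            return sleepMs * 0.975;  // errors are rare: gradually relax the throttle
        } else if (errorPercentage < 1) {
            return sleepMs;          // acceptable error rate: hold steady
        } else {
            // Extra delay needed so that only the successfully transferred bytes
            // fit in the interval, spread across every operation in the interval.
            double additionalDelayMs =
                totalBytes * INTERVAL_MS / bytesSuccessful - INTERVAL_MS;
            return additionalDelayMs / totalOps;
        }
    }

    public static void main(String[] args) {
        // 50% of bytes failed across 100 operations: each request sleeps ~100 ms.
        System.out.println(updateSleepDuration(0, 500, 500, 50, 50));  // prints 100.0
    }
}
```

One instance of this update would run for the read metrics and another for the
write metrics, since the two directions keep separate sleep durations.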
> wasb: improve throughput by 34% when account limit exceeded
> -----------------------------------------------------------
>
> Key: HADOOP-14660
> URL: https://issues.apache.org/jira/browse/HADOOP-14660
> Project: Hadoop Common
> Issue Type: Improvement
> Components: fs/azure
> Reporter: Thomas
> Assignee: Thomas
> Attachments: HADOOP-14660-001.patch
>
>
> Big data workloads frequently exceed the Azure Storage max ingress and egress
> limits
> (https://docs.microsoft.com/en-us/azure/azure-subscription-service-limits).
> For example, the max ingress limit for a GRS account in the United States is
> currently 10 Gbps. When the limit is exceeded, the Azure Storage service
> fails a percentage of incoming requests, and this causes the client to
> initiate the retry policy. The retry policy delays requests by sleeping, but
> the sleep duration is independent of the client throughput and account limit.
> This results in low throughput, due to the high number of failed requests
> and the thrashing caused by the retry policy.
> To fix this, we introduce a client-side throttle which minimizes failed
> requests and maximizes throughput. Tests have shown that this improves
> throughput by ~34% when the storage account max ingress and/or egress limits
> are exceeded.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]