[
https://issues.apache.org/jira/browse/HADOOP-14660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Thomas updated HADOOP-14660:
----------------------------
Attachment: HADOOP-14660-003.patch
Attached HADOOP-14660-003.patch.
Thanks for the feedback Steve! My response below:
We tested this on an HDInsight cluster using real loads against Azure Storage
and used Storage Metrics
(https://docs.microsoft.com/en-us/rest/api/storageservices/Storage-Analytics-Metrics-Table-Schema)
to analyze the performance. For example, with the client-side throttling
feature enabled for an account with 10 Gbps ingress limit, the total ingress
throughput levels off at approximately 10 Gbps, whereas without the feature the
throughput varies between 5 Gbps and 10 Gbps as the service rejects requests
and the retry policy kicks in. For example, a teragen job that attempts to
upload more than the ingress limit allows can complete 30 to 40% faster with
the feature enabled. There can be tens or hundreds of wasb clients and they
independently throttle themselves with no knowledge of each other, and together
they minimize errors and maximize throughput.
I do not believe there will be any contention with TCP congestion control. I'm
confident, but not overly so, which is why the feature is disabled by default.
I'd like to obtain feedback and perhaps enable it by default in the future.
The Azure Storage service exposes account level metrics so we're not in the
dark today. My team plans to review and implement client-side metrics for
wasb. We're a small team, and metrics are important to get right, so we don't
want to do it piecemeal, but instead will review and make comprehensive changes.
This change will not compile without the 5.4.0 SDK, as it uses the new
ErrorReceivingResponseEvent, but I'll create a separate JIRA as you requested.
I agree the configuration key should have the suffix "enabled", but for
consistency I'm using "enable" because all the other wasb configuration keys
are this way.
ClientThrottlingInercept logs a debug message saying "Client-side throttling is
enabled for the WASB file system." There are no other configurable options for
the feature, so nothing else is included in the message.
The HTTP request/response parsing
(BlobOperationDescriptor.getContentLengthIfKnown) is best effort. It should
not throw or fail for the data that it processes. There are tests to validate
this parsing logic and its integration with the Storage SDK, but it is not a
general purpose parser--it works inconjunction with the SDK.
I switched to SLF4J.
Removed superfluous InterfaceAudience.Private.
Changed timer name to "wasb-timer-client-throttling-analyzer-<name>". The
timer task returns quickly--no blocking, waiting, loops, etc.
I updated ClientThrottlingAnalyzer, TestBlobOperationDescriptor, and
TestClientThrottlingAnalyzer as requeseted.
> wasb: improve throughput by 34% when account limit exceeded
> -----------------------------------------------------------
>
> Key: HADOOP-14660
> URL: https://issues.apache.org/jira/browse/HADOOP-14660
> Project: Hadoop Common
> Issue Type: Improvement
> Components: fs/azure
> Reporter: Thomas
> Assignee: Thomas
> Attachments: HADOOP-14660-001.patch, HADOOP-14660-002.patch,
> HADOOP-14660-003.patch
>
>
> Big data workloads frequently exceed the Azure Storage max ingress and egress
> limits
> (https://docs.microsoft.com/en-us/azure/azure-subscription-service-limits).
> For example, the max ingress limit for a GRS account in the United States is
> currently 10 Gbps. When the limit is exceeded, the Azure Storage service
> fails a percentage of incoming requests, and this causes the client to
> initiate the retry policy. The retry policy delays requests by sleeping, but
> the sleep duration is independent of the client throughput and account limit.
> This results in low throughput, due to the high number of failed requests
> and thrashing causes by the retry policy.
> To fix this, we introduce a client-side throttle which minimizes failed
> requests and maximizes throughput. Tests have shown that this improves
> throughtput by ~34% when the storage account max ingress and/or egress limits
> are exceeded.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]