[
https://issues.apache.org/jira/browse/HADOOP-12319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Steve Loughran updated HADOOP-12319:
------------------------------------
Fix Version/s: 2.8.0
> S3AFastOutputStream has no ability to apply backpressure
> --------------------------------------------------------
>
> Key: HADOOP-12319
> URL: https://issues.apache.org/jira/browse/HADOOP-12319
> Project: Hadoop Common
> Issue Type: Improvement
> Components: fs/s3
> Affects Versions: 2.7.0
> Reporter: Colin Marc
> Priority: Critical
> Fix For: 2.8.0
>
>
> Currently, users of S3AFastOutputStream can control memory usage with a few
> settings: {{fs.s3a.threads.core}} and {{fs.s3a.threads.max}}, which control the
> number of active uploads (specifically as arguments to a
> {{ThreadPoolExecutor}}), and {{fs.s3a.max.total.tasks}}, which controls the size
> of the feeding queue for that {{ThreadPoolExecutor}}.
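> For illustration only, here is a rough sketch of how a pool shaped by those
> settings might be wired up. This is not the actual S3AFastOutputStream code,
> and the defaults in the snippet are placeholders:
> {code:java}
> // Illustrative sketch, not the real S3A wiring: an upload pool sized by
> // fs.s3a.threads.core/max, fed by a bounded queue of fs.s3a.max.total.tasks.
> import java.util.concurrent.LinkedBlockingQueue;
> import java.util.concurrent.ThreadPoolExecutor;
> import java.util.concurrent.TimeUnit;
>
> import org.apache.hadoop.conf.Configuration;
>
> public class UploadPoolSketch {
>   static ThreadPoolExecutor buildUploadPool(Configuration conf) {
>     // Placeholder defaults; the real defaults live in core-default.xml.
>     int coreThreads = conf.getInt("fs.s3a.threads.core", 15);
>     int maxThreads = conf.getInt("fs.s3a.threads.max", 256);
>     int maxTasks = conf.getInt("fs.s3a.max.total.tasks", 1000);
>     // The bounded LinkedBlockingQueue is the "feeding queue" described above.
>     return new ThreadPoolExecutor(
>         coreThreads, maxThreads,
>         60L, TimeUnit.SECONDS,
>         new LinkedBlockingQueue<Runnable>(maxTasks));
>   }
> }
> {code}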
> However, a user can get an almost *guaranteed* crash if the throughput of the
> writing job is higher than the total S3 throughput, because there is never
> any backpressure or blocking on calls to {{write}}.
> If {{fs.s3a.max.total.tasks}} is set high (the default is 1000), then
> {{write}} calls will continue to add data to the queue, which can eventually
> exhaust the heap and cause an {{OutOfMemoryError}}. But if the user sets it
> lower, writes will fail once the queue is full: the {{ThreadPoolExecutor}}
> rejects the part upload with a
> {{java.util.concurrent.RejectedExecutionException}}.
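> The rejection path is easy to reproduce outside Hadoop with a plain
> {{ThreadPoolExecutor}} and a tiny bounded queue (nothing below is S3A code;
> the sleep just stands in for a slow S3 part upload):
> {code:java}
> import java.util.concurrent.LinkedBlockingQueue;
> import java.util.concurrent.RejectedExecutionException;
> import java.util.concurrent.ThreadPoolExecutor;
> import java.util.concurrent.TimeUnit;
>
> public class RejectionDemo {
>   public static void main(String[] args) {
>     // One worker thread and a single-slot queue: a stand-in for a low
>     // fs.s3a.max.total.tasks with uploads slower than the writer.
>     ThreadPoolExecutor pool = new ThreadPoolExecutor(
>         1, 1, 0L, TimeUnit.SECONDS,
>         new LinkedBlockingQueue<Runnable>(1));
>     try {
>       for (int i = 0; i < 10; i++) {
>         pool.execute(new Runnable() {
>           @Override
>           public void run() {
>             try {
>               Thread.sleep(1000);   // simulate a slow part upload
>             } catch (InterruptedException e) {
>               Thread.currentThread().interrupt();
>             }
>           }
>         });
>         System.out.println("queued part " + i);
>       }
>     } catch (RejectedExecutionException e) {
>       // This is what a write() caller sees once the queue is full.
>       System.out.println("write would fail here: " + e);
>     } finally {
>       pool.shutdown();
>     }
>   }
> }
> {code}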
> Ideally, calls to {{write}} should *block, not fail* when the queue is full,
> so as to apply backpressure to whatever process is writing.
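> One possible shape for that (a sketch of the general technique, not the change
> that will actually be committed) is a {{RejectedExecutionHandler}} that blocks
> the submitting thread until the queue has room:
> {code:java}
> import java.util.concurrent.LinkedBlockingQueue;
> import java.util.concurrent.RejectedExecutionException;
> import java.util.concurrent.RejectedExecutionHandler;
> import java.util.concurrent.ThreadPoolExecutor;
> import java.util.concurrent.TimeUnit;
>
> public class BlockingSubmitSketch {
>   /** Blocks the caller (i.e. the thread calling write()) until the work
>    *  queue has room, instead of throwing RejectedExecutionException. */
>   static class BlockWhenFull implements RejectedExecutionHandler {
>     @Override
>     public void rejectedExecution(Runnable r, ThreadPoolExecutor executor) {
>       try {
>         executor.getQueue().put(r);   // blocking put => backpressure
>       } catch (InterruptedException e) {
>         Thread.currentThread().interrupt();
>         throw new RejectedExecutionException("interrupted while queueing", e);
>       }
>     }
>   }
>
>   static ThreadPoolExecutor boundedPool(int threads, int queueSize) {
>     // core == max so the pool size is fixed and the queue alone limits memory.
>     return new ThreadPoolExecutor(
>         threads, threads, 0L, TimeUnit.SECONDS,
>         new LinkedBlockingQueue<Runnable>(queueSize),
>         new BlockWhenFull());
>   }
> }
> {code}
> A semaphore wrapped around {{submit()}} that bounds queued-plus-active uploads
> would give the same blocking behaviour without reaching into the executor's
> queue (which bypasses the shutdown check).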