[ https://issues.apache.org/jira/browse/HADOOP-12319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andrew Wang reopened HADOOP-12319:
----------------------------------

> S3AFastOutputStream has no ability to apply backpressure
> --------------------------------------------------------
>
>                 Key: HADOOP-12319
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12319
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs/s3
>    Affects Versions: 2.7.0
>            Reporter: Colin Marc
>            Priority: Critical
>
> Currently, users of S3AFastOutputStream can control memory usage with a few
> settings: {{fs.s3a.threads.core,max}}, which control the number of active
> uploads (specifically as arguments to a {{ThreadPoolExecutor}}), and
> {{fs.s3a.max.total.tasks}}, which controls the size of the feeding queue for
> the {{ThreadPoolExecutor}}.
> However, a user can get an almost *guaranteed* crash if the throughput of the
> writing job is higher than the total S3 throughput, because there is never
> any backpressure or blocking on calls to {{write}}.
> If {{fs.s3a.max.total.tasks}} is set high (the default is 1000), then
> {{write}} calls will continue to add data to the queue, which can eventually
> OOM. But if the user tries to set it lower, then writes will fail when the
> queue is full; the {{ThreadPoolExecutor}} will reject the part with
> {{java.util.concurrent.RejectedExecutionException}}.
> Ideally, calls to {{write}} should *block, not fail* when the queue is full,
> so as to apply backpressure on whatever the writing process is.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org
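For illustration only: one way to get the *block, not fail* behaviour described above is to give the upload {{ThreadPoolExecutor}} a saturation policy that re-queues a rejected task with a blocking {{put()}} instead of throwing. This is a minimal sketch, not the actual S3AFastOutputStream code; the names ({{BlockingSubmitPolicy}}, {{threads}}, {{maxTasks}}, {{simulateSlowPartUpload}}) are made up for the example, with the sizes merely standing in for {{fs.s3a.threads.*}} and {{fs.s3a.max.total.tasks}}.

{code:java}
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.RejectedExecutionHandler;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical saturation policy: when the executor's bounded work queue is
 * full, block the submitting thread until space frees up instead of throwing
 * RejectedExecutionException, so backpressure reaches the caller of write().
 */
class BlockingSubmitPolicy implements RejectedExecutionHandler {
  @Override
  public void rejectedExecution(Runnable task, ThreadPoolExecutor executor) {
    if (executor.isShutdown()) {
      throw new RejectedExecutionException("executor is shut down");
    }
    try {
      // Blocks until the bounded queue can accept the queued part upload.
      executor.getQueue().put(task);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new RejectedExecutionException("interrupted while queuing part", e);
    }
  }
}

class BlockingSubmitExample {
  public static void main(String[] args) {
    // Illustrative sizes only; core == max so the blocking handler does not
    // interfere with the pool's thread-creation logic.
    int threads = 4;
    int maxTasks = 16;
    ThreadPoolExecutor uploads = new ThreadPoolExecutor(
        threads, threads, 60L, TimeUnit.SECONDS,
        new LinkedBlockingQueue<Runnable>(maxTasks),
        new BlockingSubmitPolicy());
    // With 4 uploads running and 16 queued, the 21st execute() blocks here
    // rather than throwing, which in turn stalls the write() that produced it.
    for (int i = 0; i < 100; i++) {
      uploads.execute(() -> simulateSlowPartUpload());
    }
    uploads.shutdown();
  }

  private static void simulateSlowPartUpload() {
    try {
      Thread.sleep(100);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }
}
{code}

A {{java.util.concurrent.Semaphore}} sized to the thread count plus queue capacity, acquired before each submission and released when a part finishes, would be an equally simple alternative; either way the goal is the same: {{write}} stalls under load instead of OOMing or surfacing a {{RejectedExecutionException}}.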