[
https://issues.apache.org/jira/browse/HADOOP-11684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Thomas Demoor updated HADOOP-11684:
-----------------------------------
Attachment: HADOOP-11684-001.patch
001.patch
The threadpool is a mashup of the code that was in S3AFileSystem and the S4
code in the link in the description above.
This required bumping the AWS SDK dependency from version 1.7.4 to 1.7.8, where
the constructor for TransferManager can be passed any ExecutorService instead
of only a ThreadPool. However, I decided to upgrade to a much more recent
version (1.9.27), since from 1.9 onwards the different components of the
SDK can be imported individually. We now include only S3, resulting in much
smaller binaries.
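For illustration, depending only on the S3 component under the modular 1.9.x layout would look roughly like the fragment below (the group/artifact IDs are the ones published on Maven Central; this is a sketch, not the exact pom change in the patch):

```xml
<!-- pre-1.9: the monolithic SDK pulled in every AWS service -->
<!--
<dependency>
  <groupId>com.amazonaws</groupId>
  <artifactId>aws-java-sdk</artifactId>
  <version>1.7.4</version>
</dependency>
-->

<!-- 1.9+: per-service artifact, only S3 (and its transitive core) -->
<dependency>
  <groupId>com.amazonaws</groupId>
  <artifactId>aws-java-sdk-s3</artifactId>
  <version>1.9.27</version>
</dependency>
```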
BEWARE: To prove that the included test fails when the rest of the patch is
not applied, one has to uncomment a line, as Constants.MAX_THREADS is removed
from the codebase by the patch.
As a side effect, the version upgrade also fixes the following bug in
TransferManager: multiPartThreshold is now a long instead of an int.
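The blocking behaviour described in this issue can be sketched with a semaphore that gates submissions: permits cover the worker threads plus the queue slots, so a saturated pool makes the caller wait instead of throwing RejectedExecutionException. The class and method names below are illustrative, not the ones in the patch or in the linked S4 BlockingThreadPoolExecutorService:

```java
import java.util.concurrent.*;

public class BlockingExecutorDemo {

    // Illustrative factory: wraps a fixed pool so that execute() blocks
    // the client once (threads + queueSize) tasks are in flight.
    static ExecutorService newBlockingPool(int threads, int queueSize) {
        Semaphore permits = new Semaphore(threads + queueSize);
        ExecutorService delegate = Executors.newFixedThreadPool(threads);
        return new AbstractExecutorService() {
            @Override public void execute(Runnable command) {
                try {
                    permits.acquire();  // blocks, throttling the caller
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new RejectedExecutionException(e);
                }
                try {
                    delegate.execute(() -> {
                        try { command.run(); } finally { permits.release(); }
                    });
                } catch (RejectedExecutionException e) {
                    permits.release();  // don't leak a permit on failure
                    throw e;
                }
            }
            @Override public void shutdown() { delegate.shutdown(); }
            @Override public java.util.List<Runnable> shutdownNow() {
                return delegate.shutdownNow();
            }
            @Override public boolean isShutdown() { return delegate.isShutdown(); }
            @Override public boolean isTerminated() { return delegate.isTerminated(); }
            @Override public boolean awaitTermination(long t, TimeUnit u)
                    throws InterruptedException {
                return delegate.awaitTermination(t, u);
            }
        };
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = newBlockingPool(2, 2);
        // Submit more tasks than threads + queue slots: the extra
        // submissions block the caller rather than being rejected.
        for (int i = 0; i < 10; i++) {
            pool.execute(() -> {
                try { Thread.sleep(50); } catch (InterruptedException ignored) {}
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        System.out.println("all tasks accepted without rejection");
    }
}
```

Since TransferManager (from 1.7.8 onwards) accepts any ExecutorService, such a wrapper can be handed to it directly.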
> S3a to use thread pool that blocks clients
> ------------------------------------------
>
> Key: HADOOP-11684
> URL: https://issues.apache.org/jira/browse/HADOOP-11684
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Affects Versions: 2.7.0
> Reporter: Thomas Demoor
> Assignee: Thomas Demoor
> Attachments: HADOOP-11684-001.patch
>
>
> Currently, if fs.s3a.max.total.tasks are queued and another (part)upload
> wants to start, a RejectedExecutionException is thrown.
> We should use a threadpool that blocks clients, nicely throttling them,
> rather than throwing an exception, e.g. something similar to
> https://github.com/apache/incubator-s4/blob/master/subprojects/s4-comm/src/main/java/org/apache/s4/comm/staging/BlockingThreadPoolExecutorService.java
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)