[
https://issues.apache.org/jira/browse/HADOOP-16246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16815043#comment-16815043
]
Greg Kinman commented on HADOOP-16246:
--------------------------------------
Oh, I have no indication that the task scheduling ordering has been rethought
(I haven't read the relevant code in the older version compared to latest) -
just that the preferred way of constructing the TransferManager has been
changed to using the builder. It's probably worth just refactoring to use the
builder and seeing how things go, seeing as these days we have your nifty
integration test that probably reproduces the deadlocking.
> Unbounded thread pool maximum pool size in S3AFileSystem TransferManager
> ------------------------------------------------------------------------
>
> Key: HADOOP-16246
> URL: https://issues.apache.org/jira/browse/HADOOP-16246
> Project: Hadoop Common
> Issue Type: Bug
> Components: fs/s3
> Affects Versions: 2.8.0
> Reporter: Greg Kinman
> Priority: Major
> Original Estimate: 24h
> Remaining Estimate: 24h
>
> I have something running in production that is hitting its {{ulimit}}
> while trying to create {{s3a-transfer-unbounded}} threads.
> Relevant background: https://issues.apache.org/jira/browse/HADOOP-13826.
> Before that change, the thread pool used in the {{TransferManager}} had both
> a reasonably small maximum pool size and work queue capacity.
> After that change, the thread pool has both a maximum pool size and work
> queue capacity of {{Integer.MAX_VALUE}}.
> This seems like a pretty bad idea, because now we have, practically speaking,
> no bound on the number of threads that might get created. I understand the
> change was made in response to experiencing deadlocks and at the warning of
> the documentation, which I will repeat here:
> {quote}It is not recommended to use a single threaded executor or a thread
> pool with a bounded work queue as control tasks may submit subtasks that
> can't complete until all sub tasks complete. Using an incorrectly configured
> thread pool may cause a deadlock (I.E. the work queue is filled with control
> tasks that can't finish until subtasks complete but subtasks can't execute
> because the queue is filled).
> {quote}
> The documentation only warns against having a bounded _work queue_, not
> against having a bounded _maximum pool size_. And this seems fine, as having
> an unbounded work queue sounds ok. Having an unbounded maximum pool size,
> however, does not.
> I will also note that this constructor is now deprecated and suggests using
> {{TransferManagerBuilder}} instead, which by default creates a fixed thread
> pool of size 10:
> [https://github.com/aws/aws-sdk-java/blob/1.11.534/aws-java-sdk-s3/src/main/java/com/amazonaws/services/s3/transfer/internal/TransferManagerUtils.java#L59].
> I suggest we make a small change here and keep the maximum pool size at
> {{maxThreads}}, which defaults to 10, while keeping the work queue as is
> (unbounded).
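
A minimal sketch of the suggested configuration, using only java.util.concurrent. The class name BoundedTransferPool and the parameter keepAliveSeconds are hypothetical and are not taken from S3AFileSystem or the AWS SDK. The sketch assumes maxThreads is the configured thread limit mentioned above. It sets the core size equal to the maximum size, because a ThreadPoolExecutor only starts threads beyond its core size once the work queue rejects a task, which an unbounded queue never does.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper for illustration; not part of Hadoop or the AWS SDK. */
final class BoundedTransferPool {

  private BoundedTransferPool() {
  }

  /**
   * Builds an executor with at most maxThreads worker threads and an
   * unbounded work queue, so submissions are queued rather than rejected.
   */
  static ThreadPoolExecutor create(int maxThreads, long keepAliveSeconds) {
    ThreadPoolExecutor pool = new ThreadPoolExecutor(
        maxThreads,                            // core pool size
        maxThreads,                            // maximum pool size (the bound)
        keepAliveSeconds, TimeUnit.SECONDS,
        new LinkedBlockingQueue<Runnable>());  // capacity Integer.MAX_VALUE
    if (keepAliveSeconds > 0) {
      // Let idle threads exit so an unused pool does not hold threads open.
      // allowCoreThreadTimeOut(true) requires a non-zero keep-alive time.
      pool.allowCoreThreadTimeOut(true);
    }
    return pool;
  }
}
```

With this shape, BoundedTransferPool.create(10, 60) never runs more than 10 threads, and additional tasks wait in the queue. Whether this is enough to avoid the deadlock described in the SDK documentation would still need to be checked against the integration test.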