[ 
https://issues.apache.org/jira/browse/HADOOP-13826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15690733#comment-15690733
 ] 

Sean Mackrory commented on HADOOP-13826:
----------------------------------------

{quote}tests are pretty raw, production code less so.{quote}

Yeah, not remotely proposing this for inclusion yet - just a proof of concept. 
As I increased the number of parallel renames, I started hitting deadlocks 
again. I had a thread pool dedicated entirely to ControlMonitor tasks, and once 
that filled up, it deadlocked. I guess this is because my executor gets wrapped 
by other executors that have a single queue, so if the next item is a 
ControlMonitor task and the ControlMonitor task pool is full, then we're back 
to square one. Rather than getting wrapped in two other types of executors (to 
add the listening and blocking behavior, respectively), I think that to make 
this work we would have to bring that logic inside my S3TransferExecutor class 
so that all tasks are immediately segregated by type as soon as they are handed 
off from the AWS SDK.
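
For illustration, here is a minimal sketch of what "segregating by type inside 
the executor" could look like. This is not the code in the attached patches: 
the class name, pool sizes, and especially the isControlTask() check are 
placeholders - reliably recognizing the SDK's ControlMonitor tasks is exactly 
the open question.

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.AbstractExecutorService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SegregatingTransferExecutor extends AbstractExecutorService {

  // Unbounded pool for control/monitor tasks so they can never starve each
  // other; a fixed pool bounds the actual data-transfer subtasks.
  private final ExecutorService controlPool = Executors.newCachedThreadPool();
  private final ExecutorService workerPool = Executors.newFixedThreadPool(16);

  @Override
  public void execute(Runnable task) {
    // Segregate immediately, before any wrapping executor can queue a
    // control task behind the subtasks it is waiting for.
    if (isControlTask(task)) {
      controlPool.execute(task);
    } else {
      workerPool.execute(task);
    }
  }

  // Placeholder: a class-name check is only illustrative; identifying the
  // SDK's control tasks robustly is the hard part.
  private boolean isControlTask(Runnable task) {
    return task.getClass().getName().contains("Monitor");
  }

  @Override
  public void shutdown() {
    controlPool.shutdown();
    workerPool.shutdown();
  }

  @Override
  public List<Runnable> shutdownNow() {
    List<Runnable> pending = new ArrayList<>(controlPool.shutdownNow());
    pending.addAll(workerPool.shutdownNow());
    return pending;
  }

  @Override
  public boolean isShutdown() {
    return controlPool.isShutdown() && workerPool.isShutdown();
  }

  @Override
  public boolean isTerminated() {
    return controlPool.isTerminated() && workerPool.isTerminated();
  }

  @Override
  public boolean awaitTermination(long timeout, TimeUnit unit)
      throws InterruptedException {
    long deadline = System.nanoTime() + unit.toNanos(timeout);
    return controlPool.awaitTermination(timeout, unit)
        && workerPool.awaitTermination(
            Math.max(0L, deadline - System.nanoTime()), TimeUnit.NANOSECONDS);
  }
}
{code}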

I'll hold off on actually implementing that until there's more consensus on 
whether that's even the right approach. My approach definitely increased the 
number of parallel operations you could get away with before hitting a 
deadlock, but until the entire executor chain does this, it can't fix the core 
issue.

> S3A Deadlock in multipart copy due to thread pool limits.
> ---------------------------------------------------------
>
>                 Key: HADOOP-13826
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13826
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs/s3
>    Affects Versions: 2.7.3
>            Reporter: Sean Mackrory
>         Attachments: HADOOP-13826.001.patch, HADOOP-13826.002.patch
>
>
> In testing HIVE-15093 we have encountered deadlocks in the s3a connector. The 
> TransferManager javadocs 
> (http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/s3/transfer/TransferManager.html)
>  explain how this is possible:
> {quote}It is not recommended to use a single threaded executor or a thread 
> pool with a bounded work queue as control tasks may submit subtasks that 
> can't complete until all sub tasks complete. Using an incorrectly configured 
> thread pool may cause a deadlock (I.E. the work queue is filled with control 
> tasks that can't finish until subtasks complete but subtasks can't execute 
> because the queue is filled).{quote}
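
For reference, a minimal sketch of the kind of bounded-queue pool the javadoc 
warns about - the pool sizes, class name, and client construction here are 
illustrative (assuming the 1.11.x SDK), not the actual S3A configuration:

{code:java}
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.transfer.TransferManager;

public class DeadlockProneTransferManager {
  public static void main(String[] args) {
    AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();

    // Small pool with a small bounded work queue -- the configuration the
    // TransferManager javadoc warns against.
    ThreadPoolExecutor bounded = new ThreadPoolExecutor(
        4, 4, 60L, TimeUnit.SECONDS, new ArrayBlockingQueue<Runnable>(8));

    // Each multipart copy submits a control task, which in turn submits
    // part-copy subtasks to the same pool; once the queue fills with
    // control tasks, the subtasks they wait on can never be scheduled.
    TransferManager tm = new TransferManager(s3, bounded);
  }
}
{code}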


