[ https://issues.apache.org/jira/browse/HADOOP-15729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16872630#comment-16872630 ]

Sean Mackrory commented on HADOOP-15729:
----------------------------------------

So I tried out some of the same 1 GB uploads in the same region. I could vary 
that more, but as it stands I think I'm mostly just seeing noise in the 
underlying performance. I *tended* to see that separating the core and max 
thread counts resulted in worse performance - I mainly tested with 50 max 
threads, and every combination of 0 core threads, 10 core threads, prestarting 
or not, *tended* to show very similar drops in performance. I'm not sure how 
to explain that, except for the "tended" part: once in a while an identical 
configuration would show the same drop (in the best case an upload takes ~25 
seconds; when there was a drop it was in the neighborhood of 35-45 seconds). 
So a statistically rigorous experiment here would be quite the task...

So here's what I propose: the one change that really shouldn't (and in my 
testing *tended* not to) have any impact on short-term performance, but that 
completely solves the problem for long-running processes, is simply allowing 
core threads everywhere to time out. We already have a timeout configured; we 
just only apply it to the BlockingThreadPoolExecutorService, and not to the 
unbounded threadpool we hand to Amazon. I think this is the safe and right 
choice. Patch attached.
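
To make that concrete, here's a rough sketch of the shape of the change. This 
is illustrative rather than the literal S3AFileSystem code, and the constants 
(50 threads, 60 seconds) merely stand in for fs.s3a.max.threads and 
fs.s3a.threads.keepalivetime:

{code:java}
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class UnboundedPoolSketch {
  public static void main(String[] args) {
    int maxThreads = 50;  // stand-in for fs.s3a.max.threads
    long keepAlive = 60;  // stand-in for fs.s3a.threads.keepalivetime (secs)

    // Sketch of the unbounded pool handed to the AWS SDK: core size of
    // maxThreads, no hard cap on queued work.
    ThreadPoolExecutor unboundedThreadPool = new ThreadPoolExecutor(
        maxThreads, Integer.MAX_VALUE,
        keepAlive, TimeUnit.SECONDS,
        new LinkedBlockingQueue<Runnable>());

    // The proposed one-liner: apply the existing keep-alive timeout to the
    // core threads too, so an idle filesystem spins all the way down to zero.
    unboundedThreadPool.allowCoreThreadTimeOut(true);

    unboundedThreadPool.shutdown();
  }
}
{code}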

Side note: I'm seeing that I can't mv or cp on S3 because the destination is 
reported as existing when it doesn't, and I'm getting Maven errors when 
packaging because of JavaDoc errors in S3AFileSystem (an errant <, and a 
missing param). I'll follow up on those in separate JIRAs.

> [s3a] stop treating fs.s3a.max.threads as the long-term minimum
> ---------------------------------------------------------------
>
>                 Key: HADOOP-15729
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15729
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>            Reporter: Sean Mackrory
>            Assignee: Sean Mackrory
>            Priority: Major
>         Attachments: HADOOP-15729.001.patch, HADOOP-15729.002.patch
>
>
> A while ago the s3a connector started experiencing deadlocks because the AWS 
> SDK requires an unbounded threadpool. It places monitoring tasks on the work 
> queue before the tasks they wait on, so it's possible (and has even happened 
> with larger-than-default threadpools) for the executor to become permanently 
> saturated and deadlock.
> So we started giving an unbounded threadpool executor to the SDK, and using a 
> bounded, blocking threadpool service for everything else S3A needs (although 
> currently that's only in the S3ABlockOutputStream). fs.s3a.max.threads then 
> only limits this bounded threadpool; however, we also used fs.s3a.max.threads 
> as the number of core threads in the unbounded threadpool, which in hindsight 
> is pretty terrible.
> Currently those core threads never time out, so this is actually setting a 
> sort of minimum. Once that many tasks have been submitted, the threadpool 
> never shrinks below that number: it can burst beyond it, but it will only 
> ever spin back down to that floor. If fs.s3a.max.threads is set reasonably 
> high and someone uses a bunch of S3 buckets, they could easily end up with 
> thousands of permanently idle threads.
> We should either stop using fs.s3a.max.threads for the core pool size and 
> introduce a new configuration, or simply allow core threads to time out. I'm 
> reading the OpenJDK source now to see what subtle differences there are 
> between core threads and other threads when core threads can time out.
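
To see the deadlock from the first paragraph above in isolation, here is a 
self-contained toy (plain JDK, no AWS SDK involved): a "monitoring" task 
queued ahead of the work it waits on wedges a bounded pool permanently:

{code:java}
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class MonitorDeadlockDemo {
  public static void main(String[] args) throws InterruptedException {
    // One worker thread stands in for a fully saturated bounded pool.
    ExecutorService pool = Executors.newFixedThreadPool(1);
    CountDownLatch uploadDone = new CountDownLatch(1);

    // "Monitoring" task: occupies the only thread, waiting on the upload.
    pool.execute(() -> {
      try {
        uploadDone.await();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });

    // "Upload" task: queued behind the monitor, so it can never run.
    pool.execute(uploadDone::countDown);

    // Prints "completed: false" -- the monitor permanently occupies the only
    // thread while the task it waits on sits in the queue behind it.
    System.out.println("completed: " + uploadDone.await(2, TimeUnit.SECONDS));
    pool.shutdownNow();
  }
}
{code}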
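
And to make the "long-term minimum" behavior concrete, a toy pool of the same 
shape shows the floor directly (10 stands in for fs.s3a.max.threads, with a 
1-second keep-alive so the effect is quick to observe):

{code:java}
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class CoreThreadFloorDemo {
  public static void main(String[] args) throws InterruptedException {
    ThreadPoolExecutor pool = new ThreadPoolExecutor(
        10, Integer.MAX_VALUE, 1, TimeUnit.SECONDS,
        new LinkedBlockingQueue<Runnable>());

    // Uncomment the proposed fix and the idle pool size drops to 0:
    // pool.allowCoreThreadTimeOut(true);

    for (int i = 0; i < 10; i++) {
      pool.execute(() -> { });  // trivial tasks, just to spin up core threads
    }
    Thread.sleep(5_000);        // wait well past the keep-alive timeout

    // As-is this prints 10: idle core threads never exit, so the core size
    // acts as a permanent per-filesystem minimum.
    System.out.println("idle pool size: " + pool.getPoolSize());
    pool.shutdown();
  }
}
{code}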



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
