[
https://issues.apache.org/jira/browse/HADOOP-13826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15705101#comment-15705101
]
Steve Loughran commented on HADOOP-13826:
-----------------------------------------
They're kept consistent for a reason, not just because it simplifies
configuration in general.
S3 stores the separate parts of a multipart upload as separate files; you
should get better performance when reading those parts separately; that is,
for max speed you should set the s3a block size == upload part size. By doing
a copy with copy part size == upload part size, we hope to preserve that
performance on later reads. Who knows, maybe it will even help copy
performance.
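As a sketch, keeping the two aligned comes down to two standard S3A settings in core-site.xml (the property names are the real S3A ones; 64M is purely an illustrative value, and the size-suffix syntax assumes a reasonably recent Hadoop):

```xml
<!-- core-site.xml: keep the read block size aligned with the multipart
     upload part size, so each HDFS-style block read maps onto one
     uploaded part. 64M is an example value, not a recommendation. -->
<property>
  <name>fs.s3a.multipart.size</name>
  <value>64M</value>
</property>
<property>
  <name>fs.s3a.block.size</name>
  <value>64M</value>
</property>
```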
What would be ideal would be to know the part size of an object; HADOOP-13261
proposed adding a custom header for this. However, time spent looking at split
calculation performance has convinced me that a new header would be useless
there; the overhead of querying the objects makes it too expensive. We could
start uploading it though, and maybe use it for a copy. Still expensive
though: a 400 ms HEAD would cost about 2 MB of copy bandwidth, based on my
(ad-hoc) measurements of a copy bandwidth of 6 MB/s.
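The back-of-envelope arithmetic behind that "about 2MB" figure (the latency and bandwidth numbers are the ones quoted above, not measured here):

```python
# Cost of one extra HEAD per copied object, expressed as forgone copy
# bandwidth. Inputs are the figures from the comment above (assumed):
head_latency_s = 0.4        # ~400 ms per HEAD request
copy_bandwidth_mb_s = 6.0   # ad-hoc measured S3 copy bandwidth, MB/s

# Data that could have been copied in the time one HEAD takes:
equivalent_copy_mb = round(head_latency_s * copy_bandwidth_mb_s, 1)
print(equivalent_copy_mb)   # → 2.4 (i.e. roughly the "2MB" quoted)
```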
> S3A Deadlock in multipart copy due to thread pool limits.
> ---------------------------------------------------------
>
> Key: HADOOP-13826
> URL: https://issues.apache.org/jira/browse/HADOOP-13826
> Project: Hadoop Common
> Issue Type: Bug
> Components: fs/s3
> Affects Versions: 2.7.3
> Reporter: Sean Mackrory
> Attachments: HADOOP-13826.001.patch, HADOOP-13826.002.patch
>
>
> In testing HIVE-15093 we have encountered deadlocks in the s3a connector. The
> TransferManager javadocs
> (http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/s3/transfer/TransferManager.html)
> explain how this is possible:
> {quote}It is not recommended to use a single threaded executor or a thread
> pool with a bounded work queue as control tasks may submit subtasks that
> can't complete until all sub tasks complete. Using an incorrectly configured
> thread pool may cause a deadlock (I.E. the work queue is filled with control
> tasks that can't finish until subtasks complete but subtasks can't execute
> because the queue is filled).{quote}
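The failure mode the javadoc describes is easy to reproduce in miniature outside the AWS SDK. A hedged Python sketch (illustrative only, not the S3A or TransferManager code path): a control task submits a subtask to the same single-worker executor and blocks on its result; the subtask is stuck in the queue behind the control task, so without the timeout the wait would never return.

```python
import concurrent.futures

def demo_thread_pool_deadlock():
    """Control task blocks on a subtask queued behind it in a bounded pool."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)

    def control_task():
        # The subtask is queued, but the only worker is busy running us.
        subtask = pool.submit(lambda: "part uploaded")
        # Without this timeout, the wait would block forever: deadlock.
        return subtask.result(timeout=1.0)

    try:
        pool.submit(control_task).result()
        return "completed"
    except concurrent.futures.TimeoutError:
        return "deadlocked"
    finally:
        pool.shutdown(wait=False)

print(demo_thread_pool_deadlock())  # → deadlocked
```

With more workers the same pattern still deadlocks once every worker is occupied by a control task, which is why an unbounded queue (or a separate pool for subtasks) is the usual fix.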