[jira] [Commented] (ASTERIXDB-2944) "SdkClientException: Timeout waiting for connection from pool" when using Parquet on S3 at large scale

ASF subversion and git services (Jira) Thu, 12 Aug 2021 17:10:07 -0700


    [ 
https://issues.apache.org/jira/browse/ASTERIXDB-2944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17398353#comment-17398353
 ]


ASF subversion and git services commented on ASTERIXDB-2944:
------------------------------------------------------------

Commit 81c3249322957be261cd99bf7d6b464fcb4a3bbd in asterixdb's branch 
refs/heads/master from Wail Alkowaileet
[ https://gitbox.apache.org/repos/asf?p=asterixdb.git;h=81c3249 ]

[ASTERIXDB-2944][EXT] Ensure the size of S3 connection pool

- user mode changes: no
- storage format changes: no
- interface changes: no

Details:
- We set the S3 connection pool size to be the number of partitions.

Change-Id: I1e1ce66cc7cd39cc81d004f90c36871ad31a685f
Reviewed-on: https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/12763
Integration-Tests: Jenkins <jenk...@fulliautomatix.ics.uci.edu>
Tested-by: Jenkins <jenk...@fulliautomatix.ics.uci.edu>
Reviewed-by: Wael Alkowaileet <wael....@gmail.com>
Reviewed-by: Hussain Towaileb <hussai...@gmail.com>
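
The change described under "Details" can be illustrated with a minimal sketch. It is not the code from the commit. It assumes the S3A connector is configured through Hadoop-style key/value settings and uses the Hadoop S3A property fs.s3a.connection.maximum, which caps the HTTP connection pool. The class name, the helper s3aSettings, and the partition count of 96 are hypothetical and chosen only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class S3PoolSizing {
    // Hadoop S3A property that limits the size of the HTTP connection pool.
    static final String S3A_MAX_CONNECTIONS = "fs.s3a.connection.maximum";

    // Hypothetical helper (an assumption, not AsterixDB's actual code):
    // builds settings in which the pool holds one connection per partition,
    // so concurrent readers do not have to wait for a free connection.
    static Map<String, String> s3aSettings(int numPartitions) {
        if (numPartitions < 1) {
            throw new IllegalArgumentException("numPartitions must be >= 1");
        }
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put(S3A_MAX_CONNECTIONS, Integer.toString(numPartitions));
        return conf;
    }

    public static void main(String[] args) {
        // 96 is an assumed partition count, used here only as an example.
        System.out.println(s3aSettings(96));
    }
}
```

In a real deployment the resulting entries would be copied into the Hadoop configuration handed to the S3A file system. How AsterixDB derives the partition count is not shown here.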


> "SdkClientException: Timeout waiting for connection from pool" when using 
> Parquet on S3 at large scale
> ------------------------------------------------------------------------------------------------------
>
>                 Key: ASTERIXDB-2944
>                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-2944
>             Project: Apache AsterixDB
>          Issue Type: Bug
>          Components: EXT - External data
>    Affects Versions: 0.9.8
>            Reporter: Ingo Müller
>            Assignee: Wail Y. Alkowaileet
>            Priority: Major
>
> I am running complex queries against Parquet files on S3 (about 17GB) on a 
> large machine ({{m5d.24xlarge}} on EC2, which has 96 vCPUs) and get 
> errors like the following:
> {{java.io.InterruptedIOException: getFileStatus on 
> s3a://bucket/folder/file.parquet: com.amazonaws.SdkClientException: Unable to 
> execute HTTP request: Timeout waiting for connection from pool}}
> {{java.io.InterruptedIOException: Reopen at position 15899845068 on 
> s3a://bucket/folder/file.parquet: com.amazonaws.SdkClientException: Unable 
> to execute HTTP request: Timeout waiting for connection from pool}}
> This seems to originate from the AWS SDK, where this error [may apparently 
> occur|https://github.com/aws/aws-sdk-java/issues/269] if (1) the S3Object is 
> not closed properly, or (2) too many requests are being made to the bucket. 
> The last time I tried, I found the request limit to S3 to be on the order of 
> 6k requests/s; is it possible that this limit is reached in my workload?
> Let me know what kind of information you need to get to the bottom of the 
> problem.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
