[ https://issues.apache.org/jira/browse/ASTERIXDB-2944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17398353#comment-17398353 ]
ASF subversion and git services commented on ASTERIXDB-2944:
------------------------------------------------------------

Commit 81c3249322957be261cd99bf7d6b464fcb4a3bbd in asterixdb's branch refs/heads/master from Wail Alkowaileet
[ https://gitbox.apache.org/repos/asf?p=asterixdb.git;h=81c3249 ]

[ASTERIXDB-2944][EXT] Ensure the size of S3 connection pool

- user mode changes: no
- storage format changes: no
- interface changes: no

Details:
- We set the S3 connection pool size to be the number of partitions.

Change-Id: I1e1ce66cc7cd39cc81d004f90c36871ad31a685f
Reviewed-on: https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/12763
Integration-Tests: Jenkins <jenk...@fulliautomatix.ics.uci.edu>
Tested-by: Jenkins <jenk...@fulliautomatix.ics.uci.edu>
Reviewed-by: Wael Alkowaileet <wael....@gmail.com>
Reviewed-by: Hussain Towaileb <hussai...@gmail.com>


> "SdkClientException: Timeout waiting for connection from pool" when using Parquet on S3 at large scale
> ------------------------------------------------------------------------------------------------------
>
>                 Key: ASTERIXDB-2944
>                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-2944
>             Project: Apache AsterixDB
>          Issue Type: Bug
>          Components: EXT - External data
>   Affects Versions: 0.9.8
>            Reporter: Ingo Müller
>            Assignee: Wail Y. Alkowaileet
>            Priority: Major
>
> I am running complex queries against Parquet files on S3 (about 17GB) on a
> large machine ({{m5d.24xlarge}} on EC2, which has 96 vCPUs) and get the
> errors like the following:
>
> {{java.io.InterruptedIOException: getFileStatus on
> s3a://bucket/folder/file.parquet: com.amazonaws.SdkClientException: Unable to
> execute HTTP request: Timeout waiting for connection from pool}}
>
> {{java.io.InterruptedIOException: Reopen at position 15899845068 on
> s3a://bucket/folder/file.parquet: com.amazonaws.SdkClientException: Unable
> to execute HTTP request: Timeout waiting for connection from pool}}
>
> This seems to originate from the AWS SDK, where this error [may apparently
> occur|https://github.com/aws/aws-sdk-java/issues/269] if (1) the S3Object is
> not closed properly, or (2) too many requests are being made to the bucket.
> The last time I tried, I found the request limit to S3 to be in the order of
> 6k/s; is it possible that that limit is reached in my workload?
>
> Let me know what kind of information you need to get to the bottom of the
> problem.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
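
As an illustration of the fix described in the commit details (pool size set to the number of partitions), here is a minimal sketch. It is not the code from the commit. It assumes the files are read through Hadoop's S3A connector, as the s3a:// URIs in the report suggest, and that the pool is sized via the Hadoop property fs.s3a.connection.maximum. The class name S3PoolSizing, the method s3aSettings, and the use of the processor count as a stand-in for the partition count are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public final class S3PoolSizing {
    // Hadoop S3A property limiting the number of simultaneous connections to S3.
    // Assumption: this is the setting that governs the pool in the reported setup.
    static final String S3A_MAX_CONNECTIONS = "fs.s3a.connection.maximum";

    private S3PoolSizing() {
    }

    // Builds the settings that size the pool to one connection per partition,
    // so that every partition reading in parallel can hold a connection.
    static Map<String, String> s3aSettings(int numPartitions) {
        if (numPartitions < 1) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        Map<String, String> settings = new LinkedHashMap<>();
        settings.put(S3A_MAX_CONNECTIONS, Integer.toString(numPartitions));
        return settings;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in: one partition per available processor.
        int partitions = Runtime.getRuntime().availableProcessors();
        s3aSettings(partitions).forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

In a real deployment these key/value pairs would be applied to the Hadoop configuration object used to open the s3a:// files. The point of the sketch is only that a pool smaller than the number of concurrent readers can leave some readers waiting until they time out.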