[https://issues.apache.org/jira/browse/ASTERIXDB-2944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]
Wail Y. Alkowaileet resolved ASTERIXDB-2944.
--------------------------------------------
Resolution: Fixed
> "SdkClientException: Timeout waiting for connection from pool" when using
> Parquet on S3 at large scale
> ------------------------------------------------------------------------------------------------------
>
> Key: ASTERIXDB-2944
> URL: https://issues.apache.org/jira/browse/ASTERIXDB-2944
> Project: Apache AsterixDB
> Issue Type: Bug
> Components: EXT - External data
> Affects Versions: 0.9.8
> Reporter: Ingo Müller
> Assignee: Wail Y. Alkowaileet
> Priority: Major
>
> I am running complex queries against Parquet files on S3 (about 17 GB) on a
> large machine ({{m5d.24xlarge}} on EC2, which has 96 vCPUs) and get
> errors like the following:
> {{java.io.InterruptedIOException: getFileStatus on
> s3a://bucket/folder/file.parquet: com.amazonaws.SdkClientException: Unable to
> execute HTTP request: Timeout waiting for connection from pool}}
> {{java.io.InterruptedIOException: Reopen at position 15899845068
> on s3a://bucket/folder/file.parquet: com.amazonaws.SdkClientException: Unable
> to execute HTTP request: Timeout waiting for connection from pool}}
> This seems to originate in the AWS SDK, where, according to [this issue|https://github.com/aws/aws-sdk-java/issues/269],
> the error can occur if (1) the S3Object is not closed properly or (2) too
> many requests are made to the bucket.
> The last time I tried, I found the S3 request limit to be on the order of
> 6k requests/s; is it possible that my workload reaches that limit?
> Let me know what information you need to get to the bottom of the
> problem.
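A minimal sketch of cause (1) from the linked SDK issue, assuming the client's connection pool behaves like a bounded semaphore with an acquire timeout. `ConnectionPool`, `PooledConnection` and `read_objects` are hypothetical names chosen for illustration; they are not AWS SDK, Hadoop, or AsterixDB APIs. A reader that closes each stream returns its connection to the pool, while a reader that leaks streams exhausts the pool and then fails with the same kind of timeout.

```python
import threading


class ConnectionPool:
    """Hypothetical bounded connection pool with an acquire timeout."""

    def __init__(self, max_connections: int, acquire_timeout_s: float) -> None:
        self._slots = threading.BoundedSemaphore(max_connections)
        self._timeout = acquire_timeout_s

    def acquire(self) -> "PooledConnection":
        # Wait up to the timeout for a free slot, as an HTTP client pool would.
        if not self._slots.acquire(timeout=self._timeout):
            raise TimeoutError("Timeout waiting for connection from pool")
        return PooledConnection(self)

    def _release(self) -> None:
        self._slots.release()


class PooledConnection:
    """Stands in for an open object stream; closing it frees the pool slot."""

    def __init__(self, pool: ConnectionPool) -> None:
        self._pool = pool
        self._closed = False

    def close(self) -> None:
        # Idempotent: only the first close returns the slot to the pool.
        if not self._closed:
            self._closed = True
            self._pool._release()

    def __enter__(self) -> "PooledConnection":
        return self

    def __exit__(self, *exc_info) -> None:
        self.close()


def read_objects(pool: ConnectionPool, n: int, close_streams: bool) -> int:
    """Open n streams in sequence; return how many opened before a pool timeout."""
    opened = 0
    leaked = []  # streams that are never closed keep their slots forever
    try:
        for _ in range(n):
            if close_streams:
                with pool.acquire():  # slot is returned when the block exits
                    opened += 1
            else:
                leaked.append(pool.acquire())
                opened += 1
    except TimeoutError:
        pass  # the pool is exhausted; report how far we got
    return opened
```

With a pool of 4 connections, `read_objects(ConnectionPool(4, 0.05), 10, close_streams=True)` opens all 10 streams, whereas with `close_streams=False` only the first 4 succeed before the timeout. Cause (2) can look similar from the client side: if more threads hold connections at the same time than the pool has slots, the waiting threads can time out even without a leak.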
--
This message was sent by Atlassian Jira
(v8.3.4#803005)