[ 
https://issues.apache.org/jira/browse/HADOOP-19205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17855708#comment-17855708
 ] 

ASF GitHub Bot commented on HADOOP-19205:
-----------------------------------------

steveloughran opened a new pull request, #6892:
URL: https://github.com/apache/hadoop/pull/6892

   
   Adds a new ClientManager interface and implementation which creates the 
sync and async S3 clients and the S3 transfer manager on demand, and 
terminates them in close().
   
   S3A FS is modified to
   * Create a ClientManager and hand it off to S3Store.
   * Use the same ClientManager interface against S3Store to demand-create the 
services.
   * Only create the async client as part of the transfer manager creation, 
during rename.
   * Record statistics on client creation count and duration.
   * Collect statistics on the time to initialize and shut down the S3A FS in 
IOStatistics for reporting.
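
The demand-creation pattern described above can be sketched roughly as follows. This is a minimal illustration, not the code in the PR: `Client`, `LazyClientManager` and its method names are hypothetical stand-ins, the factories are plain `Supplier`s rather than AWS SDK builders, and the counters are simple atomics instead of IOStatistics.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

/** Hypothetical stand-in for an S3 client: something that can be closed. */
interface Client extends AutoCloseable {
  @Override
  void close();
}

/** Illustrative manager: clients are built on first use and closed together. */
final class LazyClientManager implements AutoCloseable {
  private final Supplier<Client> syncFactory;
  private final Supplier<Client> asyncFactory;
  private final AtomicInteger creationCount = new AtomicInteger();
  private final AtomicLong creationNanos = new AtomicLong();
  private Client syncClient;
  private Client asyncClient;
  private boolean closed;

  LazyClientManager(Supplier<Client> syncFactory, Supplier<Client> asyncFactory) {
    this.syncFactory = syncFactory;
    this.asyncFactory = asyncFactory;
  }

  /** Returns the sync client, creating it on the first call only. */
  synchronized Client getOrCreateSyncClient() {
    checkOpen();
    if (syncClient == null) {
      syncClient = timedCreate(syncFactory);
    }
    return syncClient;
  }

  /** Returns the async client, creating it on the first call only. */
  synchronized Client getOrCreateAsyncClient() {
    checkOpen();
    if (asyncClient == null) {
      asyncClient = timedCreate(asyncFactory);
    }
    return asyncClient;
  }

  /** Invokes the factory and records the creation count and elapsed time. */
  private Client timedCreate(Supplier<Client> factory) {
    long start = System.nanoTime();
    try {
      return factory.get();
    } finally {
      creationCount.incrementAndGet();
      creationNanos.addAndGet(System.nanoTime() - start);
    }
  }

  // Assumption: use after close() is rejected rather than re-creating clients.
  private void checkOpen() {
    if (closed) {
      throw new IllegalStateException("client manager is closed");
    }
  }

  int creationCount() {
    return creationCount.get();
  }

  long creationNanos() {
    return creationNanos.get();
  }

  /** Closes only the clients that were actually created; safe to call twice. */
  @Override
  public synchronized void close() {
    if (closed) {
      return;
    }
    closed = true;
    try {
      if (asyncClient != null) {
        asyncClient.close();
      }
    } finally {
      if (syncClient != null) {
        syncClient.close();
      }
    }
  }
}
```

With this shape, a filesystem instance that never renames anything never pays for the async client, and close() has nothing to shut down for clients that were never requested.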
   
   There is no attempt to create the S3 client asynchronously in initialize(), 
though that could offer marginal benefits, depending on the codepath.
   
   Change-Id: I79a668aacd920048447485afed77df573a38cb37
   
   ### How was this patch tested?
   
   Relying on the existing regression tests, which are known to exercise this 
codepath.
   
   Some other tests will be needed, e.g.
   - verify that repeated creation always returns the same instance.
   - verify the behaviour after close()
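
As a rough sketch of what those two checks might look like: `LazyRef` and `LazyRefChecks` below are hypothetical helpers, not classes from the patch, and the expected behaviour after close() (failing with `IllegalStateException`) is an assumption, since the description leaves it open.

```java
import java.util.function.Supplier;

/** Hypothetical lazy holder standing in for a demand-created client. */
final class LazyRef<T> {
  private final Supplier<T> factory;
  private T value;
  private boolean closed;

  LazyRef(Supplier<T> factory) {
    this.factory = factory;
  }

  /** Creates the value on first use and returns the cached one afterwards. */
  synchronized T get() {
    if (closed) {
      // Assumption: a closed holder refuses to create a new instance.
      throw new IllegalStateException("closed");
    }
    if (value == null) {
      value = factory.get();
    }
    return value;
  }

  synchronized void close() {
    closed = true;
    value = null;
  }
}

final class LazyRefChecks {
  /** Repeated creation requests must return the same instance. */
  static boolean returnsSameInstance(LazyRef<?> ref) {
    return ref.get() == ref.get();
  }

  /** After close(), get() is assumed to fail instead of creating a client. */
  static boolean failsAfterClose(LazyRef<?> ref) {
    ref.close();
    try {
      ref.get();
      return false;
    } catch (IllegalStateException expected) {
      return true;
    }
  }
}
```

In a real test these would be JUnit assertions against the actual ClientManager implementation; the plain boolean helpers here only keep the sketch free of dependencies.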
   
   ### For code changes:
   
   - [X] Does the title of this PR start with the corresponding JIRA issue id 
(e.g. 'HADOOP-17799. Your PR title ...')?
   - [ ] Object storage: have the integration tests been executed and the 
endpoint declared according to the connector-specific documentation?
   - [ ] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)?
   - [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`, 
`NOTICE-binary` files?
   
   




> S3A initialization/close slower than with v1 SDK
> ------------------------------------------------
>
>                 Key: HADOOP-19205
>                 URL: https://issues.apache.org/jira/browse/HADOOP-19205
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.4.0
>            Reporter: Steve Loughran
>            Priority: Major
>         Attachments: Screenshot 2024-06-14 at 17.12.59.png, Screenshot 
> 2024-06-14 at 17.14.33.png
>
>
> Hive QE have observed a slowdown in LLAP queries due to the time taken to 
> create and close s3a filesystem instances. A key aspect of that is that they 
> keep closing the fs instances (HIVE-27884), but looking at the profiles, the 
> reason things seem to have regressed is
> * two s3 clients are being created (sync and async)
> * these seem to take a lot of time scanning the classpath for "global 
> interceptors", which is at least an O(jars) operation; the number of index 
> entries in the zip files may factor too.
> Proposed:
> * create async client on demand when the transfer manager is invoked
> * look at why passwords are being scanned for if 
> InstanceProfileCredentialsProvider is in use...that seems slow too
> SDK wishes
> * could the SDK allow us to turn off that scan for interceptors?
> attaching screenshots of the profile. storediag snippet:
> {code}
> [001]  fs.s3a.access.key = (unset)
> [002]  fs.s3a.secret.key = (unset)
> [003]  fs.s3a.session.token = (unset)
> [004]  fs.s3a.server-side-encryption-algorithm = (unset)
> [005]  fs.s3a.server-side-encryption.key = (unset)
> [006]  fs.s3a.encryption.algorithm = (unset)
> [007]  fs.s3a.encryption.key = (unset)
> [008]  fs.s3a.aws.credentials.provider = 
> "com.amazonaws.auth.InstanceProfileCredentialsProvider" [core-site.xml]
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
