jackye1995 commented on pull request #1573: URL: https://github.com/apache/iceberg/pull/1573#issuecomment-708513468
> > Has there been any consideration to using the AWS Java SDK v2? I know that @jacques-n mentioned they found the Java v2 async S3 SDK buggy, but the linked code from their project is using the AWS Java SDK v2 (all of the imports start with `software.amazon`).

> > To me it seems like it would be smarter to start on the newer client version than to have to do an upgrade later. My understanding is that the Java SDK v2 is much more performant for most things, as it's the one seeing most of the work. And though I don't doubt @jacques-n's performance / bug issues with the Java SDK v2 async S3 client, I would ask when that was. I've personally noticed that when new clients and new services are brought out by Amazon, they're not always production ready from the start. But many times I've found that things we performance tested 6 months prior were much more performant / resilient later on.

> > I would say that I've also had issues when exploring the v2 SDK, but more in terms of completeness of the implementation. For example, they don't have a transfer manager (not that we're using it here), but if we decide to go that route, we would need to go back to v1. Also, at this point most other systems (Spark, S3A, Presto, etc.) are still on v1 as well. If there are documented performance improvements or other features in v2, I'd be happy to upgrade, but it seems like the community hasn't really moved in that direction yet.

The SDK v2 is intended to live alongside v1, because some older packages such as S3AFileSystem might never upgrade. That is why the two have completely different class paths, and you do not need to resolve any dependency conflicts. All new features related to the client itself will only be developed in v2, so it is always recommended to use the v2 client for new projects when possible. There is a [blog](https://aws.amazon.com/blogs/developer/tag/aws-sdk-java-v2/) dedicated to new features added to v2.
For performance, there are optimizations for users in the AWS Lambda environment, described in [this doc](https://docs.aws.amazon.com/sdk-for-java/v2/developer-guide/client-configuration-starttime.html). No performance benchmark has been done for HTTP calls, but since v2 supports HTTP/2, it should be faster when the service enables HTTP/2 traffic. From a feature perspective, yes, the transfer manager is not there, but for Iceberg the most important feature should be multipart upload, which is there, so I see much more benefit in using v2 instead of v1.
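
To make the multipart upload point a bit more concrete, here is a minimal sketch of how an object might be split into parts before uploading. It does not call the AWS SDK at all. `MultipartPlanSketch` and `planParts` are hypothetical names made up for illustration, not Iceberg or SDK APIs. The only assumptions are the documented S3 limits: every part except the last must be at least 5 MiB, and an upload can have at most 10,000 parts.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of Iceberg or the AWS SDK.
final class MultipartPlanSketch {
  // S3 multipart limits: parts other than the last must be >= 5 MiB, max 10,000 parts.
  static final long MIN_PART_SIZE = 5L * 1024 * 1024;
  static final int MAX_PARTS = 10_000;

  /** Returns the size in bytes of each part, in upload order. */
  static List<Long> planParts(long objectSize, long partSize) {
    if (objectSize <= 0) {
      throw new IllegalArgumentException("objectSize must be positive");
    }
    if (partSize < MIN_PART_SIZE) {
      throw new IllegalArgumentException("partSize must be at least 5 MiB");
    }
    // Ceiling division: number of parts needed to cover the whole object.
    long partCount = (objectSize + partSize - 1) / partSize;
    if (partCount > MAX_PARTS) {
      throw new IllegalArgumentException("too many parts, use a larger partSize");
    }
    List<Long> parts = new ArrayList<>();
    long remaining = objectSize;
    while (remaining > 0) {
      // Every part is full-sized except possibly the last one.
      long size = Math.min(partSize, remaining);
      parts.add(size);
      remaining -= size;
    }
    return parts;
  }

  public static void main(String[] args) {
    // A 12 MiB object with 5 MiB parts: two full parts and a 2 MiB tail.
    System.out.println(planParts(12L * 1024 * 1024, MIN_PART_SIZE));
  }
}
```

With the v2 client, each planned part would presumably map to one upload-part call between the create and complete multipart upload requests. That wiring is left out here because it depends on how the output stream ends up being implemented in this PR.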
