adnanhemani commented on PR #6507: URL: https://github.com/apache/hadoop/pull/6507#issuecomment-1920244731
Hi @steveloughran - thanks for your time on this. We appreciate it a lot. I've responded to your comments inline below: > We cannot have any more unshaded aws sdk jars as required on our classpath; removing s3 select in https://github.com/apache/hadoop/pull/6144 has simplified our life by removing another optional one. >Do you have a timetable for incorporating this plugging into bundle.jar? >Otherwise, it is critical that if the jar is not on the cross path normal S3 clients can be constructed and used. Totally agreed with you on this. @jxhan3 and I will work to make sure this is optional as we do not have any timeline (or known plans) for getting the plugin incorporated into the AWS Java SDKv2 bundle.jar. If this changes in the future, we'd be glad to reverse any classloading code we may need for now. >This will need documentation. Either in connecting.md or a new file in the same directory src/site/markdown/tools/hadoop-aws Noted. @jxhan3 will work on this in the next PR revision. >I do not see any integration tests. What is the story here? Is it possible to run the whole mvn verify test run with access grants? if so, adding a paragraph in testing.md would be good, and particular: how to set it up. I am particularly curious about how well the delegation tokens worked...are session credentials supported? The story currently is, if we treat the plugin as yet another third-party (and optional) dependency, then this PR is only going to be providing the bare minimum code for users to be able to enable the plugin if they explicitly choose to do so. Any issues with the actual functionality of the plugin should be addressed by the plugin itself at their [open source GitHub](https://github.com/aws/aws-s3-accessgrants-plugin-java-v2). So then, the only testing that we'd require would be to ensure that if users are explicitly enabling this feature, that S3A is ensuring its S3 clients have the plugin attached. Other open-source contributions (e.g. Iceberg) have accepted this testing model - and I'd also recommend it to reduce the need for redundant test coverage between S3A and the S3 Access Grants plugin. If you don't agree with this testing model, we can surely try to add additional ITest cases that will both setup and tear down the S3 Access Grants instance, locations, and required grants (to be noted: S3 Access Grants APIs are _not_ free) - and then test both when users should and should not be able to receive access. However, running all existing test cases under this model will be a heavy task that will likely require lots of test case refactoring, as Access Grants are defined on a location-by-location basis. In order to test both when users should and should not have access for each test case will require both additional setup and test code to ensure that those situations can be adequately tested with multiple data locations. I'm not sure that the ROI on making such a large change will be there. Please let me know your thoughts on this. As for how the feature works, S3 Access Grants will authenticate the credentials to find the IAM user associated with it - then use that identity for the authorization before returning a new set of scoped credentials to actually access the data (in other words, the credentials that are inputted to the S3 client will not be the credentials used to actually access the data). The S3 Access Grants plugin is the mechanism that will do the entire credential vending process and using the vended credentials properly in any calls made from the attached S3 client. As such, session credentials and delegation tokens will work given that the credentials that are passed to the S3 client (using any mechanism) are valid and can be authenticated properly. >The feature probably also needs an extra line in the "qualifying an SDK" section. Noted. @jxhan3 will work on this in the next PR revision. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
