BjornPrime opened a new issue, #28398: URL: https://github.com/apache/beam/issues/28398
### What would you like to happen?

The new implementation of GCSIO (see PR #28079, if not already merged) makes several GET requests to GCS in places where simply instantiating the objects of interest would likely perform better. Once an actual operation request (copy, delete, etc.) is made, GCS can resolve what it needs from the bucket and blob names alone, so the preceding GET is often unnecessary. See [here](https://github.com/googleapis/python-storage/issues/1112) for an example of how this can be done.

This pattern was tried during the initial migration to the GCS client, but was dropped over concerns that it could produce confusing errors in some cases. In particular, accessing a non-existent object through an operation on an instantiated (rather than fetched) object sometimes raised permissions errors instead of 404s.

To resolve this issue, investigate this design pattern more thoroughly and determine whether the error issues are real and cannot be overcome. If they can be overcome, replace any unnecessary GETs with instantiations to improve performance. If they cannot, reach out to the GCS client team to share these concerns and ask whether they have any further information or workarounds.

### Issue Priority

Priority: 2 (default / most feature requests should be filed as P2)

### Issue Components

- [X] Component: Python SDK
- [ ] Component: Java SDK
- [ ] Component: Go SDK
- [ ] Component: Typescript SDK
- [X] Component: IO connector
- [ ] Component: Beam examples
- [ ] Component: Beam playground
- [ ] Component: Beam katas
- [ ] Component: Website
- [ ] Component: Spark Runner
- [ ] Component: Flink Runner
- [ ] Component: Samza Runner
- [ ] Component: Twister2 Runner
- [ ] Component: Hazelcast Jet Runner
- [ ] Component: Google Cloud Dataflow Runner
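To make the GET-versus-instantiation trade-off in this issue concrete, here is a minimal offline sketch counting simulated requests for a delete. It is not Beam's GCSIO code: `FakeClient`, `FakeBucket`, `FakeBlob`, `delete_with_gets`, and `delete_without_gets` are hypothetical names. The fake methods only mirror the names of the `google-cloud-storage` calls involved: `Client.get_bucket` and `Bucket.get_blob` issue GETs, while `Client.bucket` and `Bucket.blob` are local factories that make no request, and `Blob.delete` issues the DELETE. The fakes deliberately do not model error responses, which is exactly the open question here: on the instantiate-only path, a missing object is only discovered when the DELETE fails, and per this issue that failure has sometimes surfaced as a permissions error rather than a 404.

```python
# Hypothetical offline stand-ins; method names mirror google-cloud-storage,
# but these classes only record simulated HTTP requests and model no errors.


class FakeBlob:
    def __init__(self, bucket, name):
        self.bucket = bucket
        self.name = name

    def delete(self):
        # One DELETE request; GCS resolves the target from bucket + object name.
        client = self.bucket.client
        client.requests.append(("DELETE", self.bucket.name, self.name))
        client.objects.discard((self.bucket.name, self.name))


class FakeBucket:
    def __init__(self, client, name):
        self.client = client
        self.name = name

    def blob(self, name):
        # Local factory: no request is made.
        return FakeBlob(self, name)

    def get_blob(self, name):
        # GET object metadata; returns None when the object does not exist.
        self.client.requests.append(("GET", self.name, name))
        if (self.name, name) in self.client.objects:
            return FakeBlob(self, name)
        return None


class FakeClient:
    def __init__(self, objects):
        self.objects = set(objects)  # {(bucket_name, blob_name), ...}
        self.requests = []           # log of simulated HTTP requests

    def bucket(self, name):
        # Local factory: no request is made.
        return FakeBucket(self, name)

    def get_bucket(self, name):
        # GET bucket metadata (this fake does not model a missing bucket).
        self.requests.append(("GET", name, None))
        return FakeBucket(self, name)


def delete_with_gets(client, bucket_name, blob_name):
    """GET-first style: fetch bucket and object metadata, then delete."""
    bucket = client.get_bucket(bucket_name)
    blob = bucket.get_blob(blob_name)
    if blob is None:
        # Existence is known before acting, so a clear "not found" is possible.
        raise FileNotFoundError(f"gs://{bucket_name}/{blob_name}")
    blob.delete()


def delete_without_gets(client, bucket_name, blob_name):
    """Instantiate-only style: build handles locally, let DELETE do the lookup."""
    client.bucket(bucket_name).blob(blob_name).delete()


if __name__ == "__main__":
    client = FakeClient({("my-bucket", "a.txt"), ("my-bucket", "b.txt")})
    delete_with_gets(client, "my-bucket", "a.txt")
    print(len(client.requests))  # 3: GET bucket, GET object, DELETE
    client.requests.clear()
    delete_without_gets(client, "my-bucket", "b.txt")
    print(len(client.requests))  # 1: DELETE only
```

The design choice being weighed is visible in the two helpers: the GET-first version pays two extra round trips but can report a missing object cleanly, while the instantiate-only version saves those round trips and has to rely on whatever error the operation itself returns.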
