BjornPrime opened a new issue, #28398:
URL: https://github.com/apache/beam/issues/28398

   ### What would you like to happen?
   
   The new implementation of GCSIO (see PR #28079, if not already merged) 
contains several GET requests to GCS where simply instantiating the bucket and 
blob objects locally would likely perform better. GCS can resolve the target 
from the bucket and blob names once an actual operation request (copy, delete, 
etc.) is made, so the preliminary GET is often unnecessary.
   
   See [here](https://github.com/googleapis/python-storage/issues/1112) for an 
example of how this can be done.
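
   The difference between the two patterns can be sketched as follows. This is 
an illustrative sketch, not the GCSIO code: the helper names 
(`delete_with_get`, `delete_without_get`, `copy_without_get`) are hypothetical, 
and `client` is assumed to be a `google.cloud.storage.Client`. In that library, 
`get_bucket` and `get_blob` issue GET requests, while `bucket` and `blob` only 
build local objects.

```python
def delete_with_get(client, bucket_name, blob_name):
    """Eager pattern: two metadata GETs are sent before the delete."""
    bucket = client.get_bucket(bucket_name)  # GET on the bucket
    blob = bucket.get_blob(blob_name)        # GET on the object; None if missing
    if blob is not None:
        blob.delete()


def delete_without_get(client, bucket_name, blob_name):
    """Lazy pattern: build local handles, so the delete is the only request."""
    bucket = client.bucket(bucket_name)  # no request, local object only
    blob = bucket.blob(blob_name)        # no request, local object only
    blob.delete()


def copy_without_get(client, src_bucket_name, src_name, dst_bucket_name, dst_name):
    """Lazy copy: both buckets and the source blob are instantiated locally."""
    src_bucket = client.bucket(src_bucket_name)
    dst_bucket = client.bucket(dst_bucket_name)
    src_blob = src_bucket.blob(src_name)
    # The copy itself is the only request sent to GCS.
    return src_bucket.copy_blob(src_blob, dst_bucket, new_name=dst_name)
```

   With the real client this might be called as 
`delete_without_get(storage.Client(), "my-bucket", "path/to/object")`, where 
the bucket and object names are placeholders. Note that the lazy version no 
longer learns up front whether the object exists; any such failure surfaces 
only when the delete or copy request itself is made.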
   
   Using this pattern was attempted during the initial migration to the GCS 
client, but it foundered on concerns that confusing errors may be thrown in some 
instances. In particular, operating on a non-existent object through a locally 
instantiated handle sometimes raised a permissions error instead of a 404. To 
resolve this issue, investigate the design pattern more thoroughly and determine 
whether these errors are real and whether they can be overcome. If they can be 
overcome, swap out any unnecessary GETs for instantiations to improve 
performance. If they can't, connect with the GCS client team to share these 
concerns and see if they have any further info or workarounds.
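
   One way to start that investigation is a small probe that performs an 
operation through a locally instantiated blob and reports which error class 
comes back. The sketch below is only a starting point: the function name 
`probe_missing_object` is hypothetical, `client` is assumed to be a 
`google.cloud.storage.Client`, and the code assumes the raised exceptions carry 
an HTTP status in a `code` attribute, as the `google.api_core.exceptions` 
classes do.

```python
def probe_missing_object(client, bucket_name, blob_name):
    """Delete via a lazily instantiated blob and classify the outcome.

    Returns "deleted", "not_found" (HTTP 404), or "permission_denied"
    (HTTP 403). Any other error is re-raised unchanged.
    """
    blob = client.bucket(bucket_name).blob(blob_name)  # no GET is issued here
    try:
        blob.delete()
    except Exception as exc:
        # Assumption: the exception exposes the HTTP status as `code`.
        status = getattr(exc, "code", None)
        if status == 404:
            return "not_found"
        if status == 403:
            return "permission_denied"
        raise
    return "deleted"
```

   Running this against an object name known not to exist, under credentials 
with different permission sets, would show whether and when a 403 is returned 
in place of a 404.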
   
   ### Issue Priority
   
   Priority: 2 (default / most feature requests should be filed as P2)
   
   ### Issue Components
   
   - [X] Component: Python SDK
   - [ ] Component: Java SDK
   - [ ] Component: Go SDK
   - [ ] Component: Typescript SDK
   - [X] Component: IO connector
   - [ ] Component: Beam examples
   - [ ] Component: Beam playground
   - [ ] Component: Beam katas
   - [ ] Component: Website
   - [ ] Component: Spark Runner
   - [ ] Component: Flink Runner
   - [ ] Component: Samza Runner
   - [ ] Component: Twister2 Runner
   - [ ] Component: Hazelcast Jet Runner
   - [ ] Component: Google Cloud Dataflow Runner

