lostluck opened a new issue, #25454: URL: https://github.com/apache/beam/issues/25454
### What would you like to happen?

As implemented, the Python SDK Datastore IO query path does not currently retry on retryable RPC/HTTP errors, in particular deadline-exceeded. Per the [Datastore documentation](https://cloud.google.com/datastore/docs/concepts/errors), `DEADLINE_EXCEEDED` errors should be retried using exponential backoff.

https://github.com/apache/beam/blob/v2.44.0/sdks/python/apache_beam/io/gcp/datastore/v1new/datastoreio.py#L304

Writes already do this; the same should apply to reads.

https://github.com/apache/beam/blob/v2.44.0/sdks/python/apache_beam/io/gcp/datastore/v1new/datastoreio.py#L397

----

It does occur to me that this would need to be done safely enough not to redundantly re-emit data that has already been read and processed. That may complicate the implementation of this resilience improvement.

### Issue Priority

Priority: 3 (nice-to-have improvement)

### Issue Components

- [X] Component: Python SDK
- [ ] Component: Java SDK
- [ ] Component: Go SDK
- [ ] Component: Typescript SDK
- [X] Component: IO connector
- [ ] Component: Beam examples
- [ ] Component: Beam playground
- [ ] Component: Beam katas
- [ ] Component: Website
- [ ] Component: Spark Runner
- [ ] Component: Flink Runner
- [ ] Component: Samza Runner
- [ ] Component: Twister2 Runner
- [ ] Component: Hazelcast Jet Runner
- [ ] Component: Google Cloud Dataflow Runner
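For illustration, here is a minimal, framework-independent sketch of the exponential-backoff retry pattern the issue asks for. The names `fetch_with_backoff` and `DeadlineExceeded` are hypothetical stand-ins, not Beam APIs; an in-tree fix would more likely reuse the `apache_beam.utils.retry.with_exponential_backoff` decorator that the write path already applies, and would still need to address the re-emission concern above:

```python
import random
import time


class DeadlineExceeded(Exception):
    """Hypothetical stand-in for a retryable DEADLINE_EXCEEDED RPC error."""


def fetch_with_backoff(fetch, max_retries=5, base_delay=1.0, max_delay=60.0):
    """Call fetch(), retrying retryable errors with exponential backoff.

    Retries up to max_retries times; delays double each attempt, with
    random jitter, capped at max_delay seconds.
    """
    for attempt in range(max_retries + 1):
        try:
            return fetch()
        except DeadlineExceeded:
            if attempt == max_retries:
                raise  # Budget exhausted: surface the error to the caller.
            delay = min(max_delay, base_delay * (2 ** attempt))
            time.sleep(delay * random.uniform(0.5, 1.0))
```

Note this sketch retries the whole `fetch()` call; the harder part flagged above is resuming a partially consumed query result stream without re-emitting already-processed entities.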
