thiagotnunes opened a new pull request #16655:
URL: https://github.com/apache/beam/pull/16655
The original DetectNewPartitions algorithm is susceptible to failures because it produces non-idempotent side effects on every attempt. Specifically, it marks the partitions as SCHEDULED in the Spanner database and then outputs them. If a bundle commit fails, the retry does not pick up the partitions that are already marked SCHEDULED.

This PR changes the algorithm to always schedule partitions whose created-at timestamp is greater than the one saved in the DetectNewPartitions restriction. When scheduling the partitions, this SDF also claims their created-at timestamp, which advances the timestamp saved in the restriction. If a bundle commit fails, the advanced restriction timestamp is not saved, so the partitions in the bundle are picked up again on retry regardless of their state.

More information can be found at: https://docs.google.com/document/d/1IQAOqLmGuIaOJc55NmfUckM4rDCXHAmxKNuRg6Ae07U/edit#heading=h.q3e0xrkg85ay
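The retry behaviour described above can be illustrated with a small plain-Python simulation. This is only a sketch under stated assumptions, not the code in this PR: it does not use the Beam or Spanner APIs, and `Partition`, `detect_new_partitions`, `run_bundle`, and the integer timestamps are all hypothetical names and simplifications chosen for illustration. The point it shows is that selecting by created-at timestamp, rather than by state, makes a failed bundle safe to retry.

```python
from dataclasses import dataclass


@dataclass
class Partition:
    # Hypothetical stand-in for a row in the partition metadata table.
    token: str
    created_at: int  # simplified integer timestamp (assumption)
    state: str = "CREATED"


def detect_new_partitions(table, restriction_ts):
    """One attempt: schedule every partition created after restriction_ts.

    Returns the tokens to output and the highest created_at that was claimed.
    """
    # Select by created_at only; the state column is deliberately ignored.
    pending = sorted(
        (p for p in table if p.created_at > restriction_ts),
        key=lambda p: p.created_at,
    )
    outputs = []
    claimed_ts = restriction_ts
    for p in pending:
        p.state = "SCHEDULED"  # side effect that survives a failed commit
        outputs.append(p.token)
        claimed_ts = max(claimed_ts, p.created_at)  # "claim" the timestamp
    return outputs, claimed_ts


def run_bundle(table, saved_ts, commit_succeeds):
    """Simulate a bundle: on commit failure, outputs and claimed_ts are lost."""
    outputs, claimed_ts = detect_new_partitions(table, saved_ts)
    if commit_succeeds:
        return outputs, claimed_ts
    return [], saved_ts  # restriction timestamp is not advanced


table = [Partition("a", 10), Partition("b", 20)]

# First attempt fails at commit: rows are already SCHEDULED, nothing is output.
out1, ts1 = run_bundle(table, 0, commit_succeeds=False)

# Retry starts from the old timestamp and picks the same partitions up again.
out2, ts2 = run_bundle(table, ts1, commit_succeeds=True)

# A later run finds nothing new because the saved timestamp has advanced.
out3, ts3 = run_bundle(table, ts2, commit_succeeds=True)
```

In this sketch the first attempt leaves both rows marked SCHEDULED but returns no output and keeps the saved timestamp at 0, so the retry still selects and outputs both partitions. A state-based filter (selecting only rows not yet SCHEDULED) would have skipped them on the retry, which is the failure mode the PR describes.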
