thiagotnunes opened a new pull request #16655:
URL: https://github.com/apache/beam/pull/16655


   The original DetectNewPartitions algorithm is susceptible to failures 
because every attempt produces side effects that are not idempotent. 
Specifically, it marks partitions as SCHEDULED in the Spanner database and 
then outputs them. If a bundle commit fails, the retry does not pick up the 
partitions that are already SCHEDULED.
   
   This PR changes the algorithm to always schedule partitions whose 
created-at timestamp is greater than the one saved in the DetectNewPartitions 
restriction. When scheduling partitions, the SDF also claims their created-at 
timestamps, advancing the saved timestamp. If a bundle commit fails, the 
advanced restriction timestamp is not saved, so the partitions in the bundle 
are picked up again regardless of their state.
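
   A minimal sketch of the idea, not the code in this PR: the class, fields, 
and processBundle method below are hypothetical, and the restriction is 
modeled as a plain long instead of a Beam restriction tracker. It only 
illustrates that selecting by created-at timestamp, rather than by state, 
makes a retried attempt produce the same partitions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model of the retry-safe scheduling idea.
// None of these names are the Beam or Spanner API.
final class DetectNewPartitionsSketch {

  // One row of the partition metadata table (assumed shape).
  static final class Partition {
    final String token;
    final long createdAt;
    final String state;

    Partition(String token, long createdAt, String state) {
      this.token = token;
      this.createdAt = createdAt;
      this.state = state;
    }
  }

  // Outcome of one attempt: the partitions output and the timestamp claimed.
  // The claimed timestamp only becomes the saved one if the commit succeeds.
  static final class Attempt {
    final List<String> scheduled;
    final long claimedTimestamp;

    Attempt(List<String> scheduled, long claimedTimestamp) {
      this.scheduled = scheduled;
      this.claimedTimestamp = claimedTimestamp;
    }
  }

  // Selects every partition created after the saved restriction timestamp.
  // The partition state is deliberately ignored, so a partition already
  // marked SCHEDULED by a failed attempt is selected again on retry.
  static Attempt processBundle(List<Partition> table, long restrictionTimestamp) {
    List<String> scheduled = new ArrayList<>();
    long claimed = restrictionTimestamp;
    for (Partition p : table) {
      if (p.createdAt > restrictionTimestamp) {
        scheduled.add(p.token);
        claimed = Math.max(claimed, p.createdAt);
      }
    }
    return new Attempt(scheduled, claimed);
  }

  public static void main(String[] args) {
    List<Partition> table = Arrays.asList(
        new Partition("a", 10L, "SCHEDULED"),
        new Partition("b", 20L, "CREATED"),
        new Partition("c", 5L, "CREATED"));

    long saved = 5L;
    // The commit of this attempt is assumed to fail, so saved is not advanced.
    Attempt failed = processBundle(table, saved);
    // The retry starts from the same saved timestamp.
    Attempt retry = processBundle(table, saved);
    System.out.println(retry.scheduled.equals(failed.scheduled));

    // The commit succeeds this time, so the claimed timestamp is saved.
    saved = retry.claimedTimestamp;
    System.out.println(processBundle(table, saved).scheduled.isEmpty());
  }
}
```

   In this model a retry after a failed commit starts from the unchanged 
saved timestamp, so it selects the same partitions again, including one 
whose state is already SCHEDULED. Once the claimed timestamp is saved, those 
partitions are no longer selected.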
   
   More information can be seen at: 
https://docs.google.com/document/d/1IQAOqLmGuIaOJc55NmfUckM4rDCXHAmxKNuRg6Ae07U/edit#heading=h.q3e0xrkg85ay


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

