nancyxu123 opened a new pull request, #21751:
URL: https://github.com/apache/beam/pull/21751

   Change the way we compute watermarks for the Spanner change stream connector.
   
   Previously, we computed watermarks by updating the watermark column in the 
Spanner metadata table continuously based on received data change records. 
Then, we would compute the minimum of all watermarks in the Spanner metadata 
table and output that as the watermark in the DetectNewPartitionsAction 
function.
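
   As a rough illustration of the previous approach, the global watermark was the minimum over the per-partition watermarks stored in the metadata table. The sketch below is a toy model only: the `rows` dictionary and the `global_watermark` helper are hypothetical stand-ins, not the connector's actual code or schema.

```python
from datetime import datetime, timezone


def global_watermark(partition_watermarks):
    """Return the minimum watermark across partitions, or None if there are none.

    `partition_watermarks` is a hypothetical stand-in for the watermark column
    of the metadata table: partition token -> watermark timestamp.
    """
    if not partition_watermarks:
        return None
    return min(partition_watermarks.values())


# Hypothetical table contents, for illustration only.
rows = {
    "partition-a": datetime(2022, 6, 1, 12, 0, 5, tzinfo=timezone.utc),
    "partition-b": datetime(2022, 6, 1, 12, 0, 2, tzinfo=timezone.utc),
    "partition-c": datetime(2022, 6, 1, 12, 0, 9, tzinfo=timezone.utc),
}

# The slowest partition (partition-b) determines the output watermark.
result = global_watermark(rows)
```

   Keeping this minimum current requires every partition to keep writing its watermark back to the table, which is the write load this change removes.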
   
   Now, we rely on the Dataflow watermarking mechanism to compute watermarks. 
For the ReadChangeStreamPartitionDoFn stage, we delay a DoFn instance that is 
processing a parent partition from stopping until all of its child partitions 
have started running. Without this delay, the parent could end before the child 
has started, and the watermark could jump past the child's start time.
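
   The effect of the delay can be sketched with a toy model in which the stage watermark is simply the minimum over the watermarks of the currently active partitions. This is an assumption made for illustration; Dataflow's real watermark computation is more involved, and `stage_watermark`, `retire_parent`, and the integer timestamps are hypothetical names and values, not part of the connector.

```python
def stage_watermark(active):
    """Toy model: the stage watermark is the minimum over active partitions."""
    return min(active.values())


def retire_parent(active, parent, children, hold_until_children_run):
    """Try to stop `parent`; return True if it was removed from `active`.

    When `hold_until_children_run` is set, the parent stays active (and keeps
    holding the watermark) until every child appears in `active`.
    """
    if hold_until_children_run and any(child not in active for child in children):
        return False
    active.pop(parent)
    return True


CHILD_START = 20  # hypothetical start time of the child partition

# Without the delay: the parent stops before the child has started.
active = {"parent": 20, "other": 50}
retire_parent(active, "parent", ["child"], hold_until_children_run=False)
jumped = stage_watermark(active)  # already past CHILD_START

# With the delay: the parent keeps running until the child is active.
active = {"parent": 20, "other": 50}
retired_early = retire_parent(active, "parent", ["child"], hold_until_children_run=True)
held = stage_watermark(active)
active["child"] = CHILD_START  # the child starts running
retired_late = retire_parent(active, "parent", ["child"], hold_until_children_run=True)
after = stage_watermark(active)
```

   In the first scenario the watermark moves to 50 while the child still has to emit records from time 20 onward; in the second it stays at 20 until the child takes over.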
   
   This change has the following benefits:
   - It removes the use of BundleFinalizer from the code, allowing the connector 
to run on both Dataflow runner v1 and v2.
   - It reduces the load on the Spanner metadata table, since we no longer need 
to continuously write to the watermark column.
   

