[
https://issues.apache.org/jira/browse/BEAM-12164?focusedWorklogId=717725&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-717725
]
ASF GitHub Bot logged work on BEAM-12164:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 31/Jan/22 00:33
Start Date: 31/Jan/22 00:33
Worklog Time Spent: 10m
Work Description: thiagotnunes opened a new pull request #16655:
URL: https://github.com/apache/beam/pull/16655
The original DetectNewPartitions algorithm is susceptible to failures because
it produces non-idempotent side effects on every attempt. Specifically, it
marks partitions as SCHEDULED in the Spanner database and then outputs them.
If a bundle commit fails, the retry will not pick up the partitions that were
already marked SCHEDULED.
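The failure mode can be reproduced with a minimal, self-contained Java sketch. This is not the connector's code and uses no Beam or Spanner APIs: `PartitionRow`, the `"CREATED"`/`"SCHEDULED"` strings, the numeric `createdAt`, and the `bundleCommitSucceeds` flag are hypothetical stand-ins for the metadata table and the runner's bundle commit.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a row in the connector's partition metadata table.
final class PartitionRow {
  final String token;
  final long createdAt; // hypothetical numeric timestamp
  String state = "CREATED";

  PartitionRow(String token, long createdAt) {
    this.token = token;
    this.createdAt = createdAt;
  }
}

final class OldDetectNewPartitionsSketch {
  // Keyed by partition token; stands in for the Spanner metadata table.
  final Map<String, PartitionRow> metadataTable = new LinkedHashMap<>();

  // One attempt of the original algorithm; returns what downstream receives.
  List<String> attempt(boolean bundleCommitSucceeds) {
    List<String> bundleOutput = new ArrayList<>();
    for (PartitionRow row : metadataTable.values()) {
      if (row.state.equals("CREATED")) {
        // Non-idempotent side effect: persisted even if the bundle fails.
        row.state = "SCHEDULED";
        bundleOutput.add(row.token);
      }
    }
    // A failed bundle commit discards the outputs, but not the state change.
    return bundleCommitSucceeds ? bundleOutput : new ArrayList<>();
  }

  public static void main(String[] args) {
    OldDetectNewPartitionsSketch sketch = new OldDetectNewPartitionsSketch();
    sketch.metadataTable.put("p1", new PartitionRow("p1", 10));
    sketch.metadataTable.put("p2", new PartitionRow("p2", 20));
    System.out.println("failed attempt: " + sketch.attempt(false)); // []
    System.out.println("retry: " + sketch.attempt(true)); // [] (p1 and p2 are lost)
  }
}
```

The retry selects rows by state, so rows whose SCHEDULED update outlived the failed bundle are never emitted.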
This PR changes the algorithm to always schedule partitions whose created-at
timestamp is greater than the one saved in the DetectNewPartitions restriction.
When scheduling the partitions, this SDF also claims their created-at
timestamps, advancing the saved timestamp. If a bundle commit fails, the
restriction timestamp is not saved, so the partitions in that bundle are picked
up again regardless of their state.
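The revised scheme can be sketched the same way. This is a hypothetical, self-contained Java model, not the connector's implementation: `committedTimestamp` plays the role of the timestamp saved in the restriction, "claiming" is modeled as advancing a local `lastClaimed` value (in Beam this would go through a `RestrictionTracker.tryClaim` call), and `bundleCommitSucceeds` stands in for the runner's commit.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class NewDetectNewPartitionsSketch {
  // Hypothetical stand-in for a row in the partition metadata table.
  static final class Row {
    final String token;
    final long createdAt; // hypothetical numeric timestamp
    String state = "CREATED";

    Row(String token, long createdAt) {
      this.token = token;
      this.createdAt = createdAt;
    }
  }

  final List<Row> metadataTable = new ArrayList<>();

  // Timestamp saved in the restriction; only updated when a bundle commits.
  long committedTimestamp = Long.MIN_VALUE;

  List<String> attempt(boolean bundleCommitSucceeds) {
    // Select by created-at timestamp only; the state column is ignored.
    List<Row> candidates = new ArrayList<>();
    for (Row row : metadataTable) {
      if (row.createdAt > committedTimestamp) {
        candidates.add(row);
      }
    }
    candidates.sort(Comparator.comparingLong(r -> r.createdAt));

    long lastClaimed = committedTimestamp; // claim state local to this attempt
    List<String> bundleOutput = new ArrayList<>();
    for (Row row : candidates) {
      lastClaimed = row.createdAt; // "claim" the created-at timestamp (sorted, so monotonic)
      row.state = "SCHEDULED"; // may still be written, but no longer drives selection
      bundleOutput.add(row.token);
    }

    if (!bundleCommitSucceeds) {
      // Outputs and the advanced restriction are discarded together.
      return new ArrayList<>();
    }
    committedTimestamp = lastClaimed; // restriction saved along with the outputs
    return bundleOutput;
  }

  public static void main(String[] args) {
    NewDetectNewPartitionsSketch sketch = new NewDetectNewPartitionsSketch();
    sketch.metadataTable.add(new Row("p1", 10));
    sketch.metadataTable.add(new Row("p2", 20));
    System.out.println("failed attempt: " + sketch.attempt(false)); // []
    System.out.println("retry: " + sketch.attempt(true)); // [p1, p2]
  }
}
```

Because the outputs and the saved timestamp succeed or fail together, re-running an attempt is safe: the SCHEDULED write becomes harmless rather than the source of lost partitions.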
More information is available at:
https://docs.google.com/document/d/1IQAOqLmGuIaOJc55NmfUckM4rDCXHAmxKNuRg6Ae07U/edit#heading=h.q3e0xrkg85ay
--
This is an automated message from the Apache Git Service.
Issue Time Tracking
-------------------
Worklog Id: (was: 717725)
Time Spent: 8h 40m (was: 8.5h)
> SpannerIO Change Stream Connector
> ---------------------------------
>
> Key: BEAM-12164
> URL: https://issues.apache.org/jira/browse/BEAM-12164
> Project: Beam
> Issue Type: New Feature
> Components: sdk-java-core
> Reporter: Thiago Nunes
> Priority: P3
> Time Spent: 8h 40m
> Remaining Estimate: 0h
>
> We would like to augment the existing Google Cloud SpannerIO connector
> (https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/SpannerIO.java)
> with support for Spanner Change Streams (CDC). CDC support is currently being
> implemented in Spanner and will be exposed through a gRPC API. We will use
> that API to create a new SpannerIO.readChangeStream(...) implementation.
--
This message was sent by Atlassian Jira
(v8.20.1#820001)