thiagotnunes opened a new pull request #16581:
URL: https://github.com/apache/beam/pull/16581


   Adds the SpannerIO.readChangeStreams feature that will enable users to 
consume a change stream from Cloud Spanner.
   This feature is under preview now, and can only be used for allowlisted 
customers.
   
   When reading a change stream the users will be able to operate on a 
PCollection of DataChangeRecords, containing the modifications made to the 
database as well as the type of operation.
   
   This PR contains the following modifications:
   
   1. It exposes the `SpannerIO.readChangeStream(...)` functionality. Here we 
construct the internal pipeline composed of `Impulse -> Initialize -> 
DetectNewPartitions -> ReadChangeStreamPartition -> PostProcessingMetrics`.
   2. It adds a DoFn to create the metadata table when the pipeline is 
deployed. This will be used to maintain internal state for the Connector. It 
can be seen in the `InitializeDoFn` class.
   3. The `InitializeDoFn` uses a new class called `NameGenerator` that creates 
a random metadata table name to be used within the Connector.
   4. It adds a DoFn to collect metrics as a final step in the Connector, the 
`PostProcessingMetricsDoFn`. This class will count the number of DataRecords 
produced, the stream time of a record from Cloud Spanner to the worker, and the 
total time that a record took from commit time in Spanner until it is emitted 
into the output PCollection.
   5. Small bug fixes that changes the initial watermark timestamp from the 
current timestamp to the partition's start at timestamp.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to