thiagotnunes opened a new pull request #16581: URL: https://github.com/apache/beam/pull/16581
Adds the `SpannerIO.readChangeStream` feature, which enables users to consume a change stream from Cloud Spanner. This feature is currently in preview and is available only to allowlisted customers. When reading a change stream, users operate on a PCollection of DataChangeRecords, which contain the modifications made to the database as well as the type of operation.

This PR contains the following modifications:

1. It exposes the `SpannerIO.readChangeStream(...)` functionality. Here we construct the internal pipeline composed of `Impulse -> Initialize -> DetectNewPartitions -> ReadChangeStreamPartition -> PostProcessingMetrics`.
2. It adds a DoFn, the `InitializeDoFn` class, that creates the metadata table when the pipeline is deployed. This table is used to maintain internal state for the Connector.
3. The `InitializeDoFn` uses a new class called `NameGenerator` that creates a random metadata table name to be used within the Connector.
4. It adds a DoFn, the `PostProcessingMetricsDoFn`, that collects metrics as the final step in the Connector. This class counts the number of data records produced, measures the stream time of a record from Cloud Spanner to the worker, and measures the total time a record takes from its commit in Spanner until it is emitted into the output PCollection.
5. It includes a small bug fix that changes the initial watermark timestamp from the current timestamp to the partition's start timestamp.
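To make the two latencies described in item 4 concrete, here is a minimal sketch in plain Java. It is not the code from this PR: the class name `RecordLatencySketch`, its method names, and the timestamps are hypothetical and chosen only for illustration. It assumes each record carries a commit timestamp and that the worker knows when streaming of the record started and ended and when the record was emitted.

```java
import java.time.Duration;
import java.time.Instant;

/**
 * Hypothetical illustration of the metrics described in item 4.
 * This is NOT the PostProcessingMetricsDoFn from the PR.
 */
public class RecordLatencySketch {

  /** Time the record spent streaming from Cloud Spanner to the worker. */
  static Duration streamTime(Instant streamStartedAt, Instant streamEndedAt) {
    return Duration.between(streamStartedAt, streamEndedAt);
  }

  /** Total time from the commit in Spanner until the record is emitted. */
  static Duration commitToEmit(Instant committedAt, Instant emittedAt) {
    return Duration.between(committedAt, emittedAt);
  }

  public static void main(String[] args) {
    // Made-up timestamps for a single record.
    Instant committedAt = Instant.parse("2022-01-20T10:00:00Z");
    Instant streamStartedAt = Instant.parse("2022-01-20T10:00:01Z");
    Instant streamEndedAt = Instant.parse("2022-01-20T10:00:01.400Z");
    Instant emittedAt = Instant.parse("2022-01-20T10:00:02.500Z");

    long recordCount = 1; // one record processed in this sketch

    System.out.println("records: " + recordCount);
    System.out.println("stream time (ms): "
        + streamTime(streamStartedAt, streamEndedAt).toMillis());
    System.out.println("commit-to-emit (ms): "
        + commitToEmit(committedAt, emittedAt).toMillis());
  }
}
```

In a real Beam pipeline these values would more likely be reported through Beam's metrics facilities than printed; the sketch only shows which pair of timestamps each measurement spans.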
