[
https://issues.apache.org/jira/browse/BEAM-12164?focusedWorklogId=753166&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753166
]
ASF GitHub Bot logged work on BEAM-12164:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 06/Apr/22 01:32
Start Date: 06/Apr/22 01:32
Worklog Time Spent: 10m
Work Description: hengfengli commented on code in PR #17200:
URL: https://github.com/apache/beam/pull/17200#discussion_r843386991
##########
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/changestreams/restriction/ThroughputEstimator.java:
##########
@@ -0,0 +1,114 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.gcp.spanner.changestreams.restriction;
+
+import com.google.cloud.Timestamp;
+import java.io.Serializable;
+import java.util.ArrayDeque;
+import java.util.Queue;
+
+/** An estimator to provide an estimate on the throughput of the outputted elements. */
+public class ThroughputEstimator implements Serializable {
+
+ private static class Pair<K, V> {
+ private final K first;
+ private final V second;
+
+ public Pair(K first, V second) {
+ this.first = first;
+ this.second = second;
+ }
+
+ public K getFirst() {
+ return first;
+ }
+
+ public V getSecond() {
+ return second;
+ }
+
+ @Override
+ public String toString() {
+ return String.format("first: %s, second: %s", first, second);
+ }
+ }
+
+ private static final long serialVersionUID = -3597929310338724800L;
+ // The start time of each per-second window.
+ private Timestamp startTimeOfCurrentWindow;
+ // The bytes of the current window.
+ private double bytesInCurrentWindow;
+ // The number of seconds to look in the past.
+ private final int numOfSeconds = 60;
+ // The total bytes of all windows in the queue.
+ private double bytesInQueue;
+ // The queue holds a number of windows in the past in order to calculate
+ // a rolling windowing throughput.
+ private Queue<Pair<Timestamp, Double>> queue;
+
+ public ThroughputEstimator() {
+ queue = new ArrayDeque<>();
+ }
+
+ /**
+ * Updates the estimator with the bytes of records.
+ *
+ * @param timeOfRecords the committed timestamp of the records
+ * @param bytes the total bytes of the records
+ */
+ public void update(Timestamp timeOfRecords, double bytes) {
+ if (startTimeOfCurrentWindow == null) {
+ bytesInCurrentWindow = bytes;
+ startTimeOfCurrentWindow = timeOfRecords;
+ return;
+ }
+
+ if (timeOfRecords.getSeconds() < startTimeOfCurrentWindow.getSeconds() + 1) {
Review Comment:
You are asking, "What if `timeOfRecords == startTimeOfCurrentWindow`?"
Did you miss the `+ 1` at the end? If `timeOfRecords ==
startTimeOfCurrentWindow`, the condition `timeOfRecords.getSeconds() <
startTimeOfCurrentWindow.getSeconds() + 1` is met, so execution does not
reach the `else` branch.
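To make the boundary case concrete, here is a minimal sketch of the same comparison. It uses plain epoch seconds in place of `com.google.cloud.Timestamp`, and the class `WindowCheck` and method `inCurrentWindow` are hypothetical names for illustration only, not part of the PR:

```java
final class WindowCheck {
    // Hypothetical helper that mirrors the condition under review.
    // Plain long seconds stand in for Timestamp.getSeconds().
    static boolean inCurrentWindow(long recordSeconds, long windowStartSeconds) {
        return recordSeconds < windowStartSeconds + 1;
    }

    public static void main(String[] args) {
        long windowStart = 100L;
        // Same second as the window start: the condition holds,
        // so the record is counted in the current window.
        System.out.println(inCurrentWindow(100L, windowStart)); // true
        // One second later: the condition fails, so the else branch runs.
        System.out.println(inCurrentWindow(101L, windowStart)); // false
    }
}
```

With whole seconds, `t < start + 1` is equivalent to `t <= start`, which is why the equal case stays in the current window.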
Issue Time Tracking
-------------------
Worklog Id: (was: 753166)
Time Spent: 67h (was: 66h 50m)
> SpannerIO Change Stream Connector
> ---------------------------------
>
> Key: BEAM-12164
> URL: https://issues.apache.org/jira/browse/BEAM-12164
> Project: Beam
> Issue Type: New Feature
> Components: sdk-java-core
> Reporter: Thiago Nunes
> Assignee: Thiago Nunes
> Priority: P2
> Fix For: 2.37.0
>
> Time Spent: 67h
> Remaining Estimate: 0h
>
> We would like to augment the existing Google Cloud SpannerIO connector
> (https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/SpannerIO.java)
> with support for Spanner Change Streams (CDC). CDC support is currently being
> implemented in Spanner and will be exposed through a gRPC API. We will use
> this API to create a new SpannerIO.readChangeStream(...) implementation.