[ 
https://issues.apache.org/jira/browse/BEAM-12164?focusedWorklogId=753165&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-753165
 ]

ASF GitHub Bot logged work on BEAM-12164:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 06/Apr/22 01:17
            Start Date: 06/Apr/22 01:17
    Worklog Time Spent: 10m 
      Work Description: thiagotnunes commented on code in PR #17200:
URL: https://github.com/apache/beam/pull/17200#discussion_r843356034


##########
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/changestreams/restriction/TimestampUtils.java:
##########
@@ -30,6 +30,11 @@
       BigDecimal.valueOf(Timestamp.MIN_VALUE.getSeconds());
   private static final int NANOS_PER_SECOND = (int) 
TimeUnit.SECONDS.toNanos(1);
 
+  /** This interface is only used for replacing now() with a constant 
timestamp for testing. */
+  public interface TimeFunction {

Review Comment:
   nit: We could use a `java.util.Supplier<Timestamp>` instead of creating a 
new interface



##########
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/changestreams/dofn/ReadChangeStreamPartitionDoFn.java:
##########
@@ -98,6 +102,7 @@ public ReadChangeStreamPartitionDoFn(
     this.mapperFactory = mapperFactory;
     this.actionFactory = actionFactory;
     this.metrics = metrics;
+    this.throughputEstimator = new ThroughputEstimator();

Review Comment:
   Can we inject the throughput estimator instead of creating it in the 
constructor? This makes it easier to test (injecting a mock, for instance)



##########
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/changestreams/restriction/ThroughputEstimator.java:
##########
@@ -0,0 +1,114 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.gcp.spanner.changestreams.restriction;
+
+import com.google.cloud.Timestamp;
+import java.io.Serializable;
+import java.util.ArrayDeque;
+import java.util.Queue;
+
+/** An estimator to provide an estimate on the throughput of the outputted 
elements. */
+public class ThroughputEstimator implements Serializable {
+
+  private static class Pair<K, V> {
+    private final K first;
+    private final V second;
+
+    public Pair(K first, V second) {
+      this.first = first;
+      this.second = second;
+    }
+
+    public K getFirst() {
+      return first;
+    }
+
+    public V getSecond() {
+      return second;
+    }
+
+    @Override
+    public String toString() {
+      return String.format("first: %s, second: %s", first, second);
+    }
+  }
+
+  private static final long serialVersionUID = -3597929310338724800L;
+  // The start time of each per-second window.
+  private Timestamp startTimeOfCurrentWindow;
+  // The bytes of the current window.
+  private double bytesInCurrentWindow;
+  // The number of seconds to look in the past.
+  private final int numOfSeconds = 60;
+  // The total bytes of all windows in the queue.
+  private double bytesInQueue;
+  // The queue holds a number of windows in the past in order to calculate
+  // a rolling windowing throughput.
+  private Queue<Pair<Timestamp, Double>> queue;
+
+  public ThroughputEstimator() {
+    queue = new ArrayDeque<>();
+  }
+
+  /**
+   * Updates the estimator with the bytes of records.
+   *
+   * @param timeOfRecords the committed timestamp of the records
+   * @param bytes the total bytes of the records
+   */
+  public void update(Timestamp timeOfRecords, double bytes) {
+    if (startTimeOfCurrentWindow == null) {
+      bytesInCurrentWindow = bytes;
+      startTimeOfCurrentWindow = timeOfRecords;
+      return;
+    }
+
+    if (timeOfRecords.getSeconds() < startTimeOfCurrentWindow.getSeconds() + 
1) {

Review Comment:
   Sorry, I still am missing something here. I thought we would at to the 
current window by adding `bytesInCurrentWindow += bytes`, but if the seconds 
are equal I will go to the `else` branch adding a new pair into the Queue. Or 
am I getting this wrong?



##########
sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/changestreams/restriction/ThroughputEstimatorTest.java:
##########
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.gcp.spanner.changestreams.restriction;
+
+import static org.hamcrest.Matchers.greaterThanOrEqualTo;
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assume.assumeThat;
+
+import com.google.cloud.Timestamp;
+import com.pholser.junit.quickcheck.From;
+import com.pholser.junit.quickcheck.Property;
+import 
org.apache.beam.sdk.io.gcp.spanner.changestreams.util.TimestampGenerator;
+
+public class ThroughputEstimatorTest {
+  private static final double DELTA = 1e-10;
+
+  @Property
+  public void testThroughputCalculation(@From(TimestampGenerator.class) 
Timestamp startTimestamp) {

Review Comment:
   I think this specific test does not need to be a property based case, but we 
could create other test cases, such as:
   
   1. Generate random pairs of timestamp / bytes within a 60 seconds window and 
verify if the bytes are correct
   2. Generate random pairs of timestamp / bytes within a 300 seconds window 
and verify the sum at boundaries
   
   Here is an example of how I would do #1:
   
   ```java
   @Property
     public void testThroughputIsAccumulatedWithin60SecondsWindow(
         List<@InRange(minLong=0L, maxLong = 60000000L) Long> micros,
         List<Double> bytes
     ) {
       assumeThat(micros.size(), equalTo(bytes.size()));
       micros.sort(Long::compareTo);
       final List<Timestamp> timestamps = micros
           .stream()
           .map(Timestamp::ofTimeMicroseconds)
           .collect(Collectors.toList());
       final Double expectedSum = bytes.stream().reduce(Double::sum).get();
   
       for (int i = 0; i < timestamps.size(); i++) {
         estimator.update(timestamps.get(i), bytes.get(i))
       }
       double actualSum = estimator.getFrom(Timestamp
           .ofTimeSecondsAndNanos(timestamps.get(0).getSeconds() + 1, 
timestamps.get(0).getNanos()))
       assertEquals(expectedSum, actualSum, DELTA);
     }
   ```





Issue Time Tracking
-------------------

    Worklog Id:     (was: 753165)
    Time Spent: 66h 50m  (was: 66h 40m)

> SpannerIO Change Stream Connector
> ---------------------------------
>
>                 Key: BEAM-12164
>                 URL: https://issues.apache.org/jira/browse/BEAM-12164
>             Project: Beam
>          Issue Type: New Feature
>          Components: sdk-java-core
>            Reporter: Thiago Nunes
>            Assignee: Thiago Nunes
>            Priority: P2
>             Fix For: 2.37.0
>
>          Time Spent: 66h 50m
>  Remaining Estimate: 0h
>
> We would like to augment the existing Google Cloud SpannerIO connector 
> ([https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/SpannerIO.java)]
>  with the support for Spanner Change Streams (CDC). CDC support is just being 
> implemented in Spanner and it will be exposed through a gRPC API. We will use 
> such API to create a new SpannerIO.readChangeStream(...) implementation.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

Reply via email to