[
https://issues.apache.org/jira/browse/BEAM-12164?focusedWorklogId=750790&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-750790
]
ASF GitHub Bot logged work on BEAM-12164:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 31/Mar/22 07:04
Start Date: 31/Mar/22 07:04
Worklog Time Spent: 10m
Work Description: hengfengli commented on a change in pull request #17200:
URL: https://github.com/apache/beam/pull/17200#discussion_r839249758
##########
File path:
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/changestreams/dofn/ReadChangeStreamPartitionDoFn.java
##########
@@ -146,6 +156,17 @@ public TimestampRange initialRestriction(@Element PartitionMetadata partition) {
     return TimestampRange.of(startTimestamp, endTimestamp);
   }
+  @GetSize
+  public double getSize(@Element PartitionMetadata partition, @Restriction TimestampRange range)
+      throws Exception {
+    final BigDecimal timeGapInSeconds =
+        BigDecimal.valueOf(newTracker(partition, range).getProgress().getWorkRemaining());
+    final BigDecimal throughput = BigDecimal.valueOf(this.throughputEstimator.get());
+    LOG.debug(
+        "Reported getSize() - remaining work: " + timeGapInSeconds + " throughput:" + throughput);
+    return timeGapInSeconds.multiply(throughput).doubleValue();
Review comment:
To make this easier to test, I had to change the code to the following:
```
@GetSize
public double getSize(@Element PartitionMetadata partition, @Restriction TimestampRange range)
    throws Exception {
  return getBacklogSize(
      newTracker(partition, range).getProgress().getWorkRemaining(),
      this.throughputEstimator.get());
}

@VisibleForTesting
private double getBacklogSize(double workRemaining, double localThroughput) {
  final BigDecimal timeGapInSeconds = BigDecimal.valueOf(workRemaining);
  final BigDecimal throughput = BigDecimal.valueOf(localThroughput);
  LOG.debug(
      "Reported getSize() - remaining work: " + timeGapInSeconds + " throughput:" + throughput);
  return timeGapInSeconds.multiply(throughput).doubleValue();
}
```
This can overflow. For example, if `throughput` is the maximum double value,
the product exceeds the double range even though `timeGapInSeconds` has a
smaller range (getSeconds() returns a long). I wonder whether property-based
tests would be meaningful here, because all we would be testing is the
multiplication of a double by a long.
To avoid the overflow, we could probably cap the result at `Double.MAX_VALUE`:
```
return timeGapInSeconds
    .multiply(throughput)
    .min(BigDecimal.valueOf(Double.MAX_VALUE))
    .doubleValue();
```
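As a rough, standalone illustration of the overflow and the cap (not the connector's actual code: the class name `BacklogSizeSketch` and both method names are hypothetical, and the logging and estimator are left out), the two variants can be compared like this:

```java
import java.math.BigDecimal;

// Hypothetical standalone sketch; not part of ReadChangeStreamPartitionDoFn.
class BacklogSizeSketch {

  // Mirrors the unclamped multiplication: the BigDecimal product is exact,
  // but doubleValue() converts a too-large magnitude to infinity.
  public static double unclampedSize(double workRemaining, double throughput) {
    return BigDecimal.valueOf(workRemaining)
        .multiply(BigDecimal.valueOf(throughput))
        .doubleValue();
  }

  // Caps the exact product at Double.MAX_VALUE before converting to double.
  public static double clampedSize(double workRemaining, double throughput) {
    return BigDecimal.valueOf(workRemaining)
        .multiply(BigDecimal.valueOf(throughput))
        .min(BigDecimal.valueOf(Double.MAX_VALUE))
        .doubleValue();
  }

  public static void main(String[] args) {
    double workRemaining = 10.0; // assumed sample value, in seconds
    double throughput = Double.MAX_VALUE; // worst case from the comment above
    System.out.println("unclamped: " + unclampedSize(workRemaining, throughput));
    System.out.println("clamped:   " + clampedSize(workRemaining, throughput));
  }
}
```

Note that `BigDecimal.valueOf(double)` throws a `NumberFormatException` for NaN or infinite inputs, so this sketch assumes both arguments are finite. It also only caps the upper bound and assumes both values are non-negative.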
Issue Time Tracking
-------------------
Worklog Id: (was: 750790)
Time Spent: 66h 20m (was: 66h 10m)
> SpannerIO Change Stream Connector
> ---------------------------------
>
> Key: BEAM-12164
> URL: https://issues.apache.org/jira/browse/BEAM-12164
> Project: Beam
> Issue Type: New Feature
> Components: sdk-java-core
> Reporter: Thiago Nunes
> Assignee: Thiago Nunes
> Priority: P2
> Fix For: 2.37.0
>
> Time Spent: 66h 20m
> Remaining Estimate: 0h
>
> We would like to augment the existing Google Cloud SpannerIO connector
> (https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/SpannerIO.java)
> with support for Spanner Change Streams (CDC). CDC support is currently being
> implemented in Spanner and will be exposed through a gRPC API. We will use
> this API to create a new SpannerIO.readChangeStream(...) implementation.