[
https://issues.apache.org/jira/browse/BEAM-14365?focusedWorklogId=762115&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-762115
]
ASF GitHub Bot logged work on BEAM-14365:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 26/Apr/22 05:02
Start Date: 26/Apr/22 05:02
Worklog Time Spent: 10m
Work Description: hengfengli commented on PR #17461:
URL: https://github.com/apache/beam/pull/17461#issuecomment-1109346515
> Is it clear where the negative throughput value is coming from? and
setting it to 0 is the right thing to do when it is negative?
@y1chi No, we are still not sure where the negative number comes from since
it only happened after running over 2 days. We got the following error:
"java.lang.IllegalArgumentException: Expected size >= 0 but received
-36590.86666666666. at
org.apache.beam.sdk.transforms.reflect.ByteBuddyDoFnInvokerFactory$DefaultGetSize.validateSize(ByteBuddyDoFnInvokerFactory.java:438)
at
org.apache.beam.sdk.io.gcp.spanner.changestreams.dofn.ReadChangeStreamPartitionDoFn$DoFnInvoker.invokeGetSize(Unknown
Source) at
org.apache.beam.fn.harness.FnApiDoFnRunner.calculateRestrictionSize(FnApiDoFnRunner.java:1182)
at
org.apache.beam.fn.harness.FnApiDoFnRunner.trySplitForElementAndRestriction(FnApiDoFnRunner.java:1625)
at
org.apache.beam.fn.harness.FnApiDoFnRunner.access$1800(FnApiDoFnRunner.java:142)
at
org.apache.beam.fn.harness.FnApiDoFnRunner$SplittableFnDataReceiver.trySplit(FnApiDoFnRunner.java:1113)
at
org.apache.beam.fn.harness.data.PCollectionConsumerRegistry$SplittingMetricTrackingFnDataReceiver.trySplit(PCollectionConsumerRegistry.java:342)
at
org.apache.beam.fn.harness.BeamFnDataReadRunner.trySplit(BeamFnDataReadRunner.java:259)
at
org.apache.beam.fn.harness.control.ProcessBundleHandler.trySplit(ProcessBundleHandler.java:688)
at
org.apache.beam.fn.harness.control.BeamFnControlClient.delegateOnInstructionRequestType(BeamFnControlClient.java:151)
at
org.apache.beam.fn.harness.control.BeamFnControlClient$InboundObserver.lambda$onNext$0(BeamFnControlClient.java:116)
at
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base/java.lang.Thread.run(Thread.java:833) "
Basically, it comes from `getSize()` in `ReadChangeStreamPartitionDoFn`:
https://github.com/apache/beam/blob/07f30d221e4b285b23b74c3509d77b62388b7bb4/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/changestreams/dofn/ReadChangeStreamPartitionDoFn.java#L159-L172
The remainingWork in Progress cannot be negative
([code](https://github.com/apache/beam/blob/07f30d221e4b285b23b74c3509d77b62388b7bb4/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/splittabledofn/RestrictionTracker.java#L187-L192)).
So the only possibility is from the `throughput`.
Setting it to 0 is the current effective way to fix the issue. Also, the
throughput is not a critical part and does not impact the correctness of the
connector. When the throughput is negative, setting it to 0 is acceptable.
In addition, I have been running a dataflow job with printing more logs to
find out where the negative number actually comes from.
Issue Time Tracking
-------------------
Worklog Id: (was: 762115)
Time Spent: 20m (was: 10m)
> [SpannerIO.readChangeStream] The reported backlog size should not be negative
> -----------------------------------------------------------------------------
>
> Key: BEAM-14365
> URL: https://issues.apache.org/jira/browse/BEAM-14365
> Project: Beam
> Issue Type: Bug
> Components: io-java-gcp
> Affects Versions: 2.39.0
> Reporter: Hengfeng Li
> Priority: P1
> Fix For: 2.39.0
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> We got the following error when testing the Spanner change streams connector:
> {code:java}
> "java.lang.IllegalArgumentException: Expected size >= 0 but received
> -36590.86666666666. at
> org.apache.beam.sdk.transforms.reflect.ByteBuddyDoFnInvokerFactory$DefaultGetSize.validateSize(ByteBuddyDoFnInvokerFactory.java:438)
> at
> org.apache.beam.sdk.io.gcp.spanner.changestreams.dofn.ReadChangeStreamPartitionDoFn$DoFnInvoker.invokeGetSize(Unknown
> Source) at
> org.apache.beam.fn.harness.FnApiDoFnRunner.calculateRestrictionSize(FnApiDoFnRunner.java:1182)
> at
> org.apache.beam.fn.harness.FnApiDoFnRunner.trySplitForElementAndRestriction(FnApiDoFnRunner.java:1625)
> at
> org.apache.beam.fn.harness.FnApiDoFnRunner.access$1800(FnApiDoFnRunner.java:142)
> at
> org.apache.beam.fn.harness.FnApiDoFnRunner$SplittableFnDataReceiver.trySplit(FnApiDoFnRunner.java:1113)
> at
> org.apache.beam.fn.harness.data.PCollectionConsumerRegistry$SplittingMetricTrackingFnDataReceiver.trySplit(PCollectionConsumerRegistry.java:342)
> at
> org.apache.beam.fn.harness.BeamFnDataReadRunner.trySplit(BeamFnDataReadRunner.java:259)
> at
> org.apache.beam.fn.harness.control.ProcessBundleHandler.trySplit(ProcessBundleHandler.java:688)
> at
> org.apache.beam.fn.harness.control.BeamFnControlClient.delegateOnInstructionRequestType(BeamFnControlClient.java:151)
> at
> org.apache.beam.fn.harness.control.BeamFnControlClient$InboundObserver.lambda$onNext$0(BeamFnControlClient.java:116)
> at
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
> at
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
> at java.base/java.lang.Thread.run(Thread.java:833) " {code}
> This indicates that the return value of getSize() in SDF is negative. Since
> the fields in Progress cannot be negative, the negative number should come
> from the local throughput estimator.
>
> When this error occurs, the pipeline gets stuck and no progress is made. This
> prevents users from enabling autoscaling when using the SpannerIO change
> streams functionality.
--
This message was sent by Atlassian Jira
(v8.20.7#820007)