[ 
https://issues.apache.org/jira/browse/BEAM-14365?focusedWorklogId=762115&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-762115
 ]

ASF GitHub Bot logged work on BEAM-14365:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 26/Apr/22 05:02
            Start Date: 26/Apr/22 05:02
    Worklog Time Spent: 10m 
      Work Description: hengfengli commented on PR #17461:
URL: https://github.com/apache/beam/pull/17461#issuecomment-1109346515

   > Is it clear where the negative throughput value is coming from? and 
setting it to 0 is the right thing to do when it is negative?
   
   @y1chi No, we are still not sure where the negative number comes from since 
it only happened after running over 2 days. We got the following error: 
   
   "java.lang.IllegalArgumentException: Expected size >= 0 but received 
-36590.86666666666. at 
org.apache.beam.sdk.transforms.reflect.ByteBuddyDoFnInvokerFactory$DefaultGetSize.validateSize(ByteBuddyDoFnInvokerFactory.java:438)
 at 
org.apache.beam.sdk.io.gcp.spanner.changestreams.dofn.ReadChangeStreamPartitionDoFn$DoFnInvoker.invokeGetSize(Unknown
 Source) at 
org.apache.beam.fn.harness.FnApiDoFnRunner.calculateRestrictionSize(FnApiDoFnRunner.java:1182)
 at 
org.apache.beam.fn.harness.FnApiDoFnRunner.trySplitForElementAndRestriction(FnApiDoFnRunner.java:1625)
 at 
org.apache.beam.fn.harness.FnApiDoFnRunner.access$1800(FnApiDoFnRunner.java:142)
 at 
org.apache.beam.fn.harness.FnApiDoFnRunner$SplittableFnDataReceiver.trySplit(FnApiDoFnRunner.java:1113)
 at 
org.apache.beam.fn.harness.data.PCollectionConsumerRegistry$SplittingMetricTrackingFnDataReceiver.trySplit(PCollectionConsumerRegistry.java:342)
 at 
org.apache.beam.fn.harness.BeamFnDataReadRunner.trySplit(BeamFnDataReadRunner.java:259)
 at 
org.apache.beam.fn.harness.control.ProcessBundleHandler.trySplit(ProcessBundleHandler.java:688)
 at 
org.apache.beam.fn.harness.control.BeamFnControlClient.delegateOnInstructionRequestType(BeamFnControlClient.java:151)
 at 
org.apache.beam.fn.harness.control.BeamFnControlClient$InboundObserver.lambda$onNext$0(BeamFnControlClient.java:116)
 at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
 at 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
 at java.base/java.lang.Thread.run(Thread.java:833) " 
   
   Basically, it comes from `getSize()` in `ReadChangeStreamPartitionDoFn`: 
   
   
https://github.com/apache/beam/blob/07f30d221e4b285b23b74c3509d77b62388b7bb4/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/changestreams/dofn/ReadChangeStreamPartitionDoFn.java#L159-L172
   
   The remainingWork in Progress cannot be negative 
([code](https://github.com/apache/beam/blob/07f30d221e4b285b23b74c3509d77b62388b7bb4/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/splittabledofn/RestrictionTracker.java#L187-L192)).
 So the only possibility is from the `throughput`. 
   
   Setting it to 0 is the current effective way to fix the issue. Also, the 
throughput is not a critical part and does not impact the correctness of the 
connector. When the throughput is negative, setting it to 0 is acceptable. 
   
   In addition, I have been running a dataflow job with printing more logs to 
find out where the negative number actually comes from. 




Issue Time Tracking
-------------------

    Worklog Id:     (was: 762115)
    Time Spent: 20m  (was: 10m)

> [SpannerIO.readChangeStream] The reported backlog size should not be negative
> -----------------------------------------------------------------------------
>
>                 Key: BEAM-14365
>                 URL: https://issues.apache.org/jira/browse/BEAM-14365
>             Project: Beam
>          Issue Type: Bug
>          Components: io-java-gcp
>    Affects Versions: 2.39.0
>            Reporter: Hengfeng Li
>            Priority: P1
>             Fix For: 2.39.0
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> We got the following error when testing the Spanner change streams connector: 
> {code:java}
> "java.lang.IllegalArgumentException: Expected size >= 0 but received 
> -36590.86666666666. at 
> org.apache.beam.sdk.transforms.reflect.ByteBuddyDoFnInvokerFactory$DefaultGetSize.validateSize(ByteBuddyDoFnInvokerFactory.java:438)
>  at 
> org.apache.beam.sdk.io.gcp.spanner.changestreams.dofn.ReadChangeStreamPartitionDoFn$DoFnInvoker.invokeGetSize(Unknown
>  Source) at 
> org.apache.beam.fn.harness.FnApiDoFnRunner.calculateRestrictionSize(FnApiDoFnRunner.java:1182)
>  at 
> org.apache.beam.fn.harness.FnApiDoFnRunner.trySplitForElementAndRestriction(FnApiDoFnRunner.java:1625)
>  at 
> org.apache.beam.fn.harness.FnApiDoFnRunner.access$1800(FnApiDoFnRunner.java:142)
>  at 
> org.apache.beam.fn.harness.FnApiDoFnRunner$SplittableFnDataReceiver.trySplit(FnApiDoFnRunner.java:1113)
>  at 
> org.apache.beam.fn.harness.data.PCollectionConsumerRegistry$SplittingMetricTrackingFnDataReceiver.trySplit(PCollectionConsumerRegistry.java:342)
>  at 
> org.apache.beam.fn.harness.BeamFnDataReadRunner.trySplit(BeamFnDataReadRunner.java:259)
>  at 
> org.apache.beam.fn.harness.control.ProcessBundleHandler.trySplit(ProcessBundleHandler.java:688)
>  at 
> org.apache.beam.fn.harness.control.BeamFnControlClient.delegateOnInstructionRequestType(BeamFnControlClient.java:151)
>  at 
> org.apache.beam.fn.harness.control.BeamFnControlClient$InboundObserver.lambda$onNext$0(BeamFnControlClient.java:116)
>  at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
>  at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
>  at java.base/java.lang.Thread.run(Thread.java:833) " {code}
> This indicates that the return value of getSize() in SDF is negative. Since 
> the fields in Progress cannot be negative, the negative number should come 
> from the local throughput estimator. 
>  
> When this error occurs, the pipeline gets stuck and no progress is made. This 
> prevents users from enabling autoscaling when using the SpannerIO change 
> streams functionality.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

Reply via email to