Abacn commented on PR #26267:
URL: https://github.com/apache/beam/pull/26267#issuecomment-1507573236

   Verified that the unit test fails on current master and succeeds with the fix applied.
   
   Current master:
   ```
   Apr 13, 2023 4:25:34 PM org.apache.beam.runners.dataflow.worker.WorkerCustomSources performSplitTyped
   WARNING: Splitting source <unknown> into bundles of estimated size 67108864 bytes produced 200 bundles, which have
   total serialized size 116265 bytes. As this is too large for the Google Cloud Dataflow API, retrying splitting once with
   increased desiredBundleSizeBytes 1560482414 to reduce the number of splits.
   Apr 13, 2023 4:25:34 PM org.apache.beam.runners.dataflow.worker.WorkerCustomSources performSplitTyped
   INFO: Splitting with desiredBundleSizeBytes 1560482414 produced 200 bundles with total serialized size 116265 bytes
   Apr 13, 2023 4:25:34 PM org.apache.beam.runners.dataflow.worker.WorkerCustomSources performSplitTyped
   WARNING: Splitting source <unknown> into bundles of estimated size 1560482414 bytes produced 200 bundles. Rebundling into 100 bundles.
   Apr 13, 2023 4:25:34 PM org.apache.beam.runners.dataflow.worker.WorkerCustomSources performSplitTyped
   INFO: Splitting source <unknown> produced 100 bundles with total serialized response size 58815
   
   Total size of the BoundedSource objects generated by split() operation is larger than the allowable limit. When splitting
   <unknown> into bundles of 1560482414 bytes it generated 200 BoundedSource objects with total serialized size of 58815
   bytes which is larger than the limit 10000. For more information, please check the corresponding FAQ entry at
   https://cloud.google.com/dataflow/pipelines/troubleshooting-your-pipeline
   java.lang.IllegalArgumentException: Total size of the BoundedSource objects generated by split() operation is larger than the
   allowable limit. When splitting <unknown> into bundles of 1560482414 bytes it generated 200 BoundedSource objects with
   total serialized size of 58815 bytes which is larger than the limit 10000. For more information, please check the
   corresponding FAQ entry at https://cloud.google.com/dataflow/pipelines/troubleshooting-your-pipeline
        at org.apache.beam.runners.dataflow.worker.WorkerCustomSources.performSplitTyped(WorkerCustomSources.java:286)
        at org.apache.beam.runners.dataflow.worker.WorkerCustomSources.performSplitWithApiLimit(WorkerCustomSources.java:201)
        at org.apache.beam.runners.dataflow.worker.WorkerCustomSourcesTest.testSplittingProducedResponseUnderLimit(WorkerCustomSourcesTest.java:242)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
        at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
        at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
        at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
        at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
        at org.junit.rules.ExpectedException$ExpectedExceptionStatement.evaluate(ExpectedException.java:258)
        at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54)
        at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
        at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
        at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
        at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
        at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
        at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
        at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
        at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
        at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
        at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
        at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
        at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
        at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:110)
        at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
        at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38)
        at org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:62)
        at org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:36)
        at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
        at org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:33)
        at org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:94)
        at com.sun.proxy.$Proxy2.processTestClass(Unknown Source)
        at org.gradle.api.internal.tasks.testing.worker.TestWorker$2.run(TestWorker.java:176)
        at org.gradle.api.internal.tasks.testing.worker.TestWorker.executeAndMaintainThreadName(TestWorker.java:129)
        at org.gradle.api.internal.tasks.testing.worker.TestWorker.execute(TestWorker.java:100)
        at org.gradle.api.internal.tasks.testing.worker.TestWorker.execute(TestWorker.java:60)
        at org.gradle.process.internal.worker.child.ActionExecutionWorker.execute(ActionExecutionWorker.java:56)
        at org.gradle.process.internal.worker.child.SystemApplicationClassLoaderWorker.call(SystemApplicationClassLoaderWorker.java:133)
        at org.gradle.process.internal.worker.child.SystemApplicationClassLoaderWorker.call(SystemApplicationClassLoaderWorker.java:71)
        at worker.org.gradle.process.internal.worker.GradleWorkerMain.run(GradleWorkerMain.java:69)
        at worker.org.gradle.process.internal.worker.GradleWorkerMain.main(GradleWorkerMain.java:74)
   ```
   
   With the fix:
   ```
   Apr 13, 2023 4:27:05 PM org.apache.beam.runners.dataflow.worker.WorkerCustomSources performSplitTyped
   WARNING: Splitting source <unknown> into bundles of estimated size 67108864 bytes produced 200 bundles, which have
   total serialized size 116265 bytes. As this is too large for the Google Cloud Dataflow API, retrying splitting once with
   increased desiredBundleSizeBytes 1560482414 to reduce the number of splits.
   Apr 13, 2023 4:27:05 PM org.apache.beam.runners.dataflow.worker.WorkerCustomSources performSplitTyped
   INFO: Splitting with desiredBundleSizeBytes 1560482414 produced 200 bundles with total serialized size 116265 bytes
   Apr 13, 2023 4:27:05 PM org.apache.beam.runners.dataflow.worker.WorkerCustomSources performSplitTyped
   WARNING: Re-bundle source <unknown> into bundles of estimated size 13330 bytes produced 16 bundles.
   Apr 13, 2023 4:27:06 PM org.apache.beam.runners.dataflow.worker.WorkerCustomSources performSplitTyped
   WARNING: Re-bundle source <unknown> into bundles of estimated size 9814 bytes produced 11 bundles.
   Apr 13, 2023 4:27:06 PM org.apache.beam.runners.dataflow.worker.WorkerCustomSources performSplitTyped
   INFO: Splitting source <unknown> produced 11 bundles with total serialized response size 9824
   ```
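
   For readers skimming the two logs: on master the splits are re-bundled once (200 to 100) and the response is still over the limit, while with the fix re-bundling repeats (16, then 11 bundles) until the serialized response fits. Below is a rough, self-contained sketch of that kind of loop. It is not the code from this PR or from `WorkerCustomSources`; the class `RebundleSketch`, both method names, the proportional shrink rule, and the caller-supplied size function are all hypothetical and exist only for illustration.

   ```java
   import java.util.ArrayList;
   import java.util.List;
   import java.util.function.ToLongFunction;

   /** Illustrative only: hypothetical names, not Beam's actual implementation. */
   final class RebundleSketch {

     /** Groups items into at most numBundles contiguous bundles of near-equal length. */
     static <T> List<List<T>> group(List<T> items, int numBundles) {
       List<List<T>> bundles = new ArrayList<>();
       int perBundle = (items.size() + numBundles - 1) / numBundles; // ceiling division
       for (int i = 0; i < items.size(); i += perBundle) {
         int end = Math.min(i + perBundle, items.size());
         bundles.add(new ArrayList<>(items.subList(i, end)));
       }
       return bundles;
     }

     /**
      * Keeps re-grouping into fewer bundles until the (caller-defined) serialized
      * size fits under limitBytes, or only one bundle is left.
      */
     static <T> List<List<T>> rebundleUnderLimit(
         List<T> items, ToLongFunction<List<List<T>>> serializedSize, long limitBytes) {
       if (items.isEmpty()) {
         return new ArrayList<>();
       }
       List<List<T>> bundles = group(items, items.size()); // start with one item per bundle
       long size = serializedSize.applyAsLong(bundles);
       while (size > limitBytes && bundles.size() > 1) {
         int current = bundles.size();
         // Assumed heuristic: shrink the count in proportion to the overshoot,
         // but always by at least one so the loop terminates.
         int scaled = (int) (current * limitBytes / size);
         int target = Math.max(1, Math.min(current - 1, scaled));
         bundles = group(items, target);
         size = serializedSize.applyAsLong(bundles);
       }
       if (size > limitBytes) {
         throw new IllegalArgumentException(
             "Serialized size " + size + " still exceeds limit " + limitBytes);
       }
       return bundles;
     }

     public static void main(String[] args) {
       List<Integer> items = new ArrayList<>();
       for (int i = 0; i < 200; i++) {
         items.add(i);
       }
       // Toy size model (an assumption): a fixed cost of 581 bytes per bundle.
       ToLongFunction<List<List<Integer>>> sizeOf = b -> 581L * b.size();
       List<List<Integer>> result = rebundleUnderLimit(items, sizeOf, 10000L);
       System.out.println(result.size() + " bundles, " + sizeOf.applyAsLong(result) + " bytes");
     }
   }
   ```

   The size function is passed in because how a bundle serializes depends on the source type; the fixed per-bundle cost in `main` is a toy model, not a measurement taken from the logs above.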

