[ 
https://issues.apache.org/jira/browse/BEAM-6349?focusedWorklogId=182981&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-182981
 ]

ASF GitHub Bot logged work on BEAM-6349:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 09/Jan/19 10:18
            Start Date: 09/Jan/19 10:18
    Worklog Time Spent: 10m 
      Work Description: lgajowy commented on issue #7418: [BEAM-6349] Revert 
"Treat VarInt encoding as a Beam primitive encoding in Dataflo…
URL: https://github.com/apache/beam/pull/7418#issuecomment-452644955
 
 
   @CraigChambersG after some further investigation we understand the issue now 
and it is not due to your changes. You were right about the old code - I just 
couldn't find the exact cause until I got help from people working closely with 
Dataflow. The reason was that you modified both worker code and beam SDK code. 
Worker code didn't get updated on dataflow when we run the tests without 
recompiling the worker and using the newest worker version. 
   
   More details here: https://issues.apache.org/jira/browse/BEAM-6349
   I also suggested a solution that avoids such problems: 
https://github.com/apache/beam/pull/7435
   
   Thank you for all your help.
   
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 182981)
    Time Spent: 2h  (was: 1h 50m)

> Exceptions (IllegalArgumentException or NoClassDefFoundError) when running 
> tests on Dataflow runner
> ---------------------------------------------------------------------------------------------------
>
>                 Key: BEAM-6349
>                 URL: https://issues.apache.org/jira/browse/BEAM-6349
>             Project: Beam
>          Issue Type: Improvement
>          Components: testing
>            Reporter: Lukasz Gajowy
>            Assignee: Craig Chambers
>            Priority: Major
>          Time Spent: 2h
>  Remaining Estimate: 0h
>
> Running GroupByKeyLoadTest results in the following error on Dataflow runner:
>  
> {code:java}
> java.lang.ExceptionInInitializerError
>       at 
> org.apache.beam.runners.dataflow.worker.IntrinsicMapTaskExecutorFactory$2.typedApply(IntrinsicMapTaskExecutorFactory.java:344)
>       at 
> org.apache.beam.runners.dataflow.worker.IntrinsicMapTaskExecutorFactory$2.typedApply(IntrinsicMapTaskExecutorFactory.java:338)
>       at 
> org.apache.beam.runners.dataflow.worker.graph.Networks$TypeSafeNodeFunction.apply(Networks.java:63)
>       at 
> org.apache.beam.runners.dataflow.worker.graph.Networks$TypeSafeNodeFunction.apply(Networks.java:50)
>       at 
> org.apache.beam.runners.dataflow.worker.graph.Networks.replaceDirectedNetworkNodes(Networks.java:87)
>       at 
> org.apache.beam.runners.dataflow.worker.IntrinsicMapTaskExecutorFactory.create(IntrinsicMapTaskExecutorFactory.java:120)
>       at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.doWork(BatchDataflowWorker.java:337)
>       at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.getAndPerformWork(BatchDataflowWorker.java:291)
>       at 
> org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.doWork(DataflowBatchWorkerHarness.java:135)
>       at 
> org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:115)
>       at 
> org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:102)
>       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>       at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>       at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.IllegalArgumentException: Multiple entries with same 
> key: 
> kind:varint=org.apache.beam.runners.dataflow.util.CloudObjectTranslators$8@39b69c48
>  and 
> kind:varint=org.apache.beam.runners.dataflow.worker.RunnerHarnessCoderCloudObjectTranslatorRegistrar$1@7966f294
>       at 
> org.apache.beam.repackaged.beam_runners_google_cloud_dataflow_java.com.google.common.collect.ImmutableMap.checkNoConflict(ImmutableMap.java:136)
>       at 
> org.apache.beam.repackaged.beam_runners_google_cloud_dataflow_java.com.google.common.collect.RegularImmutableMap.checkNoConflictInKeyBucket(RegularImmutableMap.java:100)
>       at 
> org.apache.beam.repackaged.beam_runners_google_cloud_dataflow_java.com.google.common.collect.RegularImmutableMap.fromEntryArray(RegularImmutableMap.java:86)
>       at 
> org.apache.beam.repackaged.beam_runners_google_cloud_dataflow_java.com.google.common.collect.ImmutableMap$Builder.build(ImmutableMap.java:300)
>       at 
> org.apache.beam.runners.dataflow.util.CloudObjects.populateCloudObjectTranslators(CloudObjects.java:60)
>       at 
> org.apache.beam.runners.dataflow.util.CloudObjects.<clinit>(CloudObjects.java:39)
>       ... 15 more
> {code}
>  
> Example command to run the tests (FWIW, it also runs the  "clean" task 
> although I don't know if it's necessary):
> {code:java}
> ./gradlew clean :beam-sdks-java-load-tests:run --info 
> -PloadTest.mainClass=org.apache.beam.sdk.loadtests.GroupByKeyLoadTest 
> -Prunner=:beam-runners-google-cloud-dataflow-java 
> '-PloadTest.args=--sourceOptions={"numRecords":1000,"splitPointFrequencyRecords":1,"keySizeBytes":1,"valueSizeBytes":9,"numHotKeys":0,"hotKeyFraction":0,"seed":123456,"bundleSizeDistribution":{"type":"const","const":42},"forceNumInitialBundles":100,"progressShape":"LINEAR","initializeDelayDistribution":{"type":"const","const":42}}
>  
> --stepOptions={"outputRecordsPerInputRecord":1,"preservesInputKeyDistribution":true,"perBundleDelay":1,"perBundleDelayType":"MIXED","cpuUtilizationInMixedDelay":0.5}
>  --fanout=1 --iterations=1 --runner=DataflowRunner'{code}
>  
> After reverting commit bac909b8e237ef8a2ab7e17ac986e5cc90143e5b ([PR: 
> 7351|https://github.com/apache/beam/pull/7351]) I can no longer reproduce 
> this issue.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to