[
https://issues.apache.org/jira/browse/BEAM-6261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16774359#comment-16774359
]
Lucas Bordwell commented on BEAM-6261:
--------------------------------------
Is there any update on this? We noticed that side inputs have started to work
on our Dataflow pipeline, and we have verified that the example referenced
above now updates as well. However, we are now seeing duplicate-value
exceptions when the pipeline autoscales:
{code:java}
Caused by: org.apache.beam.runners.dataflow.worker.repackaged.com.google.common.util.concurrent.UncheckedExecutionException: java.lang.IllegalArgumentException: Duplicate values for SIDE_INPUT_VALUE
org.apache.beam.runners.dataflow.worker.repackaged.com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2214)
org.apache.beam.runners.dataflow.worker.repackaged.com.google.common.cache.LocalCache.get(LocalCache.java:4053)
org.apache.beam.runners.dataflow.worker.repackaged.com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4899)
org.apache.beam.runners.dataflow.worker.StateFetcher.fetchSideInput(StateFetcher.java:188)
org.apache.beam.runners.dataflow.worker.StreamingModeExecutionContext.fetchSideInput(StreamingModeExecutionContext.java:287)
org.apache.beam.runners.dataflow.worker.StreamingModeExecutionContext.access$500(StreamingModeExecutionContext.java:71)
org.apache.beam.runners.dataflow.worker.StreamingModeExecutionContext$StepContext.issueSideInputFetch(StreamingModeExecutionContext.java:633)
org.apache.beam.runners.dataflow.worker.StreamingModeExecutionContext$UserStepContext.issueSideInputFetch(StreamingModeExecutionContext.java:696)
org.apache.beam.runners.dataflow.worker.StreamingSideInputFetcher.getReadyWindows(StreamingSideInputFetcher.java:133)
org.apache.beam.runners.dataflow.worker.StreamingSideInputDoFnRunner.startBundle(StreamingSideInputDoFnRunner.java:54)
org.apache.beam.runners.dataflow.worker.SimpleParDoFn.reallyStartBundle(SimpleParDoFn.java:303)
org.apache.beam.runners.dataflow.worker.SimpleParDoFn.startBundle(SimpleParDoFn.java:228)
org.apache.beam.runners.dataflow.worker.util.common.worker.ParDoOperation.start(ParDoOperation.java:37)
org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:77)
org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.process(StreamingDataflowWorker.java:1226)
org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.access$1000(StreamingDataflowWorker.java:141)
org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker$6.run(StreamingDataflowWorker.java:965)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
java.lang.Thread.run(Thread.java:745)
{code}
> Dataflow runner does not refresh updated side inputs
> ----------------------------------------------------
>
> Key: BEAM-6261
> URL: https://issues.apache.org/jira/browse/BEAM-6261
> Project: Beam
> Issue Type: New Feature
> Components: runner-dataflow
> Reporter: Scott Wegner
> Assignee: Daniel Mills
> Priority: Major
> Labels: triaged
>
> See [this user@
> thread|https://lists.apache.org/thread.html/5eed0fc3beeb9f1c1fe4a623cbcad41cb15d0d80490cafb1f27e4577@%3Cuser.beam.apache.org%3E].
> The [Slowly-changing lookup
> cache|https://cloud.google.com/blog/products/gcp/guide-to-common-cloud-dataflow-use-case-patterns-part-1]
> pattern described on the GCP blog uses a side input to feed lookup data to
> join with the main data input. However, the Dataflow runner doesn't update
> side inputs.
> Example pipeline here: https://github.com/lbordwell/sideinput
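
For readers unfamiliar with the pattern named in the description, the following is a minimal plain-Java sketch of the idea behind a slowly-changing lookup cache: a lookup snapshot is replaced wholesale from time to time, and each main-input element is joined against whichever snapshot is current. It deliberately uses no Beam or Dataflow APIs. The class and method names (SlowlyChangingLookup, refresh, enrich) are hypothetical and are not taken from the linked example pipeline or the blog post.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

/** Illustration only: a lookup table that is swapped out as a whole on refresh. */
class SlowlyChangingLookup {
    // Current snapshot of the lookup data; stands in for the side input.
    private final AtomicReference<Map<String, String>> snapshot =
            new AtomicReference<>(Collections.<String, String>emptyMap());

    /** Replaces the whole snapshot, as a periodic refresh would. */
    void refresh(Map<String, String> freshData) {
        // Defensive copy so later changes by the caller do not leak in.
        snapshot.set(Collections.unmodifiableMap(new HashMap<>(freshData)));
    }

    /** Joins each main-input key with the value from the current snapshot. */
    List<String> enrich(List<String> mainInput) {
        Map<String, String> lookup = snapshot.get();
        List<String> out = new ArrayList<>();
        for (String key : mainInput) {
            String value = lookup.containsKey(key) ? lookup.get(key) : "UNKNOWN";
            out.add(key + "=" + value);
        }
        return out;
    }
}
```

In these terms, the behavior requested in this issue is that a call to enrich made after a refresh sees the new snapshot rather than the one that was loaded first.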