[
https://issues.apache.org/jira/browse/BEAM-10914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17233165#comment-17233165
]
Boyuan Zhang commented on BEAM-10914:
-------------------------------------
Replied on the mailing list thread. I'm summarizing the issue here as
well.
The key problem is not in SDF but in how Dataflow expands Reshuffle. The
problem occurs when two branches feed the same DoFn and only one of them
goes through a Reshuffle:

    Create1([a, b, c, d]) -> Reshuffle ->
                                           DoFn
    Create2([element1, element2]) ------->

A temporary workaround is to insert a Reshuffle after Create2 as well.
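
A minimal sketch of that workaround in the Beam Java SDK. It assumes the two branches reach the DoFn through a Flatten (the comment above does not say how they are joined) and that Reshuffle.viaRandomKey() is the Reshuffle variant in use. The class name ReshuffleWorkaround, the PassThroughFn DoFn, and the step names are hypothetical placeholders, not taken from the attached repro.

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.Flatten;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.transforms.Reshuffle;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PCollectionList;

public class ReshuffleWorkaround {

  /** Hypothetical stand-in for the DoFn that consumes both branches. */
  static class PassThroughFn extends DoFn<String, String> {
    @ProcessElement
    public void processElement(@Element String element, OutputReceiver<String> out) {
      out.output(element);
    }
  }

  /** Builds the two-branch pipeline with a Reshuffle on BOTH branches. */
  static PCollection<String> build(Pipeline p) {
    // Branch 1: already had a Reshuffle in the failing pipeline shape.
    PCollection<String> first =
        p.apply("Create1", Create.of("a", "b", "c", "d"))
            .apply("Reshuffle1", Reshuffle.<String>viaRandomKey());

    // Branch 2: the workaround adds a Reshuffle here as well.
    PCollection<String> second =
        p.apply("Create2", Create.of("element1", "element2"))
            .apply("Reshuffle2", Reshuffle.<String>viaRandomKey());

    // Assumption: the branches are merged with Flatten before the DoFn.
    return PCollectionList.of(first)
        .and(second)
        .apply("Merge", Flatten.pCollections())
        .apply("DoFn", ParDo.of(new PassThroughFn()));
  }
}
```

This only constructs the pipeline graph. The "conflicting bucketing functions" error is raised by Dataflow at job submission, so whether the workaround helps for a given pipeline has to be checked by running it on Dataflow.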
> Splittable DoFn and Dataflow, "conflicting bucketing functions"
> ---------------------------------------------------------------
>
> Key: BEAM-10914
> URL: https://issues.apache.org/jira/browse/BEAM-10914
> Project: Beam
> Issue Type: Bug
> Components: runner-dataflow
> Reporter: Luke Cwik
> Assignee: Boyuan Zhang
> Priority: P2
> Labels: stale-P2
> Attachments: TsPointsStreamingFail.java
>
>
> When creating a pipeline with an SDF like:
> FileIO.match -> readMatches -> drive the upstream side input
> OR
> [unbounded watcher, sdf] -> [ParDo with side input from FileIO] ->
> [bounded sdf] -> [ParDo]
> the Dataflow job fails with:
> Workflow failed. Causes: Step s22 has conflicting bucketing functions
> Attached to this bug is a repro supplied by the user
> (TsPointsStreamingFail.java).
> Source for reported issue:
> https://lists.apache.org/thread.html/r03c77ea03d7ff2678052bde412da19c4e13050652fd34d3e03a9e30f%40%3Cuser.beam.apache.org%3E
--
This message was sent by Atlassian Jira
(v8.3.4#803005)