[
https://issues.apache.org/jira/browse/BEAM-6723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16774260#comment-16774260
]
Robert Bradshaw commented on BEAM-6723:
---------------------------------------
Thanks for the bug report. In the meantime, try re-windowing to global windows
(retrieving and storing the original window as part of the element if it's
needed later before the rewindowing).
> Reshuffle in streaming pipeline does not work on dataflowrunner python
> ----------------------------------------------------------------------
>
> Key: BEAM-6723
> URL: https://issues.apache.org/jira/browse/BEAM-6723
> Project: Beam
> Issue Type: Bug
> Components: runner-dataflow, sdk-py-core
> Reporter: Brecht Coghe
> Priority: Major
>
> When using a Reshuffle after windowing, the dataflowrunner gives the
> following error:
> "org.apache.beam.sdk.transforms.windowing.IntervalWindow cannot be cast to
> org.apache.beam.sdk.transforms.windowing.GlobalWindow"
> This makes it impossible to prevent fusion of operations and to distribute
> our workload nicely.
>
> For more context, see following post where [~robertwb] linked this Jira board:
> [https://stackoverflow.com/questions/54764081/google-dataflow-streaming-pipeline-is-not-distributing-workload-over-several-wor]
> Quick summary: Our use case is actually video analytics. We want to use the
> windowing to get small intervals of videos and the grouping to group per
> video stream. The group by key is thus a game id. What comes out the window
> and grouping is several windows within one game so with 1 game_id key and
> this does not distribute over several workers. The proposed workaround would
> be to add the window_id to the key after windowing, then reshuffle and then
> the processing would be able to run in parallel per window in one
> videostream. Can someone confirm this approach (if reshuffle would work).
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)