[ 
https://issues.apache.org/jira/browse/BEAM-6723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16774260#comment-16774260
 ] 

Robert Bradshaw commented on BEAM-6723:
---------------------------------------

Thanks for the bug report. In the meantime, try re-windowing to global windows 
(retrieving and storing the original window as part of the element if it's 
needed later before the rewindowing).

> Reshuffle in streaming pipeline does not work on dataflowrunner python
> ----------------------------------------------------------------------
>
>                 Key: BEAM-6723
>                 URL: https://issues.apache.org/jira/browse/BEAM-6723
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-dataflow, sdk-py-core
>            Reporter: Brecht Coghe
>            Priority: Major
>
> When using a Reshuffle after windowing, the dataflowrunner gives the 
> following error:
> "org.apache.beam.sdk.transforms.windowing.IntervalWindow cannot be cast to 
> org.apache.beam.sdk.transforms.windowing.GlobalWindow"
> This makes it impossible to prevent fusion of operations and to distribute 
> our workload nicely.
>  
> For more context, see following post where [~robertwb] linked this Jira board:
> [https://stackoverflow.com/questions/54764081/google-dataflow-streaming-pipeline-is-not-distributing-workload-over-several-wor]
> Quick summary: Our use case is actually video analytics. We want to use the 
> windowing to get small intervals of videos and the grouping to group per 
> video stream. The group by key is thus a game id. What comes out the window 
> and grouping is several windows within one game so with 1 game_id key and 
> this does not distribute over several workers. The proposed workaround would 
> be to add the window_id to the key after windowing, then reshuffle and then 
> the processing would be able to run in parallel per window in one 
> videostream. Can someone confirm this approach (if reshuffle would work).
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to