[ 
https://issues.apache.org/jira/browse/BEAM-10475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17186847#comment-17186847
 ] 

Siyuan Chen commented on BEAM-10475:
------------------------------------

BEAM-10703 tracks progress on adding runner-determined sharding support to 
the Dataflow runner.

> GroupIntoBatches with Runner-determined Sharding
> ------------------------------------------------
>
>                 Key: BEAM-10475
>                 URL: https://issues.apache.org/jira/browse/BEAM-10475
>             Project: Beam
>          Issue Type: Improvement
>          Components: runner-dataflow
>            Reporter: Siyuan Chen
>            Assignee: Siyuan Chen
>            Priority: P2
>              Labels: GCP, performance
>
> https://s.apache.org/sharded-group-into-batches
> Improve the existing Beam transform, GroupIntoBatches, to allow runners to 
> choose different sharding strategies depending on how the data needs to be 
> grouped. The goal is to handle cases where elements must be co-located to 
> amortize overhead that would otherwise be incurred per element, without 
> losing the ability to scale parallelism. The essential idea is to build a 
> stateful DoFn whose state can be sharded.
>  
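
To illustrate the idea in the description, here is a minimal pure-Python
sketch of batching with sharded per-key state. It is not the Beam API or the
Dataflow implementation: the function name, the num_shards parameter, and the
round-robin shard assignment are all assumptions made for illustration. In
the proposal the runner, not the user, would decide how each key is sharded.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Iterator, List, Tuple

ShardedKey = Tuple[Hashable, int]


def group_into_batches_sharded(
    elements: Iterable[Tuple[Hashable, object]],
    batch_size: int,
    num_shards: int,
) -> Iterator[Tuple[ShardedKey, List[object]]]:
    """Yield ((key, shard_id), batch) pairs with at most batch_size values each.

    Hypothetical helper: splitting one key across num_shards buffers stands in
    for a runner spreading a hot key over several workers.
    """
    # Per-(key, shard) buffer; plays the role of the shardable state.
    buffers: Dict[ShardedKey, List[object]] = defaultdict(list)
    # Per-key element counter used for round-robin shard assignment
    # (an assumption; a real runner could pick shards differently).
    seen: Dict[Hashable, int] = defaultdict(int)

    for key, value in elements:
        shard_id = seen[key] % num_shards
        seen[key] += 1
        buf = buffers[(key, shard_id)]
        buf.append(value)
        if len(buf) >= batch_size:
            # Emit a copy so clearing the buffer does not alter the output.
            yield (key, shard_id), list(buf)
            buf.clear()

    # Flush the incomplete batches left at the end of the input.
    for sharded_key, buf in buffers.items():
        if buf:
            yield sharded_key, list(buf)


if __name__ == "__main__":
    data = [("a", i) for i in range(10)]
    for sharded_key, batch in group_into_batches_sharded(data, 3, 2):
        print(sharded_key, batch)
```

Each batch still holds only values of a single key, so per-batch overhead is
amortized, while a single hot key can be processed by up to num_shards
buffers in parallel.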



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
