[ 
https://issues.apache.org/jira/browse/BEAM-10475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siyuan Chen reassigned BEAM-10475:
----------------------------------

    Assignee: Siyuan Chen

> GroupIntoBatches with Runner-determined Sharding
> ------------------------------------------------
>
>                 Key: BEAM-10475
>                 URL: https://issues.apache.org/jira/browse/BEAM-10475
>             Project: Beam
>          Issue Type: Improvement
>          Components: runner-dataflow
>            Reporter: Siyuan Chen
>            Assignee: Siyuan Chen
>            Priority: P2
>              Labels: GCP, performance
>
> https://s.apache.org/sharded-group-into-batches
> Improve the existing Beam transform, GroupIntoBatches, to allow the Dataflow
> runner to choose different sharding strategies depending on how the data
> needs to be grouped. The goal is to help in cases where the elements to
> process need to be co-located to reduce the per-element overhead that would
> otherwise be incurred, without losing the ability to scale parallelism.
> The essential idea is to build a stateful DoFn with shardable state.
>  
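To make the idea above concrete, here is a rough sketch in plain Python of batching per (key, shard) rather than per key. It is only a simulation under stated assumptions: the function name group_into_batches_sharded, the num_shards parameter, and the round-robin shard assignment are all hypothetical and stand in for whatever strategy a runner would actually choose. It does not use the Beam or Dataflow API.

```python
from collections import defaultdict
from itertools import count


def group_into_batches_sharded(elements, batch_size, num_shards):
    """Batch (key, value) pairs per (key, shard) instead of per key.

    Hypothetical simulation: num_shards stands in for the sharding a
    runner might pick; this is not the Beam or Dataflow API.
    """
    buffers = defaultdict(list)      # buffered values per (key, shard)
    next_shard = defaultdict(count)  # per-key round-robin counter
    batches = []
    for key, value in elements:
        # Spread the elements of one key across several sub-keys so that
        # a hot key can still be processed in parallel.
        shard = next(next_shard[key]) % num_shards
        sharded_key = (key, shard)
        buffers[sharded_key].append(value)
        if len(buffers[sharded_key]) == batch_size:
            batches.append((sharded_key, buffers.pop(sharded_key)))
    # Flush partially filled buffers once the input is exhausted.
    for sharded_key, values in sorted(buffers.items()):
        if values:
            batches.append((sharded_key, values))
    return batches
```

With a single hot key and num_shards=2, the elements still arrive in batches (so per-element overhead is amortized), but the batches are spread over two sub-keys that could be processed independently.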



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
