[
https://issues.apache.org/jira/browse/BEAM-10475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Siyuan Chen reassigned BEAM-10475:
----------------------------------
Assignee: Siyuan Chen
> GroupIntoBatches with Runner-determined Sharding
> ------------------------------------------------
>
> Key: BEAM-10475
> URL: https://issues.apache.org/jira/browse/BEAM-10475
> Project: Beam
> Issue Type: Improvement
> Components: runner-dataflow
> Reporter: Siyuan Chen
> Assignee: Siyuan Chen
> Priority: P2
> Labels: GCP, performance
>
> https://s.apache.org/sharded-group-into-batches
> Improve the existing Beam transform, GroupIntoBatches, to allow the Dataflow
> runner to choose different sharding strategies depending on how the data
> needs to be grouped. The goal is to help in situations where the elements
> to process need to be co-located to reduce the overhead that would
> otherwise be incurred per element, without losing the ability to scale the
> parallelism. The essential idea is to build a stateful DoFn with shardable
> state.
>
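The idea in the description can be illustrated with a minimal plain-Python
sketch. It is not Beam or Dataflow code and makes no claim about how the
runner implements sharding. The function name, the num_shards parameter, and
the round-robin shard assignment are all assumptions made for illustration.
Each key's buffer (the "state") is split across shards, so batches for one hot
key could be built in parallel while elements within a shard stay co-located.

```python
from collections import defaultdict
from itertools import count


def group_into_batches_sharded(elements, batch_size, num_shards):
    """Batch (key, value) pairs per (key, shard) rather than per key.

    num_shards is a hypothetical stand-in for a runner-chosen sharding
    decision; round-robin assignment is an assumption of this sketch.
    """
    buffers = defaultdict(list)      # buffered values per (key, shard)
    next_shard = defaultdict(count)  # per-key round-robin counter
    batches = []
    for key, value in elements:
        shard = next(next_shard[key]) % num_shards
        sharded_key = (key, shard)
        buffers[sharded_key].append(value)
        # Emit a batch as soon as this shard's buffer is full.
        if len(buffers[sharded_key]) >= batch_size:
            batches.append((key, buffers.pop(sharded_key)))
    # Flush the remaining partial batches once the input is exhausted.
    for (key, _shard), values in buffers.items():
        if values:
            batches.append((key, values))
    return batches
```

With num_shards=1 this degenerates to ordinary per-key batching. Larger
values spread a single key over several independent buffers, at the cost of
more partial batches at flush time.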
--
This message was sent by Atlassian Jira
(v8.3.4#803005)