[
https://issues.apache.org/jira/browse/BEAM-10475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17186847#comment-17186847
]
Siyuan Chen commented on BEAM-10475:
------------------------------------
BEAM-10703 tracks the progress in Dataflow runner to add runner-determined
sharding support.
> GroupIntoBatches with Runner-determined Sharding
> ------------------------------------------------
>
> Key: BEAM-10475
> URL: https://issues.apache.org/jira/browse/BEAM-10475
> Project: Beam
> Issue Type: Improvement
> Components: runner-dataflow
> Reporter: Siyuan Chen
> Assignee: Siyuan Chen
> Priority: P2
> Labels: GCP, performance
>
> [https://s.apache.org/sharded-group-into-batches|https://s.apache.org/sharded-group-into-batches__]
> Improve the existing Beam transform, GroupIntoBatches, to allow runners to
> choose different sharding strategies depending on how the data needs to be
> grouped. The goal is to help with the situation where the elements to process
> need to be co-located to reduce the overhead that would otherwise be incurred
> per element, while not losing the ability to scale the parallelism. The
> essential idea is to build a stateful DoFn with shardable states.
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)