[
https://issues.apache.org/jira/browse/BEAM-5865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16701922#comment-16701922
]
Jozef Vilcek commented on BEAM-5865:
------------------------------------
Hm, I am not really clear on what "provide stable inputs" mean.
So in case reshuffle must be done, that how do we choose a good key and make
sure that key is placed on the worker we want. E.g. if we choose key = (1, 2,
..., num-workers) we want to in runtime have each worker be responsible for
exactly one key / shard. I do not know where this is decided in runner code.
Also, if we would want to use flink's subtask index for something, where it can
be reached? Transform replacements seems to operate entirely on Beam API but
that subtask index is Flink API.
> Auto sharding of streaming sinks in FlinkRunner
> -----------------------------------------------
>
> Key: BEAM-5865
> URL: https://issues.apache.org/jira/browse/BEAM-5865
> Project: Beam
> Issue Type: Improvement
> Components: runner-flink
> Reporter: Maximilian Michels
> Priority: Major
>
> The Flink Runner should do auto-sharding of streaming sinks, similar to
> BEAM-1438. That way, the user doesn't have to set shards manually which
> introduces additional shuffling and might cause skew in the distribution of
> data.
> As per discussion:
> https://lists.apache.org/thread.html/7b92145dd9ae68da1866f1047445479f51d31f103d6407316bb4114c@%3Cuser.beam.apache.org%3E
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)