[
https://issues.apache.org/jira/browse/BEAM-5865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16731809#comment-16731809
]
Jozef Vilcek commented on BEAM-5865:
------------------------------------
After a bit more testing, I found that the performance degradation in my case
is somehow related to operator chaining. It seems that by removing the GBK
shuffle, more transforms were chained into the operator that reads the Kafka
partition, which slowed down processing. I played with `.disableChaining()` and
`.slotSharingGroup()` to prevent parts of the graph from being chained, and it
had a positive impact. I am not familiar with how Flink allocates CPU time and
buffering between operators and slot groups, so I cannot fully reason about it.
I guess that if the feature of skipping the shuffle and using a "runner
auto-generated key" to allow a "map-side GBK (or keyBy)" is considered for
implementation, it should not be automatic but chosen by the user. I am
interested to hear what you think about it, [~mxm].
So the most important thing for me is to first allocate shards evenly to
workers, to get a balanced load. As I wrote above, right now this can only be
achieved by generating a very specific key that reverse engineers Flink's key
assignment. Could Beam do this itself?
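To illustrate what "generating a very specific key" means, here is a minimal
sketch. It assumes the key-group scheme as I understand it: a key's hash is
mapped to a key group modulo the max parallelism, and the key group is mapped
to a subtask as `keyGroup * parallelism / maxParallelism`. The class
`BalancedKeys`, its methods, and the `mix` hash are hypothetical stand-ins;
Flink applies its own hash function, so real code should delegate to Flink's
key-group assignment utility rather than copy this.

```java
import java.util.Arrays;

public class BalancedKeys {
    // Stand-in integer mixer. ASSUMPTION: this is NOT Flink's actual hash,
    // it only plays the same role so the search below can be demonstrated.
    static int mix(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h & 0x7fffffff; // force a non-negative result
    }

    // Models the two-step assignment: key -> key group -> subtask index.
    static int subtaskFor(int key, int parallelism, int maxParallelism) {
        int keyGroup = mix(Integer.hashCode(key)) % maxParallelism;
        return keyGroup * parallelism / maxParallelism;
    }

    // Brute-force search for one key per subtask, so that shard i can be
    // keyed with keys[i] and end up on subtask i.
    static int[] findKeys(int parallelism, int maxParallelism) {
        if (parallelism < 1 || parallelism > maxParallelism) {
            throw new IllegalArgumentException("need 1 <= parallelism <= maxParallelism");
        }
        int[] keys = new int[parallelism];
        boolean[] found = new boolean[parallelism];
        int remaining = parallelism;
        for (int candidate = 0; candidate < 1_000_000 && remaining > 0; candidate++) {
            int subtask = subtaskFor(candidate, parallelism, maxParallelism);
            if (!found[subtask]) {
                found[subtask] = true;
                keys[subtask] = candidate;
                remaining--;
            }
        }
        if (remaining > 0) {
            throw new IllegalStateException("no key found for some subtask");
        }
        return keys;
    }

    public static void main(String[] args) {
        // Example: 4 parallel subtasks, max parallelism 128.
        System.out.println(Arrays.toString(findKeys(4, 128)));
    }
}
```

Keys found this way are only valid for one combination of parallelism and max
parallelism, which is part of why I would prefer the runner to handle it.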
> Auto sharding of streaming sinks in FlinkRunner
> -----------------------------------------------
>
> Key: BEAM-5865
> URL: https://issues.apache.org/jira/browse/BEAM-5865
> Project: Beam
> Issue Type: Improvement
> Components: runner-flink
> Reporter: Maximilian Michels
> Priority: Major
>
> The Flink Runner should do auto-sharding of streaming sinks, similar to
> BEAM-1438. That way, the user doesn't have to set shards manually which
> introduces additional shuffling and might cause skew in the distribution of
> data.
> As per discussion:
> https://lists.apache.org/thread.html/7b92145dd9ae68da1866f1047445479f51d31f103d6407316bb4114c@%3Cuser.beam.apache.org%3E