[ 
https://issues.apache.org/jira/browse/BEAM-5865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16731809#comment-16731809
 ] 

Jozef Vilcek commented on BEAM-5865:
------------------------------------

After a bit more testing, I found that the performance degradation in my case 
is somehow related to operator chaining. It seems that removing the GBK shuffle 
caused more transforms to be chained into the operator that reads the Kafka 
partition, which slowed down processing. I played with `.disableChaining()` and 
`.slotSharingGroup()` to prevent parts of the graph from being chained, and it 
did have a positive impact. I am not familiar with how Flink allocates CPU time 
and buffers data between operators and slot groups, so I cannot fully reason 
about it.

I guess that if the feature of skipping the shuffle and using a "runner 
auto-generated key" to enable a "map-side GBK (or keyBy)" is considered for 
implementation, it should not be automatic but chosen by the user. I am 
interested to hear what you think about it, [~mxm].

 

So the most important thing for me is to first get an even allocation of shards 
to workers, so that the load on workers is balanced. As I wrote above, right now 
this can be achieved only by generating very specific keys that reverse engineer 
Flink's key assignment. Could Beam consider doing this?
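
To illustrate what "reverse engineering Flink's key assignment" means, here is 
a minimal sketch. It assumes, from my reading of Flink's 
`KeyGroupRangeAssignment` and `MathUtils.murmurHash`, that a key is mapped to a 
key group by a murmur-style hash of `hashCode()` modulo the max parallelism, and 
that the key group is then scaled to a subtask index. The class `ShardKeys` and 
its method names are hypothetical, and the hash should be checked against the 
Flink version in use.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Beam or Flink.
final class ShardKeys {

    // Murmur-style integer mix. ASSUMPTION: modeled on Flink's
    // MathUtils.murmurHash; verify against your Flink version.
    static int murmurHash(int code) {
        code *= 0xcc9e2d51;
        code = Integer.rotateLeft(code, 15);
        code *= 0x1b873593;
        code = Integer.rotateLeft(code, 13);
        code = code * 5 + 0xe6546b64;
        code ^= 4;
        // Final bit mixing.
        code ^= code >>> 16;
        code *= 0x85ebca6b;
        code ^= code >>> 13;
        code *= 0xc2b2ae35;
        code ^= code >>> 16;
        // Fold into the non-negative range.
        if (code >= 0) {
            return code;
        }
        return code != Integer.MIN_VALUE ? -code : 0;
    }

    // ASSUMPTION: key group = hash(key.hashCode()) % maxParallelism.
    static int keyGroupFor(Object key, int maxParallelism) {
        return murmurHash(key.hashCode()) % maxParallelism;
    }

    // ASSUMPTION: subtask index = keyGroup * parallelism / maxParallelism.
    static int subtaskFor(Object key, int maxParallelism, int parallelism) {
        return keyGroupFor(key, maxParallelism) * parallelism / maxParallelism;
    }

    // Brute-force search: returns one integer key per subtask, so that
    // keys[i] is routed to subtask i under the assumed assignment.
    static int[] findKeysPerSubtask(int parallelism, int maxParallelism) {
        if (parallelism < 1 || parallelism > maxParallelism) {
            throw new IllegalArgumentException("need 1 <= parallelism <= maxParallelism");
        }
        int[] keys = new int[parallelism];
        Arrays.fill(keys, -1);
        int found = 0;
        for (int candidate = 0; candidate < 1_000_000 && found < parallelism; candidate++) {
            int subtask = subtaskFor(candidate, maxParallelism, parallelism);
            if (keys[subtask] == -1) {
                keys[subtask] = candidate;
                found++;
            }
        }
        if (found < parallelism) {
            throw new IllegalStateException("no key found for some subtask");
        }
        return keys;
    }

    public static void main(String[] args) {
        int parallelism = 4;
        int maxParallelism = 128;
        int[] keys = findKeysPerSubtask(parallelism, maxParallelism);
        for (int i = 0; i < keys.length; i++) {
            System.out.println("subtask " + i + " <- key " + keys[i]);
        }
    }
}
```

Using one such key per shard would give each worker exactly one shard, but it 
depends on runner internals and on the configured parallelism and max 
parallelism, which is why I think it would be better handled inside Beam than 
in user code.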

> Auto sharding of streaming sinks in FlinkRunner
> -----------------------------------------------
>
>                 Key: BEAM-5865
>                 URL: https://issues.apache.org/jira/browse/BEAM-5865
>             Project: Beam
>          Issue Type: Improvement
>          Components: runner-flink
>            Reporter: Maximilian Michels
>            Priority: Major
>
> The Flink Runner should do auto-sharding of streaming sinks, similar to 
> BEAM-1438. That way, the user doesn't have to set shards manually which 
> introduces additional shuffling and might cause skew in the distribution of 
> data.
> As per discussion: 
> https://lists.apache.org/thread.html/7b92145dd9ae68da1866f1047445479f51d31f103d6407316bb4114c@%3Cuser.beam.apache.org%3E



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
