[jira] [Commented] (BEAM-5865) Auto sharding of streaming sinks in FlinkRunner

Jozef Vilcek (JIRA) Sat, 24 Nov 2018 09:56:13 -0800


    [ 
https://issues.apache.org/jira/browse/BEAM-5865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16697914#comment-16697914
 ]


Jozef Vilcek commented on BEAM-5865:
------------------------------------

I will do my best to figure it out and find enough time for it.

You will have to bear with me as these is very Beam internal and fuzzy to me. I 
had a look to suggested example from DataFlow runner
[https://github.com/apache/beam/commit/cbb922c8a72680c5b8b4299197b515abf650bfdf#diff-a79d5c3c33f6ef1c4894b97ca907d541R347]



That seems straightforward enough, but here we need to do something more 
evolved. Right now, I can picture these 2 tasks here:
 # If user do not specify numShards, runner should not crash but pick some 
default (e.g. numWorkers - flink parallelism)
 # Make sure shards are distributed evenly among the workers and do not produce 
hot spots

The first one can be reproduced from given PR example for DataFlow. The second 
one I have no idea how to proceed. Any hints on where to look?

> Auto sharding of streaming sinks in FlinkRunner
> -----------------------------------------------
>
>                 Key: BEAM-5865
>                 URL: https://issues.apache.org/jira/browse/BEAM-5865
>             Project: Beam
>          Issue Type: Improvement
>          Components: runner-flink
>            Reporter: Maximilian Michels
>            Priority: Major
>
> The Flink Runner should do auto-sharding of streaming sinks, similar to 
> BEAM-1438. That way, the user doesn't have to set shards manually which 
> introduces additional shuffling and might cause skew in the distribution of 
> data.
> As per discussion: 
> https://lists.apache.org/thread.html/7b92145dd9ae68da1866f1047445479f51d31f103d6407316bb4114c@%3Cuser.beam.apache.org%3E



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (BEAM-5865) Auto sharding of streaming sinks in FlinkRunner

Reply via email to