[jira] [Commented] (BEAM-5865) Auto sharding of streaming sinks in FlinkRunner

Jozef Vilcek (JIRA) Tue, 08 Jan 2019 12:59:16 -0800


    [ 
https://issues.apache.org/jira/browse/BEAM-5865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16737511#comment-16737511
 ]


Jozef Vilcek commented on BEAM-5865:
------------------------------------

Yes, key generation "hack" is suboptimal, but it is what I got out also from 
flink dev mailing list. So ...?
[https://lists.apache.org/thread.html/14c1ee6a9546b09a3c6e67eabc520bed07fc4a38f520594bd8fd2432@%3Cdev.flink.apache.org%3E]

Having this in responsibility of the client would be elegant delegation. 
However, I am worried about 2 things. First is quite disruptive change ti 
FileIO and WriteFiles. Second is that that logic must account for FlinkRunner 
internal handling of key via encoding it and turning to ByteBuffer. 
(WorkItemKeySelector and KvToByteBufferKeySelector).

Therefore, giving that Flink dev does not give any better option itself (not 
trying to give some so far) and runner specific key handlings I am thinking it 
this can not be treated as some kind of runner optimisation, which can be 
turned on/off via configuration. Runner would replace mapping functions which 
generates ShardedKey with some kind of FlinkAutoShardedKey which internally 
will leverage flink specific runtime information.

> Auto sharding of streaming sinks in FlinkRunner
> -----------------------------------------------
>
>                 Key: BEAM-5865
>                 URL: https://issues.apache.org/jira/browse/BEAM-5865
>             Project: Beam
>          Issue Type: Improvement
>          Components: runner-flink
>            Reporter: Maximilian Michels
>            Priority: Major
>
> The Flink Runner should do auto-sharding of streaming sinks, similar to 
> BEAM-1438. That way, the user doesn't have to set shards manually which 
> introduces additional shuffling and might cause skew in the distribution of 
> data.
> As per discussion: 
> https://lists.apache.org/thread.html/7b92145dd9ae68da1866f1047445479f51d31f103d6407316bb4114c@%3Cuser.beam.apache.org%3E



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (BEAM-5865) Auto sharding of streaming sinks in FlinkRunner

Reply via email to