[
https://issues.apache.org/jira/browse/BEAM-5865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16737511#comment-16737511
]
Jozef Vilcek commented on BEAM-5865:
------------------------------------
Yes, key generation "hack" is suboptimal, but it is what I got out also from
flink dev mailing list. So ...?
[https://lists.apache.org/thread.html/14c1ee6a9546b09a3c6e67eabc520bed07fc4a38f520594bd8fd2432@%3Cdev.flink.apache.org%3E]
Having this in responsibility of the client would be elegant delegation.
However, I am worried about 2 things. First is quite disruptive change ti
FileIO and WriteFiles. Second is that that logic must account for FlinkRunner
internal handling of key via encoding it and turning to ByteBuffer.
(WorkItemKeySelector and KvToByteBufferKeySelector).
Therefore, giving that Flink dev does not give any better option itself (not
trying to give some so far) and runner specific key handlings I am thinking it
this can not be treated as some kind of runner optimisation, which can be
turned on/off via configuration. Runner would replace mapping functions which
generates ShardedKey with some kind of FlinkAutoShardedKey which internally
will leverage flink specific runtime information.
> Auto sharding of streaming sinks in FlinkRunner
> -----------------------------------------------
>
> Key: BEAM-5865
> URL: https://issues.apache.org/jira/browse/BEAM-5865
> Project: Beam
> Issue Type: Improvement
> Components: runner-flink
> Reporter: Maximilian Michels
> Priority: Major
>
> The Flink Runner should do auto-sharding of streaming sinks, similar to
> BEAM-1438. That way, the user doesn't have to set shards manually which
> introduces additional shuffling and might cause skew in the distribution of
> data.
> As per discussion:
> https://lists.apache.org/thread.html/7b92145dd9ae68da1866f1047445479f51d31f103d6407316bb4114c@%3Cuser.beam.apache.org%3E
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)