[
https://issues.apache.org/jira/browse/BEAM-5865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16722907#comment-16722907
]
Maximilian Michels commented on BEAM-5865:
------------------------------------------
Just to restate the problem. You want to write your partitions to disk as they
are. The partitions are already "keyed" and the writes should be performed per
window. You don't want to perform any shuffling.
For now I think you would be better off with a custom transform. We could even
make such a transform part of the Beam connectors, or extend the Write
transform. It is correct, that Write currently assumes a GroupByKey for its
write operations, so that would have to be changed for the new operation mode.
> Auto sharding of streaming sinks in FlinkRunner
> -----------------------------------------------
>
> Key: BEAM-5865
> URL: https://issues.apache.org/jira/browse/BEAM-5865
> Project: Beam
> Issue Type: Improvement
> Components: runner-flink
> Reporter: Maximilian Michels
> Priority: Major
>
> The Flink Runner should do auto-sharding of streaming sinks, similar to
> BEAM-1438. That way, the user doesn't have to set shards manually which
> introduces additional shuffling and might cause skew in the distribution of
> data.
> As per discussion:
> https://lists.apache.org/thread.html/7b92145dd9ae68da1866f1047445479f51d31f103d6407316bb4114c@%3Cuser.beam.apache.org%3E
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)