[ 
https://issues.apache.org/jira/browse/BEAM-5865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16722907#comment-16722907
 ] 

Maximilian Michels commented on BEAM-5865:
------------------------------------------

Just to restate the problem. You want to write your partitions to disk as they 
are. The partitions are already "keyed" and the writes should be performed per 
window. You don't want to perform any shuffling.

For now I think you would be better off with a custom transform. We could even 
make such a transform part of the Beam connectors, or extend the Write 
transform. It is correct, that Write currently assumes a GroupByKey for its 
write operations, so that would have to be changed for the new operation mode.

> Auto sharding of streaming sinks in FlinkRunner
> -----------------------------------------------
>
>                 Key: BEAM-5865
>                 URL: https://issues.apache.org/jira/browse/BEAM-5865
>             Project: Beam
>          Issue Type: Improvement
>          Components: runner-flink
>            Reporter: Maximilian Michels
>            Priority: Major
>
> The Flink Runner should do auto-sharding of streaming sinks, similar to 
> BEAM-1438. That way, the user doesn't have to set shards manually which 
> introduces additional shuffling and might cause skew in the distribution of 
> data.
> As per discussion: 
> https://lists.apache.org/thread.html/7b92145dd9ae68da1866f1047445479f51d31f103d6407316bb4114c@%3Cuser.beam.apache.org%3E



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to