[ 
https://issues.apache.org/jira/browse/BEAM-5865?focusedWorklogId=238478&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-238478
 ]

ASF GitHub Bot logged work on BEAM-5865:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 07/May/19 13:50
            Start Date: 07/May/19 13:50
    Worklog Time Spent: 10m 
      Work Description: mxm commented on pull request #8499: [BEAM-5865] Create 
optional auto-balancing sharding function for Flink
URL: https://github.com/apache/beam/pull/8499#discussion_r281629398
 
 

 ##########
 File path: 
runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkStreamingPipelineTranslator.java
 ##########
 @@ -222,7 +282,117 @@ boolean canTranslate(T transform, 
FlinkStreamingTranslationContext context) {
     @Override
     public Map<PValue, ReplacementOutput> mapOutputs(
         Map<TupleTag<?>, PValue> outputs, WriteFilesResult<DestinationT> 
newOutput) {
-      return Collections.emptyMap();
+      return ReplacementOutputs.tagged(outputs, newOutput);
+    }
+  }
+
+  /**
+   * Flink has a known problem of unevenly assigning keys to key groups (and 
then workers) for cases
+   * that number of keys is not >> key groups. This is typical scenario when 
writing files, where
 
 Review comment:
   I wouldn't say this is a Flink problem, it just comes with the nature of 
hashing algorithms (Murmur hash in this case) when the number of keys is not 
much bigger than the number of partitions.
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 238478)
    Time Spent: 1.5h  (was: 1h 20m)

> Auto sharding of streaming sinks in FlinkRunner
> -----------------------------------------------
>
>                 Key: BEAM-5865
>                 URL: https://issues.apache.org/jira/browse/BEAM-5865
>             Project: Beam
>          Issue Type: Improvement
>          Components: runner-flink
>            Reporter: Maximilian Michels
>            Assignee: Jozef Vilcek
>            Priority: Major
>          Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> The Flink Runner should do auto-sharding of streaming sinks, similar to 
> BEAM-1438. That way, the user doesn't have to set shards manually which 
> introduces additional shuffling and might cause skew in the distribution of 
> data.
> As per discussion: 
> https://lists.apache.org/thread.html/7b92145dd9ae68da1866f1047445479f51d31f103d6407316bb4114c@%3Cuser.beam.apache.org%3E



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to