[ 
https://issues.apache.org/jira/browse/BEAM-2873?focusedWorklogId=174892&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-174892
 ]

ASF GitHub Bot logged work on BEAM-2873:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 13/Dec/18 14:13
            Start Date: 13/Dec/18 14:13
    Worklog Time Spent: 10m 
      Work Description: mxm commented on a change in pull request #4760: 
[BEAM-2873] Setting number of shards for writes with runner determined sharding
URL: https://github.com/apache/beam/pull/4760#discussion_r241411173
 
 

 ##########
 File path: 
runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkPipelineOptions.java
 ##########
 @@ -64,6 +64,12 @@
   Integer getParallelism();
   void setParallelism(Integer value);
 
+  @Description("The maximal degree of parallelism to be used when distributing operations "
+      + "onto workers.")
+  @Default.Integer(-1)
+  Integer getMaxParallelism();
+  void setMaxParallelism(Integer value);
 
 Review comment:
   This needs to be updated. Support for max parallelism has already been added.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 174892)

> Detect number of shards for file sink in Flink Streaming Runner
> ---------------------------------------------------------------
>
>                 Key: BEAM-2873
>                 URL: https://issues.apache.org/jira/browse/BEAM-2873
>             Project: Beam
>          Issue Type: Improvement
>          Components: runner-flink
>            Reporter: Aljoscha Krettek
>            Assignee: Dawid Wysakowicz
>            Priority: Major
>          Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> [~reuvenlax] mentioned that this is done for the Dataflow Runner and the 
> default behaviour on Flink can be somewhat surprising for users.
> ML entry: https://www.mail-archive.com/[email protected]/msg02665.html:
> This is how the file sink has always worked in Beam. If no sharding is 
> specified, then this means runner-determined sharding, and by default that is 
> one file per bundle. If Flink has small bundles, then I suggest using the 
> withNumShards method to explicitly pick the number of output shards.
> The Flink runner can detect that runner-determined sharding has been chosen, 
> and override it with a specific number of shards. For example, the Dataflow 
> streaming runner (which as you mentioned also has small bundles) detects this 
> case and sets the number of output file shards based on the number of workers
> in the worker pool.
> [Here|https://github.com/apache/beam/blob/9e6530adb00669b7cf0f01cb8b128be0a21fd721/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowRunner.java#L354]
>  is the code that does this; it should be quite simple to do something 
> similar for Flink, and then there will be no need for users to explicitly 
> call withNumShards themselves.
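
The override described above can be sketched roughly as follows. This is a
minimal, standalone illustration, not the Beam or Flink runner code: the class
ShardOverrideSketch, the method resolveNumShards, and the FALLBACK_SHARDS
constant are hypothetical names, and using the job parallelism as the shard
count is an assumption made here for illustration only.

```java
import java.util.OptionalInt;

/** Hypothetical sketch of a runner-side shard override; not an actual Beam API. */
final class ShardOverrideSketch {

  /** Assumed fallback when no explicit sharding and no positive parallelism is known. */
  static final int FALLBACK_SHARDS = 1;

  /**
   * Returns the shard count a runner might use for a file write.
   *
   * @param userNumShards shard count set explicitly by the user (e.g. via withNumShards),
   *     or empty when the user left the choice to the runner
   * @param parallelism configured parallelism of the job; values <= 0 mean "not set"
   */
  static int resolveNumShards(OptionalInt userNumShards, int parallelism) {
    if (userNumShards.isPresent()) {
      // Explicit sharding always wins; the runner must not override it.
      return userNumShards.getAsInt();
    }
    // Runner-determined sharding: pick a fixed count instead of one file per bundle.
    return parallelism > 0 ? parallelism : FALLBACK_SHARDS;
  }

  public static void main(String[] args) {
    System.out.println(resolveNumShards(OptionalInt.empty(), 4)); // prints 4
    System.out.println(resolveNumShards(OptionalInt.of(10), 4));  // prints 10
    System.out.println(resolveNumShards(OptionalInt.empty(), -1)); // prints 1
  }
}
```

The key design point is that the override applies only when the user did not
choose a shard count, so pipelines that already call withNumShards keep their
existing behaviour.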



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
