[jira] [Commented] (FLINK-10506) Introduce minimum, target and maximum parallelism to JobGraph

vinoyang (JIRA) Sun, 24 Mar 2019 23:47:31 -0700


    [ 
https://issues.apache.org/jira/browse/FLINK-10506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16800436#comment-16800436
 ]


vinoyang commented on FLINK-10506:
----------------------------------

Hi [~till.rohrmann], I found that some classes have already introduced 
{{maxParallelism}} for dynamic scaling. The parallelism-related classes and 
methods are listed below:

 
{code:java}
JobGraph#getMaximumParallelism : maximum parallelism of all operators in job 
graph;

JobVertex#getParallelism/setParallelism : one task’s parallelism;
// (if the max parallelism is not set (-1), it is calculated in 
// ExecutionJobVertex by computeDefaultMaxParallelism from 
// JobVertex#getParallelism)
JobVertex#getMaxParallelism/setMaxParallelism : one task’s max parallelism at 
runtime; 

StreamNode#getParallelism/setParallelism : stream node’s parallelism
StreamNode#getMaxParallelism/setMaxParallelism : stream node’s max parallelism

StreamGraph#setParallelism : for single StreamNode
StreamGraph#setMaxParallelism : for single StreamNode

StreamTransformation#getParallelism/setParallelism :
StreamTransformation#getMaxParallelism/setMaxParallelism :

SingleOutputStreamOperator#setParallelism :
SingleOutputStreamOperator#setMaxParallelism :

StreamExecutionEnvironment#getParallelism : default parallelism for all operations
StreamExecutionEnvironment#getMaxParallelism :

ExecutionConfig#getParallelism/setParallelism
ExecutionConfig#getMaxParallelism/setMaxParallelism
{code}
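
To make the note on the default max parallelism concrete, here is a minimal, standalone sketch of the fallback as I understand it. It does not call Flink; the class name, the bounds (128 and 32768) and the "1.5x, rounded up to a power of two" rule are my recollection of computeDefaultMaxParallelism and should be checked against the actual source before relying on them.

```java
// Standalone sketch, NOT Flink code. Class and method names are hypothetical;
// the constants are assumptions about Flink's default bounds.
final class DefaultMaxParallelismSketch {

    static final int LOWER_BOUND = 1 << 7;   // assumed lower bound: 128
    static final int UPPER_BOUND = 1 << 15;  // assumed upper bound: 32768

    /** Rounds a positive int up to the next power of two (returns x if it already is one). */
    static int roundUpToPowerOfTwo(int x) {
        int highest = Integer.highestOneBit(x);
        return highest == x ? x : highest << 1;
    }

    /** Default max parallelism derived from the operator parallelism. */
    static int computeDefaultMaxParallelism(int parallelism) {
        if (parallelism < 1 || parallelism > UPPER_BOUND) {
            throw new IllegalArgumentException("parallelism out of range: " + parallelism);
        }
        int scaled = parallelism + parallelism / 2;
        return Math.min(Math.max(roundUpToPowerOfTwo(scaled), LOWER_BOUND), UPPER_BOUND);
    }

    /** Uses the configured max parallelism if set (> 0), otherwise the derived default. */
    static int effectiveMaxParallelism(int configuredMax, int parallelism) {
        return configuredMax > 0
                ? configuredMax
                : computeDefaultMaxParallelism(parallelism);
    }

    public static void main(String[] args) {
        System.out.println(effectiveMaxParallelism(-1, 4));
        System.out.println(effectiveMaxParallelism(512, 100));
    }
}
```

Under these assumptions, a vertex without an explicit max parallelism always ends up with a derived value of at least 128, which is why this value may not be what we want as the "maximum" for resource declaration.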
 

I think there are four questions that need to be confirmed:
 * Should the existing maxParallelism listed above be used directly as the 
maximum parallelism for declared resource management?
 * How do we stay compatible with getParallelism/setParallelism in these 
classes in the future?
 * Should we introduce three independent APIs for minimum/target/maximum, or 
introduce a POJO class that encapsulates them and provide a single API?
 * For now, shall we introduce APIs that go all the way from Operator to 
JobGraph, or only for JobGraph (calculating min/target/max from the existing 
parallelism)?

Some thoughts:
 * For the first question: I think they are different, unless we change the 
logic that calculates the max parallelism in ExecutionJobVertex.
 * For the second question: currently we use setParallelism and calculate 
min/target/max from it; in the future we can introduce dedicated APIs for them 
and drop setParallelism in many classes.
 * For the third question: three separate APIs seem more intuitive.
 * For the fourth question: it seems you chose the latter?
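
For comparison with the three-separate-APIs option, this is roughly what the POJO variant could look like when the values are calculated from the existing parallelism, following the rules in the issue description. It is only a sketch: ParallelismBounds and derive are hypothetical names that do not exist in Flink, and clamping the target into [1, maximum] is my own assumption.

```java
// Hypothetical value class, not part of Flink.
final class ParallelismBounds {
    final int minimum;
    final int target;
    final int maximum;

    ParallelismBounds(int minimum, int target, int maximum) {
        if (minimum < 1 || minimum > target || target > maximum) {
            throw new IllegalArgumentException(
                    "expected 1 <= min <= target <= max, got "
                            + minimum + "/" + target + "/" + maximum);
        }
        this.minimum = minimum;
        this.target = target;
        this.maximum = maximum;
    }

    /**
     * Derives the bounds as described in the issue:
     * - explicit parallelism p set (> 0): minimum = maximum = p (so target = p);
     * - otherwise: minimum = 1, maximum = the operator's max parallelism,
     *   target = the -p / default parallelism (clamped here, by assumption).
     */
    static ParallelismBounds derive(
            int explicitParallelism, int maxParallelism, int defaultParallelism) {
        if (explicitParallelism > 0) {
            return new ParallelismBounds(
                    explicitParallelism, explicitParallelism, explicitParallelism);
        }
        int target = Math.max(1, Math.min(defaultParallelism, maxParallelism));
        return new ParallelismBounds(1, target, maxParallelism);
    }

    public static void main(String[] args) {
        ParallelismBounds b = derive(-1, 128, 8);
        System.out.println(b.minimum + "/" + b.target + "/" + b.maximum);
    }
}
```

A single object keeps the three values consistent in one place, while three separate setters would need the same check repeated or deferred until the JobGraph is built.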

 

> Introduce minimum, target and maximum parallelism to JobGraph
> -------------------------------------------------------------
>
>                 Key: FLINK-10506
>                 URL: https://issues.apache.org/jira/browse/FLINK-10506
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Runtime / Coordination
>    Affects Versions: 1.7.0
>            Reporter: Till Rohrmann
>            Assignee: vinoyang
>            Priority: Major
>
> In order to run a job with a variable parallelism, one needs to be able to 
> define the minimum and maximum parallelism for an operator as well as the 
> current target value. In the first implementation, minimum could be 1 and 
> maximum the max parallelism of the operator if no explicit parallelism has 
> been specified for an operator. If a parallelism p has been specified (via 
> setParallelism(p)), then minimum = maximum = p. The target value could be the 
> command line parameter -p or the default parallelism.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
