[
https://issues.apache.org/jira/browse/FLINK-6966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16058882#comment-16058882
]
sunjincheng commented on FLINK-6966:
------------------------------------
I think each Operator generate It's UID by different factors. Let's look at a
query as follows:
{code}
val t = StreamTestData.get5TupleDataStream(env)
.toTable(tEnv, 'a, 'b, 'c, 'd, 'e, 'proctime.proctime)
tEnv.registerTable("MyTable", t)
val sqlQuery = "SELECT a, " +
" SUM(c) OVER (" +
" PARTITION BY a ORDER BY proctime ROWS BETWEEN 4 PRECEDING AND
CURRENT ROW), " +
" MIN(c) OVER (" +
" PARTITION BY a ORDER BY proctime ROWS BETWEEN 4 PRECEDING AND
CURRENT ROW) " +
"FROM MyTable"
val result = tEnv.sql(sqlQuery).toAppendStream[Row]
result.addSink(new StreamITCase.StringSink)
env.execute()
{code}
In the SQL above, involving the following main Operators:
* StreamSource
* StreamMap
* ProcessOperator
* KeyGroupStreamPartitioner
* KeyedProcessOperator
* StreamSink
And only operator which using state will impact scaling or restarting jobs from
save points. Such as {{ProcessOperator}} has apply a {{ProcessFunction}} which
using state.
But why we must set {{maxParallelism}} ?
> Add maxParallelism and UIDs to all operators generated by the Table API / SQL
> -----------------------------------------------------------------------------
>
> Key: FLINK-6966
> URL: https://issues.apache.org/jira/browse/FLINK-6966
> Project: Flink
> Issue Type: Improvement
> Components: Table API & SQL
> Affects Versions: 1.4.0
> Reporter: Fabian Hueske
> Assignee: sunjincheng
>
> At the moment, the Table API does not assign UIDs and the max parallelism to
> operators (except for operators with parallelism 1).
> We should do that to avoid problems when rescaling or restarting jobs from
> savepoints.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)