[jira] [Commented] (FLINK-6966) Add maxParallelism and UIDs to all operators generated by the Table API / SQL

sunjincheng (JIRA) Thu, 22 Jun 2017 00:14:45 -0700

    [ 
https://issues.apache.org/jira/browse/FLINK-6966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16058882#comment-16058882
 ]


sunjincheng commented on FLINK-6966:
------------------------------------

I think each Operator generate It's UID by different factors. Let's look at a 
query as follows:

{code}
 val t = StreamTestData.get5TupleDataStream(env)
      .toTable(tEnv, 'a, 'b, 'c, 'd, 'e, 'proctime.proctime)
    tEnv.registerTable("MyTable", t)

    val sqlQuery = "SELECT a, " +
      "  SUM(c) OVER (" +
      "    PARTITION BY a ORDER BY proctime ROWS BETWEEN 4 PRECEDING AND 
CURRENT ROW), " +
      "  MIN(c) OVER (" +
      "    PARTITION BY a ORDER BY proctime ROWS BETWEEN 4 PRECEDING AND 
CURRENT ROW) " +
      "FROM MyTable"

    val result = tEnv.sql(sqlQuery).toAppendStream[Row]
    result.addSink(new StreamITCase.StringSink)
    env.execute()
{code}

In the SQL above, involving the following main Operators:
* StreamSource
* StreamMap
* ProcessOperator
* KeyGroupStreamPartitioner
* KeyedProcessOperator
* StreamSink

And only operator which using state will impact scaling or restarting jobs from 
save points. Such as {{ProcessOperator}} has apply a {{ProcessFunction}} which 
using state.

But why we must set {{maxParallelism}} ?

> Add maxParallelism and UIDs to all operators generated by the Table API / SQL
> -----------------------------------------------------------------------------
>
>                 Key: FLINK-6966
>                 URL: https://issues.apache.org/jira/browse/FLINK-6966
>             Project: Flink
>          Issue Type: Improvement
>          Components: Table API & SQL
>    Affects Versions: 1.4.0
>            Reporter: Fabian Hueske
>            Assignee: sunjincheng
>
> At the moment, the Table API does not assign UIDs and the max parallelism to 
> operators (except for operators with parallelism 1).
> We should do that to avoid problems when rescaling or restarting jobs from 
> savepoints.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (FLINK-6966) Add maxParallelism and UIDs to all operators generated by the Table API / SQL

Reply via email to