Re: [DISCUSS] Unify flink configuration #1857

Chunjin Mu Sat, 22 Oct 2022 00:36:17 -0700

Hello, I read all the emails above, but I didn't understand the purpose of
this configuration. As I see it, there should be a metadata management
module: users create tables on this module's web page, then write SQL on
the web page and execute it. Is what I described essentially the same thing
as what is being discussed here?


Huajie Wang <[email protected]> wrote on Fri, Oct 21, 2022 at 16:22:

> hi all:
>
> I have redefined the rules of the configuration file as follows:
>
> flink:
>   option:
>     ...
>   property: # @see: https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/deployment/config/
>     ...
>   table: # @see: https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/dev/table/config/
>     ...
>
> app: # user's parameter
>   ...
> sql:
>   ...
>
>
>
> Looking forward to your opinion.
>
>
>
> Best,
> Huajie Wang
>
>
>
> Huajie Wang <[email protected]> wrote on Thu, Oct 20, 2022 at 23:18:
>
> > > 2. For the table config, could we use `env.table-property` as the
> > > prefix? If the prefix of flink table config isn't table, what can we do?
> > > StreamPark should not be affected by flink parameter naming.
> >
> > AFAIK, all Flink table property keys start with "table". sql-client is a
> > special case: it is just a program that ships with Flink to execute SQL.
> > In other words, I don't need sql-client to execute SQL, so I don't need
> > those parameters. There is an essential difference between the parameters
> > defined by sql-client and the table properties.
> >
> > Best,
> > Huajie Wang
> >
> >
> >
> > Rui Fan <[email protected]> wrote on Thu, Oct 20, 2022 at 22:10:
> >
> >> Hi huajie,
> >>
> >> Thanks for your great proposal.
> >>
> >> I have 2 questions:
> >> 1. Why do you write the sql content in the config file?
> >> 2. For the table config, could we use `env.table-property` as the
> >> prefix? If the prefix of a flink table config isn't table, what can we do?
> >> StreamPark should not be affected by flink parameter naming.
> >>
> >> The prefix of some table configs is sql-client. For example:
> >> sql-client.display.max-column-width [1]
> >>
> >> My suggested format:
> >>
> >> ```
> >> env:
> >>    option: #cli option args
> >>      target: yarn-application # yarn-application, yarn-per-job
> >>      shutdownOnAttachedExit:
> >>      jobmanager:
> >>      ...
> >>    property:
> >>      ${StreamExecutionEnvironment.key} : $value
> >>      ...
> >>    table-property:
> >>      table.exec.mini-batch.enabled : true
> >> ```
> >>
> >>
> >> [1] https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/dev/table/config/#sql-client-display-max-column-width
> >>
> >> Best
> >> Rui Fan
> >>
> >> On Thu, Oct 20, 2022 at 4:48 PM Huajie Wang <[email protected]> wrote:
> >>
> >> > Hello everyone, this discussion is about developing a unified
> >> > specification for Flink job configuration profiles in StreamPark. You
> >> > are welcome to join the discussion.
> >> >
> >> >
> >> >
> >> >
> >> > *background:*
> >> > StreamPark is positioned as a rapid development framework for engines
> >> > such as Flink and Spark. An important part of it is standardized
> >> > configuration: all the configuration that would otherwise be hardcoded
> >> > is moved into a configuration file. When the project starts, you only
> >> > need to pass in the agreed configuration file, which completes the
> >> > initialization of the environment and the setting of parameters.
> >> >
> >> > The parameter specification in the current version is somewhat
> >> > unreasonable. Specifically, it redefines the format of the parameters,
> >> > which differs slightly from the official Flink configuration. Related
> >> > work has already been done for this part in a PR [1]: the env parameter
> >> > settings of Flink are placed under `property`, and each key is the key
> >> > of the standard parameter in Flink. However, this only regulates the
> >> > parameter configuration under `property` and does not regulate the
> >> > global parameter settings.
> >> >
> >> >
> >> > The current configuration rules are as follows:
> >> >
> >> > flink:
> >> >   deployment:
> >> >     option:
> >> >         ...
> >> >     property:
> >> >         ...
> >> >
> >> >
> >> > For example, suppose a Flink job is deployed in yarn-per-job mode, the
> >> > job name is test-job, the parallelism is 2, and the entry (main) class
> >> > is org.apache.streampark.FlinkJob. The configuration is then as follows:
> >> >
> >> > flink:
> >> >   deployment:
> >> >     option:
> >> >         target: yarn-per-job
> >> >     property:
> >> >         $internal.application.main: org.apache.streampark.FlinkJob
> >> >         pipeline.name: test-job
> >> >         taskmanager.numberOfTaskSlots: 1
> >> >         parallelism.default: 2
> >> >
> >> >
> >> > As we can see, the root prefix is `flink`. `option` defines the
> >> > parameters related to deploying the task, and `property` defines the
> >> > Flink parameter configuration. The configurable parameters are fully
> >> > consistent with the standard parameters in Flink [2]. This design
> >> > specification has the following deficiencies:
> >> >
> >> > 1. The format of table-related parameter settings is not defined
> >> > 2. The user's business parameters are not defined
> >> > 3. The content of flinksql is not defined.
> >> >
> >> > Therefore, the purpose of this discussion is to solve these problems
> >> > and further standardize the parameters. This part of the specification
> >> > is important because it directly affects users who develop with the
> >> > StreamPark API, so we need to discuss it in depth.
> >> >
> >> >
> >> >
> >> > *Proposal:*
> >> > The improved format I initially proposed [3] is shown in the example
> >> > below. The parameters are divided into three parts: env, app, and sql.
> >> > "env" defines deployment parameters, environment-setting parameters,
> >> > and table parameters; "app" defines user-defined parameters; "sql"
> >> > defines the content of the Flink SQL.
> >> >
> >> > env:
> >> >   option: # cli option args
> >> >     target: yarn-application # yarn-application, yarn-per-job
> >> >     shutdownOnAttachedExit:
> >> >     jobmanager:
> >> >     ...
> >> >   property:
> >> >     ${StreamExecutionEnvironment.key} : $value
> >> >     ...
> >> >     table:
> >> >       ${TableEnvironment.key} : $value
> >> >       ...
> >> > sql: # flinksql
> >> >    my_flinksql: |
> >> >     CREATE TABLE datagen (
> >> >       f_sequence INT,
> >> >       ts AS localtimestamp,
> >> >       WATERMARK FOR ts AS ts
> >> >     ) WITH (
> >> >       ....
> >> >     );
> >> >     ...
> >> >
> >> > app:
> >> >     kafka.bootstrap:
> >> >     kafka.topic: test
> >> >     ...
> >> >
> >> >
> >> > Looking forward to your opinion.
> >> >
> >> >
> >> >
> >> > [1] : https://github.com/apache/incubator-streampark/issues/1762
> >> > [2] : https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/deployment/config/
> >> > [3] : https://github.com/apache/incubator-streampark/issues/1857
> >> >
> >> > Best,
> >> > Huajie Wang
> >> >
> >> >
> >>
> >
>
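
To make the redefined layout quoted above (flink.option / flink.property / flink.table, plus app and sql) more concrete, here is a minimal sketch of how such a file could be split into its sections once the YAML has been parsed into a nested dict. This is only an illustration, not StreamPark's actual loader: the function names `flatten` and `split_sections` are hypothetical, and the sample values are taken from the examples in this thread.

```python
from typing import Any, Dict


def flatten(node: Dict[str, Any], prefix: str = "") -> Dict[str, str]:
    """Flatten nested mappings into dotted keys; empty (None) values are skipped."""
    flat: Dict[str, str] = {}
    for key, value in node.items():
        dotted = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            flat.update(flatten(value, dotted))
        elif isinstance(value, bool):
            # Render booleans the way they appear in the config file.
            flat[dotted] = str(value).lower()
        elif value is not None:
            flat[dotted] = str(value)
    return flat


def split_sections(conf: Dict[str, Any]) -> Dict[str, Dict[str, str]]:
    """Split a parsed config into the five sections discussed in the thread."""
    flink = conf.get("flink") or {}
    return {
        "option": flatten(flink.get("option") or {}),      # deployment (CLI) options
        "property": flatten(flink.get("property") or {}),  # standard Flink keys
        "table": flatten(flink.get("table") or {}),        # table config keys
        "app": flatten(conf.get("app") or {}),             # user's own parameters
        "sql": dict(conf.get("sql") or {}),                # named SQL scripts, kept verbatim
    }


# Hypothetical parsed config, mirroring the examples in this thread.
example = {
    "flink": {
        "option": {"target": "yarn-application", "shutdownOnAttachedExit": None},
        "property": {"pipeline.name": "test-job", "parallelism.default": 2},
        "table": {"table.exec.mini-batch.enabled": True},
    },
    "app": {"kafka.topic": "test"},
    "sql": {"my_flinksql": "CREATE TABLE datagen (f_sequence INT);"},
}

sections = split_sections(example)
for name, values in sections.items():
    print(name, values)
```

Keeping `property` and `table` as separate sections means the keys inside them can stay identical to the official Flink keys, which is the point raised in the thread about not redefining Flink's parameter names.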
