Re: [DISCUSS] Unify flink configuration #1857

Huajie Wang Sat, 22 Oct 2022 05:59:43 -0700

> I saw this, I understand what you mean, is it more convenient for users to
write sql directly on the web page, and is there a greater use for writing
it in the configuration file?



hi Chunjin Mu:


streampark provides a flink development framework and a job management
platform, The two parts do not affect each other,

user can choose the development framework to develop a flink job, or
use the streampark-platform to management flink jobs,

If the user uses the streampark-platform to develop a flinksql job, No
config file required,


This discussion has been talking about the user using the table API to
develop a flink sql job, the traditional way is to put flink sql in
the code. The flinksql job develop in this way is very poorly readable
and maintainable, Our solution is: put flinksql in the configuration
file, when the job starts, only need to pass in this configuration
file, it will automatically discover and extract the corresponding
flinksql content. more detail see docs[1]. This part of the function
has already been implemented in streampark, This time is to
re-standardize the format of the configuration file.


Looking forward to your opinion.


[1] https://streampark.apache.org/docs/development/config/#sql-file



Best,
Huajie Wang



Chunjin Mu <[email protected]> 于2022年10月22日周六 16:00写道：

>  > 1. Why do you write the sql content in the config file?
>
> Because streampark takes into account that users will use the Table API to
> develop a flink job, the traditional way is to put flink sql in the code.
> This is very inelegant. We propose to extract flink sql into the
> configuration file, which has already been developed in streampark.
> support, the code is as follows:
>
> import com.streamxhub.streamx.flink.core.scala.FlinkStreamTable
>
> object FlinkSqlJob extends FlinkStreamTable {
>
>   override def handle(): Unit = {
>        context.sql("my_flinksql_1")
>
>   }
> }
>
>
> I saw this, I understand what you mean, is it more convenient for users to
> write sql directly on the web page, and is there a greater use for writing
> it in the configuration file?
>
> Chunjin Mu <[email protected]> 于2022年10月22日周六 15:35写道：
>
> > hello, I read all the above emails, but I didn't understand the purpose
> of
> > this configuration. According to my idea, there should be a metadata
> > management module. Users can create tables under the web page of this
> > module, and then write sql in the web page and execute it. , is what I
> > said and the discussion mainly the same thing?
> >
> > Huajie Wang <[email protected]> 于2022年10月21日周五 16:22写道：
> >
> >> hi all:
> >>
> >> Redefined the rules of the configuration file as follows:
> >>
> >> flink:
> >> option:
> >> ...
> >> property: #@see:
> >>
> >>
> https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/deployment/config/
> >> ...
> >> table: # @see
> >>
> >>
> https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/dev/table/config/
> >> ...
> >>
> >> app: # user's parameter
> >> ...
> >> sql:
> >> ...
> >>
> >>
> >>
> >> Looking forward to your opinion.
> >>
> >>
> >>
> >> Best,
> >> Huajie Wang
> >>
> >>
> >>
> >> Huajie Wang <[email protected]> 于2022年10月20日周四 23:18写道：
> >>
> >> > > 2. For the table config, could we use `env.table-property` as the
> >> prefix?
> >> > If the prefix of flink table config isn't table, what can we do?
> >> > StreamPark should not be affected by flink parameter naming.
> >> >
> >> > AFAIK, flink table'property key all startwith "table", sql-client is a
> >> > special case, sql-client is just a program that flink comes with to
> >> execute
> >> > sql, in other words, I don't need sql-client to execute sql, so I
> don't
> >> > need those parameters, There is an essential difference between the
> >> > parameters defined by sql-client and the table'properties
> >> >
> >> > Best,
> >> > Huajie Wang
> >> >
> >> >
> >> >
> >> > Rui Fan <[email protected]> 于2022年10月20日周四 22:10写道：
> >> >
> >> >> Hi huajie,
> >> >>
> >> >> Thanks for your great proposal.
> >> >>
> >> >> I have 2 questions:
> >> >> 1. Why do you write the sql content in the config file?
> >> >> 2. For the table config, could we use `env.table-property` as the
> >> prefix?
> >> >> If the prefix of flink table config isn't table, what can we do?
> >> >> StreamPark should not be affected by flink parameter naming.
> >> >>
> >> >> The prefix of some table configs are sql-client. For example:
> >> >> sql-client.display.max-column-width [1]
> >> >>
> >> >> My suggested format:
> >> >>
> >> >> ```
> >> >> env:
> >> >>    option: #cli option args
> >> >>      target: yarn-application # yarn-application, yarn-perjob
> >> >>      shutdownOnAttachedExit:
> >> >>      jobmanager:
> >> >>      ...
> >> >>    property:
> >> >>      ${StreamExecutionEnvironment.key} : $value
> >> >>      ...
> >> >>    table-property:
> >> >>      table.exec.mini-batch.enabled : true
> >> >> ```
> >> >>
> >> >>
> >> >> [1]
> >> >>
> >> >>
> >>
> https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/dev/table/config/#sql-client-display-max-column-width
> >> >>
> >> >> Best
> >> >> Rui Fan
> >> >>
> >> >> On Thu, Oct 20, 2022 at 4:48 PM Huajie Wang <[email protected]>
> wrote:
> >> >>
> >> >> > Hello everyone, this discussion is about the development of a
> unified
> >> >> > specification for flink'job profiles in streampark, Welcome to join
> >> the
> >> >> > discussion
> >> >> >
> >> >> >
> >> >> >
> >> >> >
> >> >> > *background:*
> >> >> > Streampark is positioned as a rapid development framework such as
> >> flink
> >> >> &
> >> >> > spark. An important part of it is standardized configuration: put
> all
> >> >> the
> >> >> > configurations hardcoded in the code into the configuration file.
> >> When
> >> >> the
> >> >> > project starts, you only need to pass in the agreed configuration.
> >> The
> >> >> file
> >> >> > can complete the initialization of the environment and the setting
> of
> >> >> > parameters,
> >> >> > Because the parameter specification customization in the current
> >> >> version is
> >> >> > somewhat unreasonable, the specific performance is as follows: the
> >> >> format
> >> >> > of the parameter is redefined, which is slightly different from the
> >> >> > official configuration of flink. For this part, pr has already done
> >> >> related
> >> >> > work [1], the specific method It is to put the parameter settings
> of
> >> >> env in
> >> >> > flink under property.
> >> >> > The key is the key of the standard parameter in flink,  but this
> part
> >> >> only
> >> >> > regulates the parameter configuration under the property, and does
> >> not
> >> >> > regulate the global parameter setting.
> >> >> >
> >> >> >
> >> >> > The current configuration rules is as follows:
> >> >> >
> >> >> > flink:
> >> >> >   deployment:
> >> >> >     option:
> >> >> >         ...
> >> >> >     property:
> >> >> >         ...
> >> >> >
> >> >> >
> >> >> > For example: Now the flink'job is deployed in yarn-perjob mode, the
> >> job
> >> >> > name is: test-job, the parallelism is 2, and the entity class is:
> >> >> > org.apache.streampark.FlinkJob, so the configuration is as follows:
> >> >> >
> >> >> > flink:
> >> >> >   deployment:
> >> >> >     option:
> >> >> >         target: yarn-per-job
> >> >> >     property:
> >> >> >         $internal.application.main: org.apache.streampark.FlinkJob
> >> >> >         pipeline.name: test-job
> >> >> >         taskmanager.numberOfTaskSlots: 1
> >> >> >         parallelism.default: 2
> >> >> >
> >> >> >
> >> >> > we can see, root prefix is `flink`, The `option` defined the
> >> parameters
> >> >> > related to the deployment task,
> >> >> > and the `property` defined the parameter configuration in flink.
> The
> >> >> > configurable parameters is completely consistent with the standard
> >> >> > parameters in flink [2], There are deficiencies in this design
> >> >> > specification, which are manifested as follows:
> >> >> >
> >> >> > 1. The format of table-related parameter settings is not defined
> >> >> > 2. The user's business parameters are not defined
> >> >> > 3. The content of flinksql is not defined.
> >> >> >
> >> >> > Therefore, the purpose of this discussion is to solve this problem
> >> and
> >> >> > further standardize the parameters. Since the design of this part
> of
> >> the
> >> >> > specification is more important, it will directly affect the users
> >> >> > developed with the streampark api, so it is necessary for us to
> >> conduct
> >> >> > in-depth communication and discussion.
> >> >> >
> >> >> >
> >> >> >
> >> >> > *Proposal:*
> >> >> > The improved format I initially proposed[3] is for example, the
> >> >> parameters
> >> >> > are generally divided into three parts, env, app, sql, "env"
> defined
> >> >> > deployment parameters and environment setting related parameters,
> and
> >> >> table
> >> >> > parameters, "app" defined user-defined parameters, "sql" defined
> the
> >> >> > content of flinksql.
> >> >> >
> >> >> > env:
> >> >> >   option: #cli opiton args
> >> >> >     target: yarn-application # yarn-application, yarn-perjob
> >> >> >     shutdownOnAttachedExit:
> >> >> >     jobmanager:
> >> >> >     ...
> >> >> >   property:
> >> >> >     ${StreamExecutionEnvironment.key} : $value
> >> >> >     ...
> >> >> >     table:
> >> >> >       ${TableEnvironment.key} : $value
> >> >> >       ...
> >> >> > sql: # flinksql
> >> >> >    my_flinksql: |
> >> >> >     CREATE TABLE datagen (
> >> >> >       f_sequence INT,
> >> >> >       ts AS localtimestamp,
> >> >> >       WATERMARK FOR ts AS ts
> >> >> >     ) WITH (
> >> >> >       ....
> >> >> >     );
> >> >> >     ...
> >> >> >
> >> >> > app:
> >> >> >     kafka.bootstrap:
> >> >> >     kafka.topic: test
> >> >> >     ...
> >> >> >
> >> >> >
> >> >> > Looking forward to your opinion.
> >> >> >
> >> >> >
> >> >> >
> >> >> > [1] : https://github.com/apache/incubator-streampark/issues/1762
> >> >> > [2] :
> >> >> >
> >> >> >
> >> >>
> >>
> https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/deployment/config/
> >> >> > [3] : https://github.com/apache/incubator-streampark/issues/1857
> >> >> >
> >> >> > Best,
> >> >> > Huajie Wang
> >> >> >
> >> >> >
> >> >> >
> >> >> > Best,
> >> >> > Huajie Wang
> >> >> >
> >> >>
> >> >
> >>
> >
>

Re: [DISCUSS] Unify flink configuration #1857

Reply via email to