Re: [DISCUSS] Unify flink configuration #1857

Huajie Wang Thu, 20 Oct 2022 08:10:52 -0700

> 2. For the table config, could we use `env.table-property` as the prefix?
If the prefix of flink table config isn't table, what can we do?
StreamPark should not be affected by flink parameter naming.


table-property is not concise enough, too long, Both table'property and
property define parameters to be set. The difference is that one is to set
env and the other is to set table, So I put table'property under "property"


Best,
Huajie Wang



Rui Fan <[email protected]> 于2022年10月20日周四 22:10写道：

> Hi huajie,
>
> Thanks for your great proposal.
>
> I have 2 questions:
> 1. Why do you write the sql content in the config file?
> 2. For the table config, could we use `env.table-property` as the prefix?
> If the prefix of flink table config isn't table, what can we do?
> StreamPark should not be affected by flink parameter naming.
>
> The prefix of some table configs are sql-client. For example:
> sql-client.display.max-column-width [1]
>
> My suggested format:
>
> ```
> env:
>    option: #cli option args
>      target: yarn-application # yarn-application, yarn-perjob
>      shutdownOnAttachedExit:
>      jobmanager:
>      ...
>    property:
>      ${StreamExecutionEnvironment.key} : $value
>      ...
>    table-property:
>      table.exec.mini-batch.enabled : true
> ```
>
>
> [1]
>
> https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/dev/table/config/#sql-client-display-max-column-width
>
> Best
> Rui Fan
>
> On Thu, Oct 20, 2022 at 4:48 PM Huajie Wang <[email protected]> wrote:
>
> > Hello everyone, this discussion is about the development of a unified
> > specification for flink'job profiles in streampark, Welcome to join the
> > discussion
> >
> >
> >
> >
> > *background:*
> > Streampark is positioned as a rapid development framework such as flink &
> > spark. An important part of it is standardized configuration: put all the
> > configurations hardcoded in the code into the configuration file. When
> the
> > project starts, you only need to pass in the agreed configuration. The
> file
> > can complete the initialization of the environment and the setting of
> > parameters,
> > Because the parameter specification customization in the current version
> is
> > somewhat unreasonable, the specific performance is as follows: the format
> > of the parameter is redefined, which is slightly different from the
> > official configuration of flink. For this part, pr has already done
> related
> > work [1], the specific method It is to put the parameter settings of env
> in
> > flink under property.
> > The key is the key of the standard parameter in flink,  but this part
> only
> > regulates the parameter configuration under the property, and does not
> > regulate the global parameter setting.
> >
> >
> > The current configuration rules is as follows:
> >
> > flink:
> >   deployment:
> >     option:
> >         ...
> >     property:
> >         ...
> >
> >
> > For example: Now the flink'job is deployed in yarn-perjob mode, the job
> > name is: test-job, the parallelism is 2, and the entity class is:
> > org.apache.streampark.FlinkJob, so the configuration is as follows:
> >
> > flink:
> >   deployment:
> >     option:
> >         target: yarn-per-job
> >     property:
> >         $internal.application.main: org.apache.streampark.FlinkJob
> >         pipeline.name: test-job
> >         taskmanager.numberOfTaskSlots: 1
> >         parallelism.default: 2
> >
> >
> > we can see, root prefix is `flink`, The `option` defined the parameters
> > related to the deployment task,
> > and the `property` defined the parameter configuration in flink. The
> > configurable parameters is completely consistent with the standard
> > parameters in flink [2], There are deficiencies in this design
> > specification, which are manifested as follows:
> >
> > 1. The format of table-related parameter settings is not defined
> > 2. The user's business parameters are not defined
> > 3. The content of flinksql is not defined.
> >
> > Therefore, the purpose of this discussion is to solve this problem and
> > further standardize the parameters. Since the design of this part of the
> > specification is more important, it will directly affect the users
> > developed with the streampark api, so it is necessary for us to conduct
> > in-depth communication and discussion.
> >
> >
> >
> > *Proposal:*
> > The improved format I initially proposed[3] is for example, the
> parameters
> > are generally divided into three parts, env, app, sql, "env" defined
> > deployment parameters and environment setting related parameters, and
> table
> > parameters, "app" defined user-defined parameters, "sql" defined the
> > content of flinksql.
> >
> > env:
> >   option: #cli opiton args
> >     target: yarn-application # yarn-application, yarn-perjob
> >     shutdownOnAttachedExit:
> >     jobmanager:
> >     ...
> >   property:
> >     ${StreamExecutionEnvironment.key} : $value
> >     ...
> >     table:
> >       ${TableEnvironment.key} : $value
> >       ...
> > sql: # flinksql
> >    my_flinksql: |
> >     CREATE TABLE datagen (
> >       f_sequence INT,
> >       ts AS localtimestamp,
> >       WATERMARK FOR ts AS ts
> >     ) WITH (
> >       ....
> >     );
> >     ...
> >
> > app:
> >     kafka.bootstrap:
> >     kafka.topic: test
> >     ...
> >
> >
> > Looking forward to your opinion.
> >
> >
> >
> > [1] : https://github.com/apache/incubator-streampark/issues/1762
> > [2] :
> >
> >
> https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/deployment/config/
> > [3] : https://github.com/apache/incubator-streampark/issues/1857
> >
> > Best,
> > Huajie Wang
> >
> >
> >
> > Best,
> > Huajie Wang
> >
>

Re: [DISCUSS] Unify flink configuration #1857

Reply via email to