Hello, I have read all the emails above, but I don't understand the purpose of this configuration. My idea is that there should be a metadata management module: users create tables on that module's web page, then write SQL on the same page and execute it. Is what I described essentially the same thing as what is being discussed here?

Huajie Wang wrote on Fri, Oct 21, 2022 at 16:22:

> Hi all,
>
> I have redefined the rules of the configuration file as follows:
>
> flink:
>   option:
>     ...
>   property: # @see: https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/deployment/config/
>     ...
>   table: # @see: https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/dev/table/config/
>     ...
>
> app: # user's parameters
>   ...
> sql:
>   ...
>
> Looking forward to your opinion.
>
> Best,
> Huajie Wang
>
> Huajie Wang wrote on Thu, Oct 20, 2022 at 23:18:
>
> > > 2. For the table config, could we use `env.table-property` as the prefix?
> > > If the prefix of flink table config isn't table, what can we do?
> > > StreamPark should not be affected by flink parameter naming.
> >
> > AFAIK, all Flink table property keys start with "table". sql-client is
> > a special case: it is just a program that ships with Flink to execute
> > SQL. In other words, I don't need sql-client to execute SQL, so I don't
> > need those parameters. There is an essential difference between the
> > parameters defined by sql-client and the table properties.
> >
> > Best,
> > Huajie Wang
> >
> > Rui Fan wrote on Thu, Oct 20, 2022 at 22:10:
> >
> >> Hi Huajie,
> >>
> >> Thanks for your great proposal.
> >>
> >> I have 2 questions:
> >> 1. Why do you write the sql content in the config file?
> >> 2. For the table config, could we use `env.table-property` as the prefix?
> >> If the prefix of flink table config isn't table, what can we do?
> >> StreamPark should not be affected by flink parameter naming.
> >>
> >> The prefix of some table configs is sql-client. For example:
> >> sql-client.display.max-column-width [1]
> >>
> >> My suggested format:
> >>
> >> ```
> >> env:
> >>   option: #cli option args
> >>     target: yarn-application # yarn-application, yarn-perjob
> >>     shutdownOnAttachedExit:
> >>     jobmanager:
> >>     ...
> >>   property:
> >>     ${StreamExecutionEnvironment.key} : $value
> >>     ...
> >>   table-property:
> >>     table.exec.mini-batch.enabled : true
> >> ```
> >>
> >> [1] https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/dev/table/config/#sql-client-display-max-column-width
> >>
> >> Best
> >> Rui Fan
> >>
> >> On Thu, Oct 20, 2022 at 4:48 PM Huajie Wang wrote:
> >>
> >> > Hello everyone, this discussion is about developing a unified
> >> > specification for Flink job profiles in StreamPark. Welcome to join
> >> > the discussion.
> >> >
> >> > *background:*
> >> > StreamPark is positioned as a rapid development framework for engines
> >> > such as Flink and Spark. An important part of it is standardized
> >> > configuration: all the configuration that would otherwise be hardcoded
> >> > in the code goes into a configuration file. When the project starts,
> >> > you only need to pass in the agreed configuration file, and it
> >> > completes the initialization of the environment and the setting of
> >> > parameters.
> >> > The parameter specification in the current version is somewhat
> >> > unreasonable. Specifically, the parameter format was redefined and
> >> > differs slightly from the official Flink configuration. A PR has
> >> > already done related work on this part [1]: it puts the env parameter
> >> > settings of Flink under property, where each key is the key of the
> >> > standard parameter in Flink. However, this only regulates the
> >> > parameter configuration under property and does not regulate the
> >> > global parameter settings.
> >> >
> >> > The current configuration rules are as follows:
> >> >
> >> > flink:
> >> >   deployment:
> >> >     option:
> >> >       ...
> >> >     property:
> >> >       ...
> >> >
> >> > For example: the Flink job is deployed in yarn-per-job mode, the job
> >> > name is test-job, the parallelism is 2, and the main class is
> >> > org.apache.streampark.FlinkJob, so the configuration is as follows:
> >> >
> >> > flink:
> >> >   deployment:
> >> >     option:
> >> >       target: yarn-per-job
> >> >     property:
> >> >       $internal.application.main: org.apache.streampark.FlinkJob
> >> >       pipeline.name: test-job
> >> >       taskmanager.numberOfTaskSlots: 1
> >> >       parallelism.default: 2
> >> >
> >> > As we can see, the root prefix is `flink`. `option` defines the
> >> > parameters related to deploying the task, and `property` defines the
> >> > parameter configuration in Flink. The configurable parameters are
> >> > completely consistent with the standard parameters in Flink [2].
> >> > This design specification has the following deficiencies:
> >> >
> >> > 1. The format of table-related parameter settings is not defined.
> >> > 2. The user's business parameters are not defined.
> >> > 3. The content of Flink SQL is not defined.
> >> >
> >> > Therefore, the purpose of this discussion is to solve these problems
> >> > and further standardize the parameters. Because this part of the
> >> > specification directly affects users who develop with the StreamPark
> >> > API, it is necessary for us to have in-depth communication and
> >> > discussion.
> >> >
> >> > *Proposal:*
> >> > The improved format I initially proposed [3] divides the parameters
> >> > into three parts: env, app, and sql. "env" defines deployment
> >> > parameters, environment-setting parameters, and table parameters;
> >> > "app" defines user-defined parameters; "sql" defines the content of
> >> > Flink SQL. For example:
> >> >
> >> > env:
> >> >   option: #cli option args
> >> >     target: yarn-application # yarn-application, yarn-perjob
> >> >     shutdownOnAttachedExit:
> >> >     jobmanager:
> >> >     ...
> >> >   property:
> >> >     ${StreamExecutionEnvironment.key} : $value
> >> >     ...
> >> >   table:
> >> >     ${TableEnvironment.key} : $value
> >> >     ...
> >> > sql: # flinksql
> >> >   my_flinksql: |
> >> >     CREATE TABLE datagen (
> >> >       f_sequence INT,
> >> >       ts AS localtimestamp,
> >> >       WATERMARK FOR ts AS ts
> >> >     ) WITH (
> >> >       ....
> >> >     );
> >> >   ...
> >> >
> >> > app:
> >> >   kafka.bootstrap:
> >> >   kafka.topic: test
> >> >   ...
> >> >
> >> > Looking forward to your opinion.
> >> >
> >> > [1] : https://github.com/apache/incubator-streampark/issues/1762
> >> > [2] : https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/deployment/config/
> >> > [3] : https://github.com/apache/incubator-streampark/issues/1857
> >> >
> >> > Best,
> >> > Huajie Wang
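
For reference, here is a minimal sketch of how a loader might consume the redefined layout from the most recent message in this thread (`flink.option`, `flink.property`, `flink.table`, plus top-level `app` and `sql`). The names `JobConfig` and `split_job_config` are hypothetical and are not StreamPark APIs. The config is passed in as an already-parsed dict so that no particular YAML library is assumed, and the sample values are taken from the examples quoted above.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class JobConfig:
    """Hypothetical container: one field per section of the proposed layout."""
    option: Dict[str, Any] = field(default_factory=dict)      # flink.option: deployment options
    properties: Dict[str, Any] = field(default_factory=dict)  # flink.property: standard Flink keys
    table: Dict[str, Any] = field(default_factory=dict)       # flink.table: table config keys
    app: Dict[str, Any] = field(default_factory=dict)         # user's own parameters
    sql: Dict[str, str] = field(default_factory=dict)         # named Flink SQL scripts


def split_job_config(raw: Dict[str, Any]) -> JobConfig:
    """Route each section of the parsed config to its own bucket.

    A missing or empty section (parsed as None) becomes an empty dict.
    """
    flink = raw.get("flink") or {}
    return JobConfig(
        option=dict(flink.get("option") or {}),
        properties=dict(flink.get("property") or {}),
        table=dict(flink.get("table") or {}),
        app=dict(raw.get("app") or {}),
        sql=dict(raw.get("sql") or {}),
    )


# Sample input mirroring the examples in this thread (already parsed from YAML).
raw = {
    "flink": {
        "option": {"target": "yarn-per-job"},
        "property": {"pipeline.name": "test-job", "parallelism.default": 2},
        "table": {"table.exec.mini-batch.enabled": True},
    },
    "app": {"kafka.topic": "test"},
    "sql": None,  # an empty "sql:" section parses as None
}
cfg = split_job_config(raw)
```

Keeping the keys under `property` and `table` identical to the official Flink keys means a loader like this only has to route sections and never has to translate key names, which seems to be the point of the redefinition.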
