Hi Jing,

Thanks for your input!
> Would you like to add one section to describe (better with a script/code
> example) how to use it in these two scenarios from the users' perspective?

OK. I'll update the FLIP with the code snippets after I get the POC branch done.

> NIT: the pictures have a transparent background when readers click on them.
> It would be great if you could replace them with pictures with a white
> background.

Fixed. Thanks for pointing that out :)

Best,
Paul Lam

> On Jun 27, 2023, at 06:51, Jing Ge <[email protected]> wrote:
>
> Hi Paul,
>
> Thanks for driving it, and thank you all for the informative discussion! The
> FLIP is in good shape now. As described in the FLIP, SQL Driver will be
> mainly used to run Flink SQL in two scenarios: 1. SQL client/gateway in
> application mode, and 2. external system integration. Would you like to add
> one section to describe (better with a script/code example) how to use it in
> these two scenarios from the users' perspective?
>
> NIT: the pictures have a transparent background when readers click on them.
> It would be great if you could replace them with pictures with a white
> background.
>
> Best regards,
> Jing
>
> On Mon, Jun 26, 2023 at 1:31 PM Paul Lam <[email protected]> wrote:
>
>> Hi Shengkai,
>>
>>> * How can we ship the json plan to the JobManager?
>>
>> The Flink K8s module should be responsible for file distribution. We could
>> introduce an option like `kubernetes.storage.dir`. For each Flink cluster,
>> there would be a dedicated subdirectory, with a pattern like
>> `${kubernetes.storage.dir}/${cluster-id}`.
>>
>> All resource-related options (e.g. pipeline jars, json plans) that are
>> configured with the scheme `file://` would be uploaded to the resource
>> directory and downloaded to the JobManager, before SQL Driver accesses the
>> files with the original filenames.
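To make the per-cluster layout described above concrete, here is a minimal sketch of how a `file://` resource could be mapped to its location under `${kubernetes.storage.dir}/${cluster-id}`. The class and method names are hypothetical illustrations, not part of the FLIP:

```java
import java.net.URI;
import java.nio.file.Paths;

public class ResourcePathMapper {
    // Hypothetical helper: rewrite a local file:// URI to its location under
    // the per-cluster subdirectory of kubernetes.storage.dir, keeping the
    // original filename so the SQL Driver can find it after download.
    static String toRemotePath(String storageDir, String clusterId, String localUri) {
        URI uri = URI.create(localUri);
        if (!"file".equals(uri.getScheme())) {
            // Already remote (e.g. s3://, hdfs://): nothing to upload.
            return localUri;
        }
        String fileName = Paths.get(uri.getPath()).getFileName().toString();
        return storageDir + "/" + clusterId + "/" + fileName;
    }

    public static void main(String[] args) {
        System.out.println(
            toRemotePath("s3://bucket/flink", "my-cluster", "file:///tmp/plan.json"));
    }
}
```

Non-`file://` URIs pass through untouched, matching the idea that only local resources need to be shipped.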
>>
>>> * Classloading strategy
>>
>> We could directly specify the SQL Gateway jar as the jar file in
>> PackagedProgram. It would be treated like a normal user jar, and the SQL
>> Driver would be loaded into the user classloader. WDYT?
>>
>>> * Option `$internal.sql-gateway.driver.sql-config` is string type
>>> I think it's better to use Map type here
>>
>> By Map-type configuration, do you mean a nested map that contains all
>> configurations?
>>
>> I hope I've explained myself well: it's a file that contains the extra SQL
>> configurations, which would be shipped to the JobManager.
>>
>>> * PoC branch
>>
>> Sure. I'll let you know once I get the job done.
>>
>> Best,
>> Paul Lam
>>
>>> On Jun 26, 2023, at 14:27, Shengkai Fang <[email protected]> wrote:
>>>
>>> Hi, Paul.
>>>
>>> Thanks for your update. I have a few questions about the new design:
>>>
>>> * How can we ship the json plan to the JobManager?
>>>
>>> The current design only exposes an option for the URL of the json plan.
>>> It seems the gateway is responsible for uploading it to an external
>>> storage. Can we reuse PipelineOptions.JARS to ship it to the remote
>>> filesystem?
>>>
>>> * Classloading strategy
>>>
>>> Currently, the Driver is in the sql-gateway package. It means the Driver
>>> is not on the JM's classpath directly, because the sql-gateway jar is now
>>> in the opt directory rather than the lib directory. It may need to add
>>> the external dependencies as Python does [1]. BTW, I think it's better to
>>> move the Driver into the flink-table-runtime package, which is much
>>> easier to find (sorry for the wrong opinion before).
>>>
>>> * Option `$internal.sql-gateway.driver.sql-config` is string type
>>>
>>> I think it's better to use Map type here.
>>>
>>> * PoC branch
>>>
>>> Because this FLIP involves many modules, do you have a PoC branch to
>>> verify it works?
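Since the `sql-config` option above is described as a file of extra SQL configurations shipped to the JobManager, a flat key/value file is enough rather than a nested map. A hypothetical round-trip sketch using plain `java.util.Properties` (the Flink side would use its own `Configuration` class, and the keys below are only examples):

```java
import java.io.*;
import java.util.Properties;

public class SqlConfigFile {
    // Hypothetical sketch: the gateway flattens the extra SQL configuration
    // into a key/value file that is shipped with the job, and the SQL Driver
    // reloads it on the JobManager.
    static File write(Properties sqlConf) {
        try {
            File f = File.createTempFile("sql-config", ".properties");
            f.deleteOnExit();
            try (Writer w = new FileWriter(f)) {
                sqlConf.store(w, "extra SQL configuration shipped to the JobManager");
            }
            return f;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static Properties read(File f) {
        try (Reader r = new FileReader(f)) {
            Properties p = new Properties();
            p.load(r);
            return p;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```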
>>>
>>> Best,
>>> Shengkai
>>>
>>> [1] https://github.com/apache/flink/blob/master/flink-yarn/src/main/java/org/apache/flink/yarn/YarnClusterDescriptor.java#L940
>>>
>>> On Mon, Jun 19, 2023 at 14:09, Paul Lam <[email protected]> wrote:
>>>
>>> Hi Shengkai,
>>>
>>> Sorry for my late reply. It took me some time to update the FLIP.
>>>
>>> In the latest FLIP design, SQL Driver is placed in the flink-sql-gateway
>>> module. PTAL.
>>>
>>> The FLIP does not cover details about the K8s file distribution, but its
>>> general usage would be very much the same as in YARN setups. We could
>>> have follow-up discussions in the JIRA tickets.
>>>
>>> Best,
>>> Paul Lam
>>>
>>>> On Jun 12, 2023, at 15:29, Shengkai Fang <[email protected]> wrote:
>>>>
>>>>> If it's the case, I'm good with introducing a new module and making
>>>>> SQL Driver an internal class that accepts JSON plans only.
>>>>
>>>> I rethought this again and again. I think it's better to move the
>>>> SqlDriver into the sql-gateway module, because the sql client relies on
>>>> the sql-gateway to submit the SQL, and the sql-gateway has the ability
>>>> to generate the ExecNodeGraph now. +1 to support accepting JSON plans
>>>> only.
>>>>
>>>> * Upload configuration through command line parameter
>>>>
>>>> ExecNodeGraph only contains the job's information, but it doesn't
>>>> contain the checkpoint dir, checkpoint interval, execution mode, and so
>>>> on.
>>>> So I think we should also upload the configuration.
>>>>
>>>> * KubernetesClusterDescriptor and KubernetesApplicationClusterEntrypoint
>>>> are responsible for the jar upload/download
>>>>
>>>> +1 for the change.
>>>>
>>>> Could you update the FLIP about the current discussion?
>>>>
>>>> Best,
>>>> Shengkai
>>>>
>>>> On Mon, Jun 12, 2023 at 11:41, Yang Wang <[email protected]> wrote:
>>>>
>>>> Sorry for the late reply. I am in favor of introducing such a built-in
>>>> resource localization mechanism based on the Flink FileSystem. Then
>>>> FLINK-28915 [1] could be the second step, which will download the jars
>>>> and dependencies to the JobManager/TaskManager local directory before
>>>> working.
>>>>
>>>> The first step could be done in another ticket in Flink. Or some
>>>> external Flink job management system could also take care of this.
>>>>
>>>> [1] https://issues.apache.org/jira/browse/FLINK-28915
>>>>
>>>> Best,
>>>> Yang
>>>>
>>>> On Fri, Jun 9, 2023 at 17:39, Paul Lam <[email protected]> wrote:
>>>>
>>>>> Hi Mason,
>>>>>
>>>>> I get your point. I'm increasingly feeling the need to introduce a
>>>>> built-in file distribution mechanism for the flink-kubernetes module,
>>>>> just like Spark does with `spark.kubernetes.file.upload.path` [1].
>>>>>
>>>>> I'm assuming the workflow is as follows:
>>>>>
>>>>> - KubernetesClusterDescriptor uploads all local resources to a remote
>>>>>   storage via a Flink filesystem (skipped if the resources are already
>>>>>   remote).
>>>>> - KubernetesApplicationClusterEntrypoint downloads the resources and
>>>>>   puts them on the classpath during startup.
>>>>>
>>>>> I wouldn't mind splitting it into another FLIP to ensure that
>>>>> everything is done correctly.
>>>>>
>>>>> cc'ed @Yang to gather more opinions.
>>>>>
>>>>> [1] https://spark.apache.org/docs/latest/running-on-kubernetes.html#dependency-management
>>>>>
>>>>> Best,
>>>>> Paul Lam
>>>>>
>>>>> On Jun 8, 2023, at 12:15, Mason Chen <[email protected]> wrote:
>>>>>
>>>>> Hi Paul,
>>>>>
>>>>> Thanks for your response!
>>>>>
>>>>> "I agree that utilizing SQL Drivers in Java applications is equally
>>>>> important as employing them in SQL Gateway. WRT init containers, I
>>>>> think most users use them just as a workaround. For example, wget a jar
>>>>> from the Maven repo. We could implement the functionality in SQL Driver
>>>>> in a more graceful way, and the Flink-supported filesystem approach
>>>>> seems to be a good choice."
>>>>>
>>>>> My main point is: can we solve the problem with a design agnostic of
>>>>> SQL and Stream API? I mentioned a use case where this ability is useful
>>>>> for Java or Stream API applications. Maybe this is even a non-goal for
>>>>> your FLIP since you are focusing on the driver entrypoint.
>>>>>
>>>>> Jark mentioned some optimizations:
>>>>>
>>>>> "This allows SQL Gateway to leverage some metadata caching and UDF JAR
>>>>> caching for better compiling performance."
>>>>>
>>>>> It would be great to see this even outside the SQL Gateway (i.e. UDF
>>>>> JAR caching).
>>>>>
>>>>> Best,
>>>>> Mason
>>>>>
>>>>> On Wed, Jun 7, 2023 at 2:26 AM Shengkai Fang <[email protected]> wrote:
>>>>>
>>>>> Hi, Paul. Thanks for your update; it helps me understand the design
>>>>> much better.
>>>>>
>>>>> But I still have some questions about the FLIP.
>>>>>
>>>>> "For SQL Gateway, only DMLs need to be delegated to the SQL Driver. I
>>>>> would think about the details and update the FLIP." Do you have some
>>>>> ideas already?
>>>>>
>>>>> If the application mode cannot support library mode, I think we should
>>>>> only execute INSERT INTO and UPDATE/DELETE statements in the
>>>>> application mode. AFAIK, we cannot support ANALYZE TABLE and CALL
>>>>> PROCEDURE statements. The ANALYZE TABLE syntax needs to register the
>>>>> statistics to the catalog after the job finishes, and the CALL
>>>>> PROCEDURE statement doesn't generate the ExecNodeGraph.
>>>>>
>>>>> * Introduce storage via option `sql-gateway.application.storage-dir`
>>>>>
>>>>> If we cannot support submitting the jars through web submission, +1 to
>>>>> introducing the options to upload the files. However, I think the
>>>>> uploader should be responsible for removing the uploaded jars. Can we
>>>>> remove the jars while the job is running or when the gateway exits?
>>>>>
>>>>> * JobID is not available
>>>>>
>>>>> Can we use the rest client returned by the ApplicationDeployer to query
>>>>> the job id? I am concerned that users won't know which job is related
>>>>> to the submitted SQL.
>>>>>
>>>>> * Do we need to introduce a new module named flink-table-sql-runner?
>>>>>
>>>>> It seems we need to introduce a new module. Will the new module be
>>>>> available in the distribution package? I agree with Jark that we don't
>>>>> need to introduce this for Table-API users, as these users have their
>>>>> own main class.
If we want to make it easier for users to write jobs for the k8s operator,
I think we should modify the k8s operator repo. If we don't need to support
SQL files, can we make this jar visible only to the sql-gateway, like we do
with the planner loader? [1]
>>>>>
>>>>> [1] https://github.com/apache/flink/blob/master/flink-table/flink-table-planner-loader/src/main/java/org/apache/flink/table/planner/loader/PlannerModule.java#L95
>>>>>
>>>>> Best,
>>>>> Shengkai
>>>>>
>>>>> On Wed, Jun 7, 2023 at 10:52, Weihua Hu <[email protected]> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> Thanks for updating the FLIP.
>>>>>
>>>>> I have two cents on the distribution of SQLs and resources.
>>>>>
>>>>> 1. Should we support a common file distribution mechanism for k8s
>>>>> application mode? I have seen some issues and requirements on the
>>>>> mailing list. In our production environment, we implement the download
>>>>> command in the CliFrontend and automatically add an init container to
>>>>> the pod for file downloading. The advantage of this is that we can use
>>>>> all Flink-supported file systems to store files.
>>>>>
>>>>> This needs more discussion. I would appreciate hearing more opinions.
>>>>>
>>>>> 2. In this FLIP, we distribute files in two different ways in YARN and
>>>>> Kubernetes. Can we combine them into one way?
>>>>> If we don't want to implement a common file distribution for k8s
>>>>> application mode, could we use the SQL Driver to download the files in
>>>>> both YARN and K8s? IMO, this can reduce the cost of code maintenance.
>>>>>
>>>>> Best,
>>>>> Weihua
>>>>>
>>>>> On Wed, Jun 7, 2023 at 10:18 AM Paul Lam <[email protected]> wrote:
>>>>>
>>>>> Hi Mason,
>>>>>
>>>>> Thanks for your input!
>>>>>
>>>>> "+1 for init containers or a more generalized way of obtaining
>>>>> arbitrary files. File fetching isn't specific to just SQL -- it also
>>>>> matters for Java applications if the user doesn't want to rebuild a
>>>>> Flink image and just wants to modify the user application fat jar."
>>>>>
>>>>> I agree that utilizing SQL Drivers in Java applications is equally
>>>>> important as employing them in SQL Gateway. WRT init containers, I
>>>>> think most users use them just as a workaround. For example, wget a jar
>>>>> from the Maven repo.
>>>>>
>>>>> We could implement the functionality in SQL Driver in a more graceful
>>>>> way, and the Flink-supported filesystem approach seems to be a good
>>>>> choice.
>>>>>
>>>>> "Also, what do you think about prefixing the config options with
>>>>> `sql-driver` instead of just `sql` to be more specific?"
>>>>>
>>>>> LGTM, since SQL Driver is a public interface and the options are
>>>>> specific to it.
>>>>>
>>>>> Best,
>>>>> Paul Lam
>>>>>
>>>>> On Jun 6, 2023, at 06:30, Mason Chen <[email protected]> wrote:
>>>>>
>>>>> Hi Paul,
>>>>>
>>>>> +1 for this feature and supporting SQL file + JSON plans. We get a lot
>>>>> of requests to just be able to submit a SQL file, but the JSON plan
>>>>> optimizations make sense.
>>>>>
>>>>> +1 for init containers or a more generalized way of obtaining arbitrary
>>>>> files. File fetching isn't specific to just SQL -- it also matters for
>>>>> Java applications if the user doesn't want to rebuild a Flink image and
>>>>> just wants to modify the user application fat jar.
>>>>>
>>>>> "Please note that we could reuse the checkpoint storage like S3/HDFS,
>>>>> which should be required to run Flink in production, so I guess that
>>>>> would be acceptable for most users. WDYT?"
>>>>>
>>>>> If you do go this route, it would be nice to support writing these
>>>>> files to S3/HDFS via Flink. This makes access control and policy
>>>>> management simpler.
>>>>>
>>>>> Also, what do you think about prefixing the config options with
>>>>> `sql-driver` instead of just `sql` to be more specific?
>>>>>
>>>>> Best,
>>>>> Mason
>>>>>
>>>>> On Mon, Jun 5, 2023 at 2:28 AM Paul Lam <[email protected]> wrote:
>>>>>
>>>>> Hi Jark,
>>>>>
>>>>> Thanks for your input! Please see my comments inline.
>>>>>
>>>>> "Isn't Table API the same way as DataStream jobs to submit Flink SQL?
>>>>> DataStream API also doesn't provide a default main class for users, why
>>>>> do we need to provide such one for SQL?"
>>>>>
>>>>> Sorry for the confusion I caused. By DataStream jobs, I mean jobs
>>>>> submitted via the Flink CLI, which actually could be DataStream/Table
>>>>> jobs.
>>>>> I think a default main class would be user-friendly, as it eliminates
>>>>> the need for users to write a main class such as SqlRunner in the Flink
>>>>> K8s operator [1].
>>>>>
>>>>> "I thought the proposed SqlDriver was a dedicated main class accepting
>>>>> SQL files, is that correct?"
>>>>>
>>>>> Both JSON plans and SQL files are accepted. SQL Gateway should use JSON
>>>>> plans, while CLI users may use either JSON plans or SQL files.
>>>>>
>>>>> Please see the updated FLIP [2] for more details.
>>>>>
>>>>> "Personally, I prefer the way of init containers, which doesn't depend
>>>>> on additional components. This can reduce the moving parts of a
>>>>> production environment. Depending on a distributed file system makes
>>>>> the testing, demo, and local setup harder than init containers."
>>>>>
>>>>> Please note that we could reuse the checkpoint storage like S3/HDFS,
>>>>> which should be required to run Flink in production, so I guess that
>>>>> would be acceptable for most users. WDYT?
>>>>>
>>>>> WRT testing, demo, and local setups, I think we could support the local
>>>>> filesystem scheme, i.e. `file://`, as the state backends do [3]. It
>>>>> works as long as the SQL Gateway and the JobManager (or SQL Driver) can
>>>>> access the resource directory (specified via
>>>>> `sql-gateway.application.storage-dir`).
>>>>>
>>>>> Thanks!
>>>>>
>>>>> [1] https://github.com/apache/flink-kubernetes-operator/blob/main/examples/flink-sql-runner-example/src/main/java/org/apache/flink/examples/SqlRunner.java
>>>>>
>>>>> [2] https://cwiki.apache.org/confluence/display/FLINK/FLIP-316:+Introduce+SQL+Driver
>>>>>
>>>>> [3] https://github.com/apache/flink/blob/3245e0443b2a4663552a5b707c5c8c46876c1f6d/flink-runtime/src/test/java/org/apache/flink/runtime/state/filesystem/AbstractFileCheckpointStorageAccessTestBase.java#L161
>>>>>
>>>>> Best,
>>>>> Paul Lam
>>>>>
>>>>> On Jun 3, 2023, at 12:21, Jark Wu <[email protected]> wrote:
>>>>>
>>>>> Hi Paul,
>>>>>
>>>>> Thanks for your reply. I left my comments inline.
>>>>>
>>>>> "As the FLIP said, it's good to have a default main class for Flink
>>>>> SQLs, which allows users to submit Flink SQLs in the same way as
>>>>> DataStream jobs, or else users need to write their own main class."
>>>>>
>>>>> Isn't Table API the same way as DataStream jobs to submit Flink SQL?
>>>>> DataStream API also doesn't provide a default main class for users, why
>>>>> do we need to provide such one for SQL?
>>>>>
>>>>> "With the help of ExecNodeGraph, do we still need the serialized
>>>>> SessionState? If not, we could make SQL Driver accept two serialized
>>>>> formats:"
>>>>>
>>>>> No, ExecNodeGraph doesn't need a serialized SessionState. I thought the
>>>>> proposed SqlDriver was a dedicated main class accepting SQL files, is
>>>>> that correct? If true, we have to ship the SessionState for this case,
>>>>> which is a large piece of work. I think we just need a JsonPlanDriver,
>>>>> which is a main class that accepts a JSON plan as the parameter.
>>>>>
>>>>> "The common solutions I know are to use distributed file systems or use
>>>>> init containers to localize the resources."
>>>>>
>>>>> Personally, I prefer the way of init containers, which doesn't depend
>>>>> on additional components. This can reduce the moving parts of a
>>>>> production environment. Depending on a distributed file system makes
>>>>> the testing, demo, and local setup harder than init containers.
>>>>>
>>>>> Best,
>>>>> Jark
>>>>>
>>>>> On Fri, 2 Jun 2023 at 18:10, Paul Lam <[email protected]> wrote:
>>>>>
>>>>> The FLIP is in an early phase and some details are not included, but
>>>>> fortunately, we got lots of valuable ideas from the discussion.
>>>>>
>>>>> Thanks to everyone who joined the discussion! @Weihua @Shanmon
>>>>> @Shengkai @Biao @Jark
>>>>>
>>>>> This weekend I'm gonna revisit and update the FLIP, adding more
>>>>> details. Hopefully, we can further align our opinions.
>>>>>
>>>>> Best,
>>>>> Paul Lam
>>>>>
>>>>> On Jun 2, 2023, at 18:02, Paul Lam <[email protected]> wrote:
>>>>>
>>>>> Hi Jark,
>>>>>
>>>>> Thanks a lot for your input!
>>>>>
>>>>> "If we decide to submit ExecNodeGraph instead of SQL file, is it still
>>>>> necessary to support SQL Driver?"
>>>>>
>>>>> I think so. Apart from usage in SQL Gateway, SQL Driver could simplify
>>>>> Flink SQL execution with the Flink CLI.
>>>>>
>>>>> As the FLIP said, it's good to have a default main class for Flink
>>>>> SQLs, which allows users to submit Flink SQLs in the same way as
>>>>> DataStream jobs, or else users need to write their own main class.
>>>>>
>>>>> "SQL Driver needs to serialize SessionState, which is very challenging
>>>>> but not covered in detail in the FLIP."
>>>>>
>>>>> With the help of ExecNodeGraph, do we still need the serialized
>>>>> SessionState? If not, we could make SQL Driver accept two serialized
>>>>> formats:
>>>>>
>>>>> - SQL files for user-facing public usage
>>>>> - ExecNodeGraph for internal usage
>>>>>
>>>>> It's kind of similar to the relationship between job jars and
>>>>> jobgraphs.
>>>>>
>>>>> "Regarding "K8S doesn't support shipping multiple jars", is that true?
>>>>> Is it possible to support it?"
>>>>>
>>>>> Yes, K8s doesn't distribute any files. It's the users' responsibility
>>>>> to make sure the resources are accessible in the containers. The common
>>>>> solutions I know are to use distributed file systems or use init
>>>>> containers to localize the resources.
>>>>>
>>>>> Now I lean toward introducing a filesystem to do the distribution job.
>>>>> WDYT?
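The two accepted formats above suggest the driver entrypoint would need to distinguish them at startup. A hypothetical argument-parsing sketch; the option names `--sql-file` and `--json-plan` are illustrative only and not taken from the FLIP:

```java
import java.util.HashMap;
import java.util.Map;

public class SqlDriverArgs {
    // Hypothetical dispatch between the two serialized formats: a SQL file
    // for public usage, or an ExecNodeGraph JSON plan for internal usage.
    static Map<String, String> parse(String[] args) {
        Map<String, String> opts = new HashMap<>();
        for (int i = 0; i + 1 < args.length; i += 2) {
            opts.put(args[i], args[i + 1]);
        }
        // Exactly one of the two inputs must be given.
        if (opts.containsKey("--sql-file") == opts.containsKey("--json-plan")) {
            throw new IllegalArgumentException(
                "exactly one of --sql-file or --json-plan is required");
        }
        return opts;
    }
}
```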
>>>>>
>>>>> Best,
>>>>> Paul Lam
>>>>>
>>>>> On Jun 1, 2023, at 20:33, Jark Wu <[email protected]> wrote:
>>>>>
>>>>> Hi Paul,
>>>>>
>>>>> Thanks for starting this discussion. I like the proposal! This is a
>>>>> frequently requested feature!
>>>>>
>>>>> I agree with Shengkai that ExecNodeGraph as the submission object is a
>>>>> better idea than an SQL file. To be more specific, it should be
>>>>> JsonPlanGraph or CompiledPlan, which is the serializable
>>>>> representation. CompiledPlan is a clear separation between
>>>>> compiling/optimization/validation and execution. This can keep the
>>>>> validation and metadata access on the SQL Gateway side. This allows SQL
>>>>> Gateway to leverage some metadata caching and UDF JAR caching for
>>>>> better compiling performance.
>>>>>
>>>>> If we decide to submit ExecNodeGraph instead of SQL file, is it still
>>>>> necessary to support SQL Driver? Regarding non-interactive SQL jobs,
>>>>> users can use a Table API program for application mode. SQL Driver
>>>>> needs to serialize SessionState, which is very challenging but not
>>>>> covered in detail in the FLIP.
>>>>>
>>>>> Regarding "K8S doesn't support shipping multiple jars", is that true?
>>>>> Is it possible to support it?
>>>>>
>>>>> Best,
>>>>> Jark
>>>>>
>>>>> On Thu, 1 Jun 2023 at 16:58, Paul Lam <[email protected]> wrote:
>>>>>
>>>>> Hi Weihua,
>>>>>
>>>>> You're right. Distributing the SQLs to the TMs is one of the
>>>>> challenging parts of this FLIP.
>>>>>
>>>>> Web submission is not enabled in application mode currently, as you
>>>>> said, but that could be changed if we have good reasons.
>>>>>
>>>>> What do you think about introducing a distributed storage for SQL
>>>>> Gateway?
>>>>>
>>>>> We could make use of Flink file systems [1] to distribute the
>>>>> SQL-Gateway-generated resources; that should solve the problem at its
>>>>> root cause.
>>>>>
>>>>> Users could specify Flink-supported file systems to ship files. It's
>>>>> only required when using SQL Gateway with K8s application mode.
>>>>>
>>>>> [1] https://nightlies.apache.org/flink/flink-docs-stable/docs/deployment/filesystems/overview/
<https://nightlies.apache.org/flink/flink-docs-stable/docs/deployment/filesystems/overview/> >>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> Best, >>>>> Paul Lam >>>>> >>>>> 2023年6月1日 13:55,Weihua Hu <[email protected] >>>>> <mailto:[email protected]> <mailto: >> [email protected] <mailto:[email protected]>> <mailto: >>>>> >>>>> [email protected] <mailto:[email protected]> >>>>> <mailto:[email protected] <mailto:[email protected]>>> <mailto: >>>>> >>>>> [email protected] <mailto:[email protected]> >>>>> <mailto:[email protected] <mailto:[email protected]>> <mailto: >> [email protected] <mailto:[email protected]> >> <mailto:[email protected] <mailto:[email protected]>>>>> 写道: >>>>> >>>>> >>>>> Thanks Paul for your reply. >>>>> >>>>> SQLDriver looks good to me. >>>>> >>>>> 2. Do you mean a pass the SQL string a configuration or a >>>>> >>>>> program >>>>> >>>>> argument? >>>>> >>>>> >>>>> >>>>> I brought this up because we were unable to pass the SQL file >>>>> >>>>> to >>>>> >>>>> Flink >>>>> >>>>> using Kubernetes mode. >>>>> For DataStream/Python users, they need to prepare their images >>>>> >>>>> for >>>>> >>>>> the >>>>> >>>>> jars >>>>> >>>>> and dependencies. >>>>> But for SQL users, they can use a common image to run >>>>> >>>>> different >>>>> >>>>> SQL >>>>> >>>>> queries >>>>> >>>>> if there are no other udf requirements. >>>>> It would be great if the SQL query and image were not bound. >>>>> >>>>> Using strings is a way to decouple these, but just as you >>>>> >>>>> mentioned, >>>>> >>>>> it's >>>>> >>>>> not easy to pass complex SQL. >>>>> >>>>> use web submission >>>>> >>>>> AFAIK, we can not use web submission in the Application mode. >>>>> >>>>> Please >>>>> >>>>> correct me if I'm wrong. 
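To make the escaping concern above concrete, here is a minimal sketch (illustrative only, not part of the FLIP) of how a multi-line SQL statement could be packed into a single config-safe string and recovered losslessly, e.g. by JSON-encoding it:

```python
import json

# A multi-line Flink SQL statement with quotes and newlines that would
# need escaping if passed as a flat key-value configuration entry.
sql = """INSERT INTO sink
SELECT name, COUNT(*) AS cnt
FROM source
WHERE tag = 'a"b'
GROUP BY name"""

# JSON-encode the statement into a single-line, config-safe string.
encoded = json.dumps(sql)
assert "\n" not in encoded  # safe to store as one config value

# The driver side can recover the original statement losslessly.
decoded = json.loads(encoded)
assert decoded == sql
```

This works, but as noted above, the escaped form quickly becomes unreadable for complex SQL, which is why a file-based approach is preferable for production.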
>>>>> Best,
>>>>> Weihua
>>>>>
>>>>> On Wed, May 31, 2023 at 9:37 PM Paul Lam <[email protected]> wrote:
>>>>>
>>>>> Hi Biao,
>>>>>
>>>>> Thanks for your comments!
>>>>>
>>>>>> 1. Scope: is this FLIP only targeted for non-interactive Flink SQL
>>>>>> jobs in Application mode? More specifically, if we use SQL
>>>>>> client/gateway to execute some interactive SQLs like a SELECT query,
>>>>>> can we ask Flink to use Application mode to execute those queries
>>>>>> after this FLIP?
>>>>>
>>>>> Thanks for pointing it out. I think only DMLs would be executed via
>>>>> SQL Driver. I'll add the scope to the FLIP.
>>>>>
>>>>>> 2. Deployment: I believe in YARN mode, the implementation is trivial
>>>>>> as we can ship files via YARN's tool easily, but for K8s, things can
>>>>>> be more complicated as Shengkai said.
>>>>>
>>>>> Your input is very informative. I'm thinking about using web
>>>>> submission, but it requires exposing the JobManager port, which could
>>>>> also be a problem on K8s.
>>>>> Another approach is to explicitly require a distributed storage to
>>>>> ship files, but we may need a new deployment executor for that.
>>>>>
>>>>> What do you think of these two approaches?
>>>>>
>>>>>> 3. Serialization of SessionState: in SessionState, there are some
>>>>>> unserializable fields like
>>>>>> org.apache.flink.table.resource.ResourceManager#userClassLoader. It
>>>>>> may be worthwhile to add more details about the serialization part.
>>>>>
>>>>> I agree. That's a missing part. But if we use ExecNodeGraph as
>>>>> Shengkai mentioned, do we eliminate the need for serialization of
>>>>> SessionState?
>>>>>
>>>>> Best,
>>>>> Paul Lam
>>>>>
>>>>> 2023年5月31日 13:07,Biao Geng <[email protected]> 写道:
>>>>>
>>>>> Thanks Paul for the proposal! I believe it would be very useful for
>>>>> Flink users.
>>>>> After reading the FLIP, I have some questions:
>>>>> 1. Scope: is this FLIP only targeted for non-interactive Flink SQL
>>>>> jobs in Application mode? More specifically, if we use SQL
>>>>> client/gateway to execute some interactive SQLs like a SELECT query,
>>>>> can we ask Flink to use Application mode to execute those queries
>>>>> after this FLIP?
>>>>> 2. Deployment: I believe in YARN mode, the implementation is trivial
>>>>> as we can ship files via YARN's tool easily, but for K8s, things can
>>>>> be more complicated as Shengkai said. I have implemented a simple POC
>>>>> <https://github.com/bgeng777/flink/commit/5b4338fe52ec343326927f0fc12f015dd22b1133>
>>>>> based on SQL client before (i.e. consider the SQL client which
>>>>> supports executing a SQL file as the SQL driver in this FLIP). One
>>>>> problem I have met is how we ship SQL files (or the JobGraph) to the
>>>>> K8s side. Without such support, users have to modify the
>>>>> initContainer or rebuild a new K8s image every time to fetch the SQL
>>>>> file. Like the Flink K8s operator, one workaround is to utilize the
>>>>> Flink config (transforming the SQL file to an escaped string like
>>>>> Weihua mentioned), which will be converted to a ConfigMap, but K8s
>>>>> has a size limit on ConfigMaps (no larger than 1MB
>>>>> <https://kubernetes.io/docs/concepts/configuration/configmap/>). Not
>>>>> sure if we have better solutions.
>>>>> 3. Serialization of SessionState: in SessionState, there are some
>>>>> unserializable fields like
>>>>> org.apache.flink.table.resource.ResourceManager#userClassLoader. It
>>>>> may be worthwhile to add more details about the serialization part.
>>>>>
>>>>> Best,
>>>>> Biao Geng
>>>>>
>>>>> Paul Lam <[email protected]> 于2023年5月31日周三 11:49写道:
>>>>>
>>>>> Hi Weihua,
>>>>>
>>>>> Thanks a lot for your input! Please see my comments inline.
>>>>>
>>>>>> - Is SQLRunner the better name? We use this to run a SQL job. (Not
>>>>>> strong, the SQLDriver is fine for me)
>>>>>
>>>>> I've thought about SQL Runner but picked SQL Driver for the following
>>>>> reasons FYI:
>>>>>
>>>>> 1. I have a PythonDriver doing the same job for PyFlink [1]
>>>>> 2. A Flink program's main class is sort of like a Driver in JDBC,
>>>>> which translates SQL into database-specific languages.
>>>>>
>>>>> In general, I'm +1 for SQL Driver and +0 for SQL Runner.
>>>>>
>>>>>> - Could we run SQL jobs using SQL in strings? Otherwise, we need to
>>>>>> prepare a SQL file in an image for Kubernetes application mode,
>>>>>> which may be a bit cumbersome.
>>>>>
>>>>> Do you mean passing the SQL string as a configuration or a program
>>>>> argument?
>>>>>
>>>>> I thought it might be convenient for testing purposes, but not
>>>>> recommended for production, because Flink SQLs could be complicated
>>>>> and involve lots of characters that need to be escaped.
>>>>>
>>>>> WDYT?
>>>>>
>>>>>> - I noticed that we don't specify the SQLDriver jar in the
>>>>>> "run-application" command. Does that mean we need to perform
>>>>>> automatic detection in Flink?
>>>>>
>>>>> Yes! It's like running a PyFlink job with the following command:
>>>>>
>>>>> ```
>>>>> ./bin/flink run \
>>>>>     --pyModule table.word_count \
>>>>>     --pyFiles examples/python/table
>>>>> ```
>>>>>
>>>>> The CLI determines if it's a SQL job, and if yes, applies the SQL
>>>>> Driver automatically.
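The auto-detection idea could be sketched roughly as follows (a hypothetical illustration only: the driver class name, the `resolve_main_class` helper, and the `.sql` file convention are assumptions for this sketch, not the actual Flink implementation):

```python
from typing import Optional

# Assumed placeholder name for the SQL Driver main class; the real class
# name and location are decided by the FLIP/POC, not by this sketch.
SQL_DRIVER_MAIN = "org.example.SqlDriver"


def resolve_main_class(artifact: str, explicit_main: Optional[str] = None) -> Optional[str]:
    """Pick the main class for a submitted artifact (illustrative logic)."""
    if explicit_main:
        # An explicitly specified main class always wins.
        return explicit_main
    if artifact.endswith(".sql"):
        # A SQL file submitted without a main class falls back to the
        # SQL Driver, mirroring how PythonDriver is applied for PyFlink.
        return SQL_DRIVER_MAIN
    # Plain jars keep the usual behavior (main class from the manifest).
    return None
```

For example, `resolve_main_class("query.sql")` would pick the driver, while `resolve_main_class("job.jar")` would leave the choice to the jar's manifest.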
>>>>> [1]
>>>>> https://github.com/apache/flink/blob/master/flink-python/src/main/java/org/apache/flink/client/python/PythonDriver.java
>>>>>
>>>>> Best,
>>>>> Paul Lam
>>>>>
>>>>> 2023年5月30日 21:56,Weihua Hu <[email protected]> 写道:
>>>>>
>>>>> Thanks Paul for the proposal.
>>>>>
>>>>> +1 for this. It is valuable in improving ease of use.
>>>>>
>>>>> I have a few questions.
>>>>> - Is SQLRunner the better name? We use this to run a SQL job. (Not
>>>>> strong, the SQLDriver is fine for me)
>>>>> - Could we run SQL jobs using SQL in strings? Otherwise, we need to
>>>>> prepare a SQL file in an image for Kubernetes application mode, which
>>>>> may be a bit cumbersome.
>>>>> - I noticed that we don't specify the SQLDriver jar in the
>>>>> "run-application" command. Does that mean we need to perform
>>>>> automatic detection in Flink?
>>>>> Best,
>>>>> Weihua
>>>>>
>>>>> On Mon, May 29, 2023 at 7:24 PM Paul Lam <[email protected]> wrote:
>>>>>
>>>>> Hi team,
>>>>>
>>>>> I'd like to start a discussion about FLIP-316 [1], which introduces a
>>>>> SQL driver as the default main class for Flink SQL jobs.
>>>>>
>>>>> Currently, Flink SQL could be executed out of the box either via SQL
>>>>> Client/Gateway or embedded in a Flink Java/Python program.
>>>>>
>>>>> However, each one has its drawbacks:
>>>>>
>>>>> - SQL Client/Gateway doesn't support the application deployment mode [2]
>>>>> - A Flink Java/Python program requires extra work to write a non-SQL
>>>>> program
>>>>>
>>>>> Therefore, I propose adding a SQL driver to act as the default main
>>>>> class for SQL jobs.
>>>>> Please see the FLIP docs for details and feel free to comment.
>>>>> Thanks!
>>>>>
>>>>> [1]
>>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-316%3A+Introduce+SQL+Driver
>>>>> [2] https://issues.apache.org/jira/browse/FLINK-26541
>>>>>
>>>>> Best,
>>>>> Paul Lam
