Sorry for the late reply. I am in favor of introducing such a built-in
resource localization mechanism based on Flink FileSystem. Then FLINK-28915 [1]
could be the second step, which would download the jars and dependencies to the
JobManager/TaskManager local directory before the job starts.
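
For illustration only, here is a rough sketch of what such a localization step could look like on top of the Flink FileSystem abstraction. The class and method names are made up for this example and are not FLINK-28915's actual design:

```java
// Illustrative sketch only (not FLINK-28915's actual code): localizing a
// remote jar to a local directory on the JobManager/TaskManager through the
// Flink FileSystem abstraction before the job starts.
import org.apache.flink.core.fs.FSDataInputStream;
import org.apache.flink.core.fs.FileSystem;
import org.apache.flink.core.fs.Path;

import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;

public final class ResourceLocalizer {

    /** Downloads a remote resource (e.g. s3://bucket/udf.jar) into localDir. */
    public static java.nio.file.Path localize(String remoteUri, java.nio.file.Path localDir)
            throws IOException {
        Path remote = new Path(remoteUri);
        // The FileSystem is resolved from the URI scheme, so s3://, hdfs://, oss://,
        // etc. all work as long as the corresponding filesystem plugin is available.
        FileSystem fs = remote.getFileSystem();
        java.nio.file.Path local = localDir.resolve(remote.getName());
        try (FSDataInputStream in = fs.open(remote);
                OutputStream out = Files.newOutputStream(local)) {
            in.transferTo(out);
        }
        return local;
    }
}
```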

The first step could be done in another ticket in Flink, or some external
Flink job management system could also take care of it.

[1]. https://issues.apache.org/jira/browse/FLINK-28915

Best,
Yang

Paul Lam <paullin3...@gmail.com> wrote on Fri, Jun 9, 2023 at 17:39:

> Hi Mason,
>
> I get your point. I'm increasingly feeling the need to introduce a
> built-in file distribution mechanism for the flink-kubernetes module, just
> like Spark does with `spark.kubernetes.file.upload.path` [1].
>
> I’m assuming the workflow is as follows:
>
> - KubernetesClusterDescriptor uploads all local resources to a remote
>   storage via Flink filesystem (skips if the resources are already remote);
>   a rough sketch of this step follows below.
> - KubernetesApplicationClusterEntrypoint downloads the resources
>   and puts them on the classpath during startup.
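>
> (A minimal, hypothetical sketch of the upload step on top of the Flink
> FileSystem API, just to make the idea concrete; the helper name and layout
> are assumptions rather than anything from the FLIP:)
>
> ```java
> // Hypothetical sketch: uploading local user resources to a Flink-supported
> // remote filesystem before deployment; helper name and layout are assumptions.
> import org.apache.flink.core.fs.FSDataOutputStream;
> import org.apache.flink.core.fs.FileSystem;
> import org.apache.flink.core.fs.FileSystem.WriteMode;
> import org.apache.flink.core.fs.Path;
>
> import java.io.IOException;
> import java.io.InputStream;
> import java.nio.file.Files;
>
> public final class ResourceUploader {
>
>     /** Copies a local file to the remote storage dir and returns the remote path. */
>     public static Path upload(java.nio.file.Path localFile, String remoteStorageDir)
>             throws IOException {
>         Path target = new Path(remoteStorageDir, localFile.getFileName().toString());
>         FileSystem fs = target.getFileSystem(); // resolved via the scheme's filesystem plugin
>         try (InputStream in = Files.newInputStream(localFile);
>                 FSDataOutputStream out = fs.create(target, WriteMode.NO_OVERWRITE)) {
>             in.transferTo(out);
>         }
>         return target;
>     }
> }
> ```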
>
> I wouldn't mind splitting it into another FLIP to ensure that everything is
> done correctly.
>
> cc'ed @Yang to gather more opinions.
>
> [1]
> https://spark.apache.org/docs/latest/running-on-kubernetes.html#dependency-management
>
> Best,
> Paul Lam
>
> On Jun 8, 2023, at 12:15, Mason Chen <mas.chen6...@gmail.com> wrote:
>
> Hi Paul,
>
> Thanks for your response!
>
> I agree that utilizing SQL Drivers in Java applications is equally
> important as employing them in SQL Gateway. WRT init containers, I think
> most users use them just as a workaround. For example, wget a jar from the
> maven repo.
>
> We could implement the functionality in SQL Driver in a more graceful
> way and the flink-supported filesystem approach seems to be a
> good choice.
>
>
> My main point is: can we solve the problem with a design agnostic of SQL
> and the Stream API? I mentioned a use case where this ability is useful for
> Java or Stream API applications. Maybe this is even a non-goal of your FLIP
> since you are focusing on the driver entrypoint.
>
> Jark mentioned some optimizations:
>
> This allows SQLGateway to leverage some metadata caching and UDF JAR
> caching for better compiling performance.
>
> It would be great to see this even outside the SQLGateway (i.e. UDF JAR
> caching).
>
> Best,
> Mason
>
> On Wed, Jun 7, 2023 at 2:26 AM Shengkai Fang <fskm...@gmail.com> wrote:
>
> Hi Paul. Thanks for your update; it helps me understand the design much
> better.
>
> But I still have some questions about the FLIP.
>
> For SQL Gateway, only DMLs need to be delegated to the SQL Driver. I would
> think about the details and update the FLIP. Do you have some ideas
> already?
>
>
> If the application mode can not support library mode, I think we should
> only execute INSERT INTO and UPDATE/DELETE statements in the application
> mode. AFAIK, we can not support ANALYZE TABLE and CALL PROCEDURE
> statements. The ANALYZE TABLE syntax needs to register the statistics to
> the catalog after the job finishes, and the CALL PROCEDURE statement
> doesn't generate an ExecNodeGraph.
>
> * Introduce storage via option `sql-gateway.application.storage-dir`
>
> If we can not support submitting the jars through web submission, +1 to
> introducing the options to upload the files. However, I think the uploader
> should be responsible for removing the uploaded jars. Can we remove the
> jars while the job is running or when the gateway exits?
>
> * JobID is not available
>
> Can we use the rest client returned by ApplicationDeployer to query the
> job id? I am concerned that users don't know which job is related to the
> submitted SQL.
>
> * Do we need to introduce a new module named flink-table-sql-runner?
>
> It seems we need to introduce a new module. Will the new module be
> available in the distribution package? I agree with Jark that we don't need
> to introduce this for Table API users, as these users have their own main
> class. If we want to make it easier for users to write jobs for the k8s
> operator, I think we should modify the k8s operator repo. If we don't need
> to support SQL files, can we make this jar only visible in the sql-gateway,
> like we do in the planner loader? [1]
>
> [1]
>
>
> https://github.com/apache/flink/blob/master/flink-table/flink-table-planner-loader/src/main/java/org/apache/flink/table/planner/loader/PlannerModule.java#L95
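>
> (As a side note, a simplified, hypothetical illustration of that isolation
> idea: loading a bundled jar through a dedicated child classloader so it
> stays invisible to user code. This is not PlannerModule's actual code, and
> a real implementation would use a parent-last component classloader:)
>
> ```java
> // Hypothetical sketch of jar isolation via a dedicated classloader.
> import java.io.InputStream;
> import java.net.URL;
> import java.net.URLClassLoader;
> import java.nio.file.Files;
> import java.nio.file.Path;
> import java.nio.file.StandardCopyOption;
>
> public final class IsolatedJarLoader {
>
>     /** Extracts a jar bundled as a classpath resource and loads it in its own classloader. */
>     public static ClassLoader load(String resourceName) throws Exception {
>         Path tmp = Files.createTempFile("flink-sql-runner", ".jar");
>         try (InputStream in =
>                 IsolatedJarLoader.class.getClassLoader().getResourceAsStream(resourceName)) {
>             if (in == null) {
>                 throw new IllegalStateException("Bundled jar not found: " + resourceName);
>             }
>             Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
>         }
>         // A plain URLClassLoader (parent-first) keeps the sketch short; the planner
>         // loader uses a component classloader with fine-grained visibility rules.
>         return new URLClassLoader(
>                 new URL[] {tmp.toUri().toURL()}, IsolatedJarLoader.class.getClassLoader());
>     }
> }
> ```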
>
> Best,
> Shengkai
>
> Weihua Hu <huweihua....@gmail.com> wrote on Wed, Jun 7, 2023 at 10:52:
>
> Hi,
>
> Thanks for updating the FLIP.
>
> I have two cents on the distribution of SQLs and resources.
> 1. Should we support a common file distribution mechanism for k8s
> application mode?
>  I have seen some issues and requirements on the mailing list.
>  In our production environment, we implement the download command in the
>  CliFrontend and automatically add an init container to the pod for file
>  downloading. The advantage of this is that we can use all Flink-supported
>  file systems to store files.
>
>  This needs more discussion. I would appreciate hearing more opinions.
>
> 2. In this FLIP, we distribute files in two different ways on YARN and
> Kubernetes. Can we unify them?
>  If we don't want to implement a common file distribution for k8s
>  application mode, could we use the SQLDriver to download the files on both
>  YARN and K8s? IMO, this can reduce the cost of code maintenance.
>
> Best,
> Weihua
>
>
> On Wed, Jun 7, 2023 at 10:18 AM Paul Lam <paullin3...@gmail.com> wrote:
>
> Hi Mason,
>
> Thanks for your input!
>
> +1 for init containers or a more generalized way of obtaining arbitrary
> files. File fetching isn't specific to just SQL--it also matters for Java
> applications if the user doesn't want to rebuild a Flink image and just
> wants to modify the user application fat jar.
>
>
> I agree that utilizing SQL Drivers in Java applications is equally
> important
> as employing them in SQL Gateway. WRT init containers, I think most
> users use them just as a workaround. For example, wget a jar from the
> maven repo.
>
> We could implement the functionality in SQL Driver in a more graceful
> way and the flink-supported filesystem approach seems to be a
> good choice.
>
> Also, what do you think about prefixing the config options with
> `sql-driver` instead of just `sql` to be more specific?
>
>
> LGTM, since SQL Driver is a public interface and the options are
> specific to it.
>
> Best,
> Paul Lam
>
> On Jun 6, 2023, at 06:30, Mason Chen <mas.chen6...@gmail.com> wrote:
>
> Hi Paul,
>
> +1 for this feature and supporting SQL file + JSON plans. We get a lot of
> requests to just be able to submit a SQL file, but the JSON plan
> optimizations make sense.
>
> +1 for init containers or a more generalized way of obtaining arbitrary
> files. File fetching isn't specific to just SQL--it also matters for Java
> applications if the user doesn't want to rebuild a Flink image and just
> wants to modify the user application fat jar.
>
> Please note that we could reuse the checkpoint storage like S3/HDFS, which
> should be required to run Flink in production, so I guess that would be
> acceptable for most users. WDYT?
>
>
> If you do go this route, it would be nice to support writing these files
> to S3/HDFS via Flink. This makes access control and policy management
> simpler.
>
>
> Also, what do you think about prefixing the config options with
> `sql-driver` instead of just `sql` to be more specific?
>
> Best,
> Mason
>
> On Mon, Jun 5, 2023 at 2:28 AM Paul Lam <paullin3...@gmail.com> wrote:
>
>
> Hi Jark,
>
> Thanks for your input! Please see my comments inline.
>
> Isn't Table API the same way as DataStream jobs to submit Flink SQL?
> DataStream API also doesn't provide a default main class for users,
> why do we need to provide such one for SQL?
>
>
> Sorry for the confusion I caused. By DataStream jobs, I mean jobs
> submitted via the Flink CLI, which actually could be DataStream/Table jobs.
>
> I think a default main class would be user-friendly, which eliminates the
> need for users to write a main class like the SqlRunner in the Flink K8s
> operator [1].
>
>
> I thought the proposed SqlDriver was a dedicated main class accepting SQL
> files, is that correct?
>
>
> Both JSON plans and SQL files are accepted. SQL Gateway should use JSON
> plans, while CLI users may use either JSON plans or SQL files.
>
> Please see the updated FLIP[2] for more details.
>
> Personally, I prefer the way of init containers which doesn't depend on
> additional components. This can reduce the moving parts of a production
> environment. Depending on a distributed file system makes the testing,
> demo, and local setup harder than init containers.
>
>
> Please note that we could reuse the checkpoint storage like S3/HDFS, which
> should be required to run Flink in production, so I guess that would be
> acceptable for most users. WDYT?
>
> WRT testing, demo, and local setups, I think we could support the local
> filesystem scheme i.e. file://** as the state backends do. It works as long
> as SQL Gateway and JobManager (or SQL Driver) can access the resource
> directory (specified via `sql-gateway.application.storage-dir`).
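>
> (To make the local setup concrete, a tiny hedged example; the option key is
> the one proposed in this thread, not an existing Flink option:)
>
> ```java
> // Hypothetical sketch: pointing the proposed storage-dir option at a local
> // path for testing/demo setups; both SQL Gateway and the SQL Driver would
> // resolve it through the Flink FileSystem API.
> import org.apache.flink.configuration.Configuration;
> import org.apache.flink.core.fs.FileSystem;
> import org.apache.flink.core.fs.Path;
>
> public final class StorageDirExample {
>     public static void main(String[] args) throws Exception {
>         Configuration conf = new Configuration();
>         conf.setString("sql-gateway.application.storage-dir", "file:///tmp/sql-gateway/resources");
>
>         // The same code path works for file://, s3://, hdfs://, etc.
>         Path storageDir = new Path(conf.getString("sql-gateway.application.storage-dir", null));
>         FileSystem fs = storageDir.getFileSystem();
>         if (!fs.exists(storageDir)) {
>             fs.mkdirs(storageDir);
>         }
>         System.out.println("Using resource dir: " + storageDir);
>     }
> }
> ```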
>
> Thanks!
>
> [1]
> https://github.com/apache/flink-kubernetes-operator/blob/main/examples/flink-sql-runner-example/src/main/java/org/apache/flink/examples/SqlRunner.java
>
> [2]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-316:+Introduce+SQL+Driver
>
> [3]
> https://github.com/apache/flink/blob/3245e0443b2a4663552a5b707c5c8c46876c1f6d/flink-runtime/src/test/java/org/apache/flink/runtime/state/filesystem/AbstractFileCheckpointStorageAccessTestBase.java#L161
>
>
> Best,
> Paul Lam
>
> On Jun 3, 2023, at 12:21, Jark Wu <imj...@gmail.com> wrote:
>
> Hi Paul,
>
> Thanks for your reply. I left my comments inline.
>
> As the FLIP said, it’s good to have a default main class for Flink SQLs,
> which allows users to submit Flink SQLs in the same way as DataStream
> jobs, or else users need to write their own main class.
>
>
> Isn't Table API the same way as DataStream jobs to submit Flink SQL?
> DataStream API also doesn't provide a default main class for users,
> why do we need to provide such one for SQL?
>
> With the help of ExecNodeGraph, do we still need the serialized
> SessionState? If not, we could make SQL Driver accept two serialized
> formats:
>
>
> No, ExecNodeGraph doesn't need to serialize SessionState. I thought the
> proposed SqlDriver was a dedicated main class accepting SQL files, is that
> correct? If true, we have to ship the SessionState for this case, which is
> a lot of work. I think we just need a JsonPlanDriver, which is a main class
> that accepts a JsonPlan as the parameter.
>
>
> The common solutions I know are to use distributed file systems or use
> init containers to localize the resources.
>
>
> Personally, I prefer the way of init containers which doesn't depend on
> additional components. This can reduce the moving parts of a production
> environment. Depending on a distributed file system makes the testing,
> demo, and local setup harder than init containers.
>
> Best,
> Jark
>
> On Fri, 2 Jun 2023 at 18:10, Paul Lam <paullin3...@gmail.com> wrote:
>
>
> The FLIP is in the early phase and some details are not included, but
> fortunately, we got lots of valuable ideas from the discussion.
>
> Thanks to everyone who joined the discussion!
> @Weihua @Shanmon @Shengkai @Biao @Jark
>
> This weekend I’m gonna revisit and update the FLIP, adding more
> details. Hopefully, we can further align our opinions.
>
> Best,
> Paul Lam
>
> On Jun 2, 2023, at 18:02, Paul Lam <paullin3...@gmail.com> wrote:
>
>
> Hi Jark,
>
> Thanks a lot for your input!
>
> If we decide to submit ExecNodeGraph instead of SQL file, is it still
> necessary to support SQL Driver?
>
>
> I think so. Apart from usage in SQL Gateway, SQL Driver could simplify
> Flink SQL execution with Flink CLI.
>
> As the FLIP said, it’s good to have a default main class for Flink SQLs,
> which allows users to submit Flink SQLs in the same way as DataStream
> jobs, or else users need to write their own main class.
>
> SQL Driver needs to serialize SessionState which is very challenging but
> not covered in detail in the FLIP.
>
>
> With the help of ExecNodeGraph, do we still need the serialized
> SessionState? If not, we could make SQL Driver accept two serialized
> formats:
>
> - SQL files for user-facing public usage
> - ExecNodeGraph for internal usage
>
> It’s kind of similar to the relationship between job jars and jobgraphs.
>
>
> Regarding "K8S doesn't support shipping multiple jars", is that
>
> true?
>
> Is it
>
> possible to support it?
>
>
> Yes, K8s doesn’t distribute any files. It’s the users’ responsibility to
> make sure the resources are accessible in the containers. The common
> solutions I know are to use distributed file systems or use init
> containers to localize the resources.
>
> Now I lean toward introducing a fs to do the distribution job. WDYT?
>
>
> Best,
> Paul Lam
>
> On Jun 1, 2023, at 20:33, Jark Wu <imj...@gmail.com> wrote:
>
>
> Hi Paul,
>
> Thanks for starting this discussion. I like the proposal! This is a
> frequently requested feature!
>
> I agree with Shengkai that ExecNodeGraph as the submission object is a
> better idea than SQL file. To be more specific, it should be JsonPlanGraph
> or CompiledPlan which is the serializable representation. CompiledPlan is
> a clear separation between compiling/optimization/validation and
> execution. This can keep the validation and metadata accessing still on
> the SQLGateway side. This allows SQLGateway to leverage some metadata
> caching and UDF JAR caching for better compiling performance.
>
> If we decide to submit ExecNodeGraph instead of SQL file, is it still
> necessary to support SQL Driver? Regarding non-interactive SQL jobs, users
> can use the Table API program for application mode. SQL Driver needs to
> serialize SessionState which is very challenging but not covered in detail
> in the FLIP.
>
> Regarding "K8S doesn't support shipping multiple jars", is that true? Is
> it possible to support it?
>
> Best,
> Jark
>
>
>
> On Thu, 1 Jun 2023 at 16:58, Paul Lam <paullin3...@gmail.com> wrote:
>
>
> Hi Weihua,
>
> You’re right. Distributing the SQLs to the TMs is one of the challenging
> parts of this FLIP.
>
> Web submission is not enabled in application mode currently as you said,
> but it could be changed if we have good reasons.
>
> What do you think about introducing a distributed storage for SQL Gateway?
>
>
> We could make use of Flink file systems [1] to distribute the SQL Gateway
> generated resources, which should solve the problem at its root cause.
>
> Users could specify Flink-supported file systems to ship files. It’s only
> required when using SQL Gateway with K8s application mode.
>
> [1]
> https://nightlies.apache.org/flink/flink-docs-stable/docs/deployment/filesystems/overview/
>
>
> Best,
> Paul Lam
>
> On Jun 1, 2023, at 13:55, Weihua Hu <huweihua....@gmail.com> wrote:
>
>
> Thanks Paul for your reply.
>
> SQLDriver looks good to me.
>
> 2. Do you mean to pass the SQL string as a configuration or a program
> argument?
>
>
>
> I brought this up because we were unable to pass the SQL file to Flink
> using Kubernetes mode. For DataStream/Python users, they need to prepare
> their images for the jars and dependencies. But SQL users can use a common
> image to run different SQL queries if there are no other UDF requirements.
> It would be great if the SQL query and image were not bound.
>
> Using strings is a way to decouple these, but just as you mentioned, it's
> not easy to pass complex SQL.
>
> use web submission
>
> AFAIK, we can not use web submission in the Application mode. Please
> correct me if I'm wrong.
>
>
> Best,
> Weihua
>
>
> On Wed, May 31, 2023 at 9:37 PM Paul Lam <paullin3...@gmail.com> wrote:
>
>
> Hi Biao,
>
> Thanks for your comments!
>
> 1. Scope: is this FLIP only targeted for non-interactive Flink SQL jobs in
> Application mode? More specifically, if we use SQL client/gateway to
> execute some interactive SQLs like a SELECT query, can we ask flink to use
> Application mode to execute those queries after this FLIP?
>
>
> Thanks for pointing it out. I think only DMLs would be executed via SQL
> Driver. I'll add the scope to the FLIP.
>
> 2. Deployment: I believe in YARN mode, the implementation is trivial as we
> can ship files via YARN's tool easily but for K8s, things can be more
> complicated as Shengkai said.
>
>
>
> Your input is very informative. I’m thinking about using web submission,
> but it requires exposing the JobManager port, which could also be a
> problem on K8s.
>
> Another approach is to explicitly require a distributed storage to ship
> files, but we may need a new deployment executor for that.
>
> What do you think of these two approaches?
>
> 3. Serialization of SessionState: in SessionState, there are some
> unserializable fields like
> org.apache.flink.table.resource.ResourceManager#userClassLoader. It may be
> worthwhile to add more details about the serialization part.
>
>
> I agree. That’s a missing part. But if we use ExecNodeGraph as Shengkai
> mentioned, do we eliminate the need for serialization of SessionState?
>
>
> Best,
> Paul Lam
>
> On May 31, 2023, at 13:07, Biao Geng <biaoge...@gmail.com> wrote:
>
>
> Thanks Paul for the proposal! I believe it would be very useful for Flink
> users.
> After reading the FLIP, I have some questions:
> 1. Scope: is this FLIP only targeted for non-interactive Flink SQL jobs in
> Application mode? More specifically, if we use SQL client/gateway to
> execute some interactive SQLs like a SELECT query, can we ask flink to use
> Application mode to execute those queries after this FLIP?
> 2. Deployment: I believe in YARN mode, the implementation is trivial as we
> can ship files via YARN's tool easily, but for K8s, things can be more
> complicated as Shengkai said. I have implemented a simple POC
> <https://github.com/bgeng777/flink/commit/5b4338fe52ec343326927f0fc12f015dd22b1133>
> based on SQL client before (i.e. consider the SQL client which supports
> executing a SQL file as the SQL driver in this FLIP). One problem I have
> met is how we ship SQL files (or the JobGraph) to the k8s side. Without
> such support, users have to modify the initContainer or rebuild a new K8s
> image every time to fetch the SQL file. Like the Flink k8s operator, one
> workaround is to utilize the Flink config (transforming the SQL file to an
> escaped string like Weihua mentioned), which will be converted to a
> ConfigMap, but K8s has a size limit for ConfigMaps (no larger than 1MB
> <https://kubernetes.io/docs/concepts/configuration/configmap/>). Not sure
> if we have better solutions.
> 3. Serialization of SessionState: in SessionState, there are some
> unserializable fields like
> org.apache.flink.table.resource.ResourceManager#userClassLoader. It may be
> worthwhile to add more details about the serialization part.
>
>
> Best,
> Biao Geng
>
> Paul Lam <paullin3...@gmail.com> wrote on Wed, May 31, 2023 at 11:49:
>
>
> Hi Weihua,
>
> Thanks a lot for your input! Please see my comments inline.
>
> - Is SQLRunner the better name? We use this to run a SQL Job. (Not strong,
> the SQLDriver is fine for me)
>
>
> I’ve thought about SQL Runner but picked SQL Driver for the following
> reasons FYI:
>
> 1. I have a PythonDriver doing the same job for PyFlink [1]
> 2. Flink program's main class is sort of like the Driver in JDBC, which
> translates SQLs into database-specific languages.
>
> In general, I’m +1 for SQL Driver and +0 for SQL Runner.
>
> - Could we run SQL jobs using SQL in strings? Otherwise, we need to
> prepare a SQL file in an image for Kubernetes application mode, which may
> be a bit cumbersome.
>
>
> Do you mean to pass the SQL string as a configuration or a program
> argument?
>
>
> I thought it might be convenient for testing purposes, but not recommended
> for production, because Flink SQLs could be complicated and involve lots
> of characters that need escaping.
>
> WDYT?
>
> - I noticed that we don't specify the SQLDriver jar in the
> "run-application" command. Does that mean we need to perform automatic
> detection in Flink?
>
>
> Yes! It’s like running a PyFlink job with the following command:
>
>
> ```
> ./bin/flink run \
> --pyModule table.word_count \
> --pyFiles examples/python/table
> ```
>
> The CLI determines if it’s a SQL job, and if so, applies the SQL Driver
> automatically.
>
>
> [1]
> https://github.com/apache/flink/blob/master/flink-python/src/main/java/org/apache/flink/client/python/PythonDriver.java
>
> Best,
> Paul Lam
>
> On May 30, 2023, at 21:56, Weihua Hu <huweihua....@gmail.com> wrote:
>
>
> Thanks Paul for the proposal.
>
> +1 for this. It is valuable in improving ease of use.
>
> I have a few questions.
> - Is SQLRunner the better name? We use this to run a SQL Job. (Not strong,
> the SQLDriver is fine for me)
> - Could we run SQL jobs using SQL in strings? Otherwise, we need to
> prepare a SQL file in an image for Kubernetes application mode, which may
> be a bit cumbersome.
> - I noticed that we don't specify the SQLDriver jar in the
> "run-application" command. Does that mean we need to perform automatic
> detection in Flink?
>
>
>
> Best,
> Weihua
>
>
> On Mon, May 29, 2023 at 7:24 PM Paul Lam <paullin3...@gmail.com> wrote:
>
>
> Hi team,
>
> I’d like to start a discussion about FLIP-316 [1], which introduces a SQL
> driver as the default main class for Flink SQL jobs.
>
> Currently, Flink SQL could be executed out of the box either via SQL
> Client/Gateway or embedded in a Flink Java/Python program.
>
> However, each one has its drawback:
>
> - SQL Client/Gateway doesn’t support the application deployment mode [2]
> - Flink Java/Python program requires extra work to write a non-SQL program
>
>
> Therefore, I propose adding a SQL driver to act as the default main class
> for SQL jobs. Please see the FLIP docs for details and feel free to
> comment.
>
> Thanks!
>
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-316%3A+Introduce+SQL+Driver
>
> [2] https://issues.apache.org/jira/browse/FLINK-26541
>
> Best,
> Paul Lam