Re: [DISCUSS] FLIP-316: Introduce SQL Driver

Mason Chen Wed, 07 Jun 2023 21:16:22 -0700

Hi Paul,

Thanks for your response!


I agree that utilizing SQL Drivers in Java applications is equally important
> as employing them in SQL Gateway. WRT init containers, I think most
> users use them just as a workaround. For example, wget a jar from the
> maven repo.
>
> We could implement the functionality in SQL Driver in a more graceful
> way and the flink-supported filesystem approach seems to be a
> good choice.
>

My main point is: can we solve the problem with a design agnostic of SQL
and Stream API? I mentioned a use case where this ability is useful for
Java or Stream API applications. Maybe this is even a non-goal to your FLIP
since you are focusing on the driver entrypoint.

Jark mentioned some optimizations:

> This allows SQLGateway to leverage some metadata caching and UDF JAR
> caching for better compiling performance.
>
It would be great to see this even outside the SQLGateway (i.e. UDF JAR
caching).

Best,
Mason

On Wed, Jun 7, 2023 at 2:26 AM Shengkai Fang <[email protected]> wrote:

> Hi. Paul.  Thanks for your update and the update makes me understand the
> design much better.
>
> But I still have some questions about the FLIP.
>
> > For SQL Gateway, only DMLs need to be delegated to the SQL server
> > Driver. I would think about the details and update the FLIP. Do you have
> some
> > ideas already?
>
> If the applicaiton mode can not support library mode, I think we should
> only execute INSERT INTO and UPDATE/ DELETE statement in the application
> mode. AFAIK, we can not support ANALYZE TABLE and CALL PROCEDURE
> statements. The ANALYZE TABLE syntax need to register the statistic to the
> catalog after job finishes and the CALL PROCEDURE statement doesn't
> generate the ExecNodeGraph.
>
> * Introduce storage via option `sql-gateway.application.storage-dir`
>
> If we can not support to submit the jars through web submission, +1 to
> introduce the options to upload the files. While I think the uploader
> should be responsible to remove the uploaded jars. Can we remove the jars
> if the job is running or gateway exits?
>
> * JobID is not avaliable
>
> Can we use the returned rest client by ApplicationDeployer to query the job
> id? I am concerned that users don't know which job is related to the
> submitted SQL.
>
> * Do we need to introduce a new module named flink-table-sql-runner?
>
> It seems we need to introduce a new module. Will the new module is
> available in the distribution package? I agree with Jark that we don't need
> to introduce this for table-API users and these users have their main
> class. If we want to make users write the k8s operator more easily, I think
> we should modify the k8s operator repo. If we don't need to support SQL
> files, can we make this jar only visible in the sql-gateway like we do in
> the planner loader?[1]
>
> [1]
>
> https://github.com/apache/flink/blob/master/flink-table/flink-table-planner-loader/src/main/java/org/apache/flink/table/planner/loader/PlannerModule.java#L95
>
> Best,
> Shengkai
>
>
>
>
>
>
>
>
> Weihua Hu <[email protected]> 于2023年6月7日周三 10:52写道：
>
> > Hi,
> >
> > Thanks for updating the FLIP.
> >
> > I have two cents on the distribution of SQLs and resources.
> > 1. Should we support a common file distribution mechanism for k8s
> > application mode?
> >   I have seen some issues and requirements on the mailing list.
> >   In our production environment, we implement the download command in the
> > CliFrontend.
> >   And automatically add an init container to the POD for file
> downloading.
> > The advantage of this
> >   is that we can use all Flink-supported file systems to store files.
> >
> >   This need more discussion. I would appreciate hearing more opinions.
> >
> > 2. In this FLIP, we distribute files in two different ways in YARN and
> > Kubernetes. Can we combine it in one way?
> >   If we don't want to implement a common file distribution for k8s
> > application mode. Could we use the SQLDriver
> >   to download the files both in YARN and K8S? IMO, this can reduce the
> cost
> > of code maintenance.
> >
> > Best,
> > Weihua
> >
> >
> > On Wed, Jun 7, 2023 at 10:18 AM Paul Lam <[email protected]> wrote:
> >
> > > Hi Mason,
> > >
> > > Thanks for your input!
> > >
> > > > +1 for init containers or a more generalized way of obtaining
> arbitrary
> > > > files. File fetching isn't specific to just SQL--it also matters for
> > Java
> > > > applications if the user doesn't want to rebuild a Flink image and
> just
> > > > wants to modify the user application fat jar.
> > >
> > > I agree that utilizing SQL Drivers in Java applications is equally
> > > important
> > > as employing them in SQL Gateway. WRT init containers, I think most
> > > users use them just as a workaround. For example, wget a jar from the
> > > maven repo.
> > >
> > > We could implement the functionality in SQL Driver in a more graceful
> > > way and the flink-supported filesystem approach seems to be a
> > > good choice.
> > >
> > > > Also, what do you think about prefixing the config options with
> > > > `sql-driver` instead of just `sql` to be more specific?
> > >
> > > LGTM, since SQL Driver is a public interface and the options are
> > > specific to it.
> > >
> > > Best,
> > > Paul Lam
> > >
> > > > 2023年6月6日 06:30，Mason Chen <[email protected]> 写道：
> > > >
> > > > Hi Paul,
> > > >
> > > > +1 for this feature and supporting SQL file + JSON plans. We get a
> lot
> > of
> > > > requests to just be able to submit a SQL file, but the JSON plan
> > > > optimizations make sense.
> > > >
> > > > +1 for init containers or a more generalized way of obtaining
> arbitrary
> > > > files. File fetching isn't specific to just SQL--it also matters for
> > Java
> > > > applications if the user doesn't want to rebuild a Flink image and
> just
> > > > wants to modify the user application fat jar.
> > > >
> > > > Please note that we could reuse the checkpoint storage like S3/HDFS,
> > > which
> > > >> should
> > > >
> > > > be required to run Flink in production, so I guess that would be
> > > acceptable
> > > >> for most
> > > >
> > > > users. WDYT?
> > > >
> > > >
> > > > If you do go this route, it would be nice to support writing these
> > files
> > > to
> > > > S3/HDFS via Flink. This makes access control and policy management
> > > simpler.
> > > >
> > > > Also, what do you think about prefixing the config options with
> > > > `sql-driver` instead of just `sql` to be more specific?
> > > >
> > > > Best,
> > > > Mason
> > > >
> > > > On Mon, Jun 5, 2023 at 2:28 AM Paul Lam <[email protected]
> > <mailto:
> > > [email protected]>> wrote:
> > > >
> > > >> Hi Jark,
> > > >>
> > > >> Thanks for your input! Please see my comments inline.
> > > >>
> > > >>> Isn't Table API the same way as DataSream jobs to submit Flink SQL?
> > > >>> DataStream API also doesn't provide a default main class for users,
> > > >>> why do we need to provide such one for SQL?
> > > >>
> > > >> Sorry for the confusion I caused. By DataStream jobs, I mean jobs
> > > submitted
> > > >> via Flink CLI which actually could be DataStream/Table jobs.
> > > >>
> > > >> I think a default main class would be user-friendly which eliminates
> > the
> > > >> need
> > > >> for users to write a main class as SQLRunner in Flink K8s operator
> > [1].
> > > >>
> > > >>> I thought the proposed SqlDriver was a dedicated main class
> accepting
> > > >> SQL files, is
> > > >>> that correct?
> > > >>
> > > >> Both JSON plans and SQL files are accepted. SQL Gateway should use
> > JSON
> > > >> plans,
> > > >> while CLI users may use either JSON plans or SQL files.
> > > >>
> > > >> Please see the updated FLIP[2] for more details.
> > > >>
> > > >>> Personally, I prefer the way of init containers which doesn't
> depend
> > on
> > > >>> additional components.
> > > >>> This can reduce the moving parts of a production environment.
> > > >>> Depending on a distributed file system makes the testing, demo, and
> > > local
> > > >>> setup harder than init containers.
> > > >>
> > > >> Please note that we could reuse the checkpoint storage like S3/HDFS,
> > > which
> > > >> should
> > > >> be required to run Flink in production, so I guess that would be
> > > >> acceptable for most
> > > >> users. WDYT?
> > > >>
> > > >> WRT testing, demo, and local setups, I think we could support the
> > local
> > > >> filesystem
> > > >> scheme i.e. file://** as the state backends do. It works as long as
> > SQL
> > > >> Gateway
> > > >> and JobManager(or SQL Driver) can access the resource directory
> > > (specified
> > > >> via
> > > >> `sql-gateway.application.storage-dir`).
> > > >>
> > > >> Thanks!
> > > >>
> > > >> [1]
> > > >>
> > >
> >
> https://github.com/apache/flink-kubernetes-operator/blob/main/examples/flink-sql-runner-example/src/main/java/org/apache/flink/examples/SqlRunner.java
> > > >> [2]
> > > >>
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-316:+Introduce+SQL+Driver
> > > >> [3]
> > > >>
> > >
> >
> https://github.com/apache/flink/blob/3245e0443b2a4663552a5b707c5c8c46876c1f6d/flink-runtime/src/test/java/org/apache/flink/runtime/state/filesystem/AbstractFileCheckpointStorageAccessTestBase.java#L161
> > > >>
> > > >> Best,
> > > >> Paul Lam
> > > >>
> > > >>> 2023年6月3日 12:21，Jark Wu <[email protected]> 写道：
> > > >>>
> > > >>> Hi Paul,
> > > >>>
> > > >>> Thanks for your reply. I left my comments inline.
> > > >>>
> > > >>>> As the FLIP said, it’s good to have a default main class for Flink
> > > SQLs,
> > > >>>> which allows users to submit Flink SQLs in the same way as
> > DataStream
> > > >>>> jobs, or else users need to write their own main class.
> > > >>>
> > > >>> Isn't Table API the same way as DataSream jobs to submit Flink SQL?
> > > >>> DataStream API also doesn't provide a default main class for users,
> > > >>> why do we need to provide such one for SQL?
> > > >>>
> > > >>>> With the help of ExecNodeGraph, do we still need the serialized
> > > >>>> SessionState? If not, we could make SQL Driver accepts two
> > serialized
> > > >>>> formats:
> > > >>>
> > > >>> No, ExecNodeGraph doesn't need to serialize SessionState. I thought
> > the
> > > >>> proposed SqlDriver was a dedicated main class accepting SQL files,
> is
> > > >>> that correct?
> > > >>> If true, we have to ship the SessionState for this case which is a
> > > large
> > > >>> work.
> > > >>> I think we just need a JsonPlanDriver which is a main class that
> > > accepts
> > > >>> JsonPlan as the parameter.
> > > >>>
> > > >>>
> > > >>>> The common solutions I know is to use distributed file systems or
> > use
> > > >>>> init containers to localize the resources.
> > > >>>
> > > >>> Personally, I prefer the way of init containers which doesn't
> depend
> > on
> > > >>> additional components.
> > > >>> This can reduce the moving parts of a production environment.
> > > >>> Depending on a distributed file system makes the testing, demo, and
> > > local
> > > >>> setup harder than init containers.
> > > >>>
> > > >>> Best,
> > > >>> Jark
> > > >>>
> > > >>>
> > > >>>
> > > >>>
> > > >>> On Fri, 2 Jun 2023 at 18:10, Paul Lam <[email protected]
> > <mailto:
> > > [email protected]> <mailto:
> > > >> [email protected] <mailto:[email protected]>>> wrote:
> > > >>>
> > > >>>> The FLIP is in the early phase and some details are not included,
> > but
> > > >>>> fortunately, we got lots of valuable ideas from the discussion.
> > > >>>>
> > > >>>> Thanks to everyone who joined the dissuasion!
> > > >>>> @Weihua @Shanmon @Shengkai @Biao @Jark
> > > >>>>
> > > >>>> This weekend I’m gonna revisit and update the FLIP, adding more
> > > >>>> details. Hopefully, we can further align our opinions.
> > > >>>>
> > > >>>> Best,
> > > >>>> Paul Lam
> > > >>>>
> > > >>>>> 2023年6月2日 18:02，Paul Lam <[email protected] <mailto:
> > > [email protected]>> 写道：
> > > >>>>>
> > > >>>>> Hi Jark,
> > > >>>>>
> > > >>>>> Thanks a lot for your input!
> > > >>>>>
> > > >>>>>> If we decide to submit ExecNodeGraph instead of SQL file, is it
> > > still
> > > >>>>>> necessary to support SQL Driver?
> > > >>>>>
> > > >>>>> I think so. Apart from usage in SQL Gateway, SQL Driver could
> > > simplify
> > > >>>>> Flink SQL execution with Flink CLI.
> > > >>>>>
> > > >>>>> As the FLIP said, it’s good to have a default main class for
> Flink
> > > >> SQLs,
> > > >>>>> which allows users to submit Flink SQLs in the same way as
> > DataStream
> > > >>>>> jobs, or else users need to write their own main class.
> > > >>>>>
> > > >>>>>> SQL Driver needs to serialize SessionState which is very
> > challenging
> > > >>>>>> but not detailed covered in the FLIP.
> > > >>>>>
> > > >>>>> With the help of ExecNodeGraph, do we still need the serialized
> > > >>>>> SessionState? If not, we could make SQL Driver accepts two
> > serialized
> > > >>>>> formats:
> > > >>>>>
> > > >>>>> - SQL files for user-facing public usage
> > > >>>>> - ExecNodeGraph for internal usage
> > > >>>>>
> > > >>>>> It’s kind of similar to the relationship between job jars and
> > > >> jobgraphs.
> > > >>>>>
> > > >>>>>> Regarding "K8S doesn't support shipping multiple jars", is that
> > > true?
> > > >>>> Is it
> > > >>>>>> possible to support it?
> > > >>>>>
> > > >>>>> Yes, K8s doesn’t distribute any files. It’s the users’
> > responsibility
> > > >> to
> > > >>>> make
> > > >>>>> sure the resources are accessible in the containers. The common
> > > >> solutions
> > > >>>>> I know is to use distributed file systems or use init containers
> to
> > > >>>> localize the
> > > >>>>> resources.
> > > >>>>>
> > > >>>>> Now I lean toward introducing a fs to do the distribution job.
> > WDYT?
> > > >>>>>
> > > >>>>> Best,
> > > >>>>> Paul Lam
> > > >>>>>
> > > >>>>>> 2023年6月1日 20:33，Jark Wu <[email protected] <mailto:
> > [email protected]>
> > > <mailto:[email protected] <mailto:[email protected]>>
> > > >> <mailto:[email protected] <mailto:[email protected]> <mailto:
> > > [email protected] <mailto:[email protected]>>>>
> > > >>>> 写道：
> > > >>>>>>
> > > >>>>>> Hi Paul,
> > > >>>>>>
> > > >>>>>> Thanks for starting this discussion. I like the proposal! This
> is
> > a
> > > >>>>>> frequently requested feature!
> > > >>>>>>
> > > >>>>>> I agree with Shengkai that ExecNodeGraph as the submission
> object
> > > is a
> > > >>>>>> better idea than SQL file. To be more specific, it should be
> > > >>>> JsonPlanGraph
> > > >>>>>> or CompiledPlan which is the serializable representation.
> > > CompiledPlan
> > > >>>> is a
> > > >>>>>> clear separation between compiling/optimization/validation and
> > > >>>> execution.
> > > >>>>>> This can keep the validation and metadata accessing still on the
> > > >>>> SQLGateway
> > > >>>>>> side. This allows SQLGateway to leverage some metadata caching
> and
> > > UDF
> > > >>>> JAR
> > > >>>>>> caching for better compiling performance.
> > > >>>>>>
> > > >>>>>> If we decide to submit ExecNodeGraph instead of SQL file, is it
> > > still
> > > >>>>>> necessary to support SQL Driver? Regarding non-interactive SQL
> > jobs,
> > > >>>> users
> > > >>>>>> can use the Table API program for application mode. SQL Driver
> > needs
> > > >> to
> > > >>>>>> serialize SessionState which is very challenging but not
> detailed
> > > >>>> covered
> > > >>>>>> in the FLIP.
> > > >>>>>>
> > > >>>>>> Regarding "K8S doesn't support shipping multiple jars", is that
> > > true?
> > > >>>> Is it
> > > >>>>>> possible to support it?
> > > >>>>>>
> > > >>>>>> Best,
> > > >>>>>> Jark
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>
> > > >>>>>> On Thu, 1 Jun 2023 at 16:58, Paul Lam <[email protected]
> > > <mailto:[email protected]> <mailto:
> > > >> [email protected] <mailto:[email protected]>> <mailto:
> > > >>>> [email protected] <mailto:[email protected]> <mailto:
> > > [email protected] <mailto:[email protected]>>>> wrote:
> > > >>>>>>
> > > >>>>>>> Hi Weihua,
> > > >>>>>>>
> > > >>>>>>> You’re right. Distributing the SQLs to the TMs is one of the
> > > >>>> challenging
> > > >>>>>>> parts of this FLIP.
> > > >>>>>>>
> > > >>>>>>> Web submission is not enabled in application mode currently as
> > you
> > > >>>> said,
> > > >>>>>>> but it could be changed if we have good reasons.
> > > >>>>>>>
> > > >>>>>>> What do you think about introducing a distributed storage for
> SQL
> > > >>>> Gateway?
> > > >>>>>>>
> > > >>>>>>> We could make use of Flink file systems [1] to distribute the
> SQL
> > > >>>> Gateway
> > > >>>>>>> generated resources, that should solve the problem at its root
> > > cause.
> > > >>>>>>>
> > > >>>>>>> Users could specify Flink-supported file systems to ship files.
> > > It’s
> > > >>>> only
> > > >>>>>>> required when using SQL Gateway with K8s application mode.
> > > >>>>>>>
> > > >>>>>>> [1]
> > > >>>>>>>
> > > >>>>
> > > >>
> > >
> >
> https://nightlies.apache.org/flink/flink-docs-stable/docs/deployment/filesystems/overview/
> > > <
> > >
> >
> https://nightlies.apache.org/flink/flink-docs-stable/docs/deployment/filesystems/overview/
> > > >
> > > >> <
> > > >>
> > >
> >
> https://nightlies.apache.org/flink/flink-docs-stable/docs/deployment/filesystems/overview/
> > > <
> > >
> >
> https://nightlies.apache.org/flink/flink-docs-stable/docs/deployment/filesystems/overview/
> > > >
> > > >>>
> > > >>>> <
> > > >>>>
> > > >>
> > >
> >
> https://nightlies.apache.org/flink/flink-docs-stable/docs/deployment/filesystems/overview/
> > > <
> > >
> >
> https://nightlies.apache.org/flink/flink-docs-stable/docs/deployment/filesystems/overview/
> > > >
> > > >> <
> > > >>
> > >
> >
> https://nightlies.apache.org/flink/flink-docs-stable/docs/deployment/filesystems/overview/
> > > <
> > >
> >
> https://nightlies.apache.org/flink/flink-docs-stable/docs/deployment/filesystems/overview/
> > > >
> > > >>>
> > > >>>>>
> > > >>>>>>>
> > > >>>>>>> Best,
> > > >>>>>>> Paul Lam
> > > >>>>>>>
> > > >>>>>>>> 2023年6月1日 13:55，Weihua Hu <[email protected] <mailto:
> > > [email protected]> <mailto:
> > > >> [email protected] <mailto:[email protected]>>> 写道：
> > > >>>>>>>>
> > > >>>>>>>> Thanks Paul for your reply.
> > > >>>>>>>>
> > > >>>>>>>> SQLDriver looks good to me.
> > > >>>>>>>>
> > > >>>>>>>> 2. Do you mean a pass the SQL string a configuration or a
> > program
> > > >>>>>>> argument?
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>> I brought this up because we were unable to pass the SQL file
> to
> > > >> Flink
> > > >>>>>>>> using Kubernetes mode.
> > > >>>>>>>> For DataStream/Python users, they need to prepare their images
> > for
> > > >> the
> > > >>>>>>> jars
> > > >>>>>>>> and dependencies.
> > > >>>>>>>> But for SQL users, they can use a common image to run
> different
> > > SQL
> > > >>>>>>> queries
> > > >>>>>>>> if there are no other udf requirements.
> > > >>>>>>>> It would be great if the SQL query and image were not bound.
> > > >>>>>>>>
> > > >>>>>>>> Using strings is a way to decouple these, but just as you
> > > mentioned,
> > > >>>> it's
> > > >>>>>>>> not easy to pass complex SQL.
> > > >>>>>>>>
> > > >>>>>>>>> use web submission
> > > >>>>>>>> AFAIK, we can not use web submission in the Application mode.
> > > Please
> > > >>>>>>>> correct me if I'm wrong.
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>> Best,
> > > >>>>>>>> Weihua
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>> On Wed, May 31, 2023 at 9:37 PM Paul Lam <
> [email protected]
> > > <mailto:[email protected]>
> > > >> <mailto:[email protected] <mailto:[email protected]>>>
> > > >>>> wrote:
> > > >>>>>>>>
> > > >>>>>>>>> Hi Biao,
> > > >>>>>>>>>
> > > >>>>>>>>> Thanks for your comments!
> > > >>>>>>>>>
> > > >>>>>>>>>> 1. Scope: is this FLIP only targeted for non-interactive
> Flink
> > > SQL
> > > >>>> jobs
> > > >>>>>>>>> in
> > > >>>>>>>>>> Application mode? More specifically, if we use SQL
> > > client/gateway
> > > >> to
> > > >>>>>>>>>> execute some interactive SQLs like a SELECT query, can we
> ask
> > > >> flink
> > > >>>> to
> > > >>>>>>>>> use
> > > >>>>>>>>>> Application mode to execute those queries after this FLIP?
> > > >>>>>>>>>
> > > >>>>>>>>> Thanks for pointing it out. I think only DMLs would be
> executed
> > > via
> > > >>>> SQL
> > > >>>>>>>>> Driver.
> > > >>>>>>>>> I'll add the scope to the FLIP.
> > > >>>>>>>>>
> > > >>>>>>>>>> 2. Deployment: I believe in YARN mode, the implementation is
> > > >>>> trivial as
> > > >>>>>>>>> we
> > > >>>>>>>>>> can ship files via YARN's tool easily but for K8s, things
> can
> > be
> > > >>>> more
> > > >>>>>>>>>> complicated as Shengkai said.
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>>> Your input is very informative. I’m thinking about using web
> > > >>>> submission,
> > > >>>>>>>>> but it requires exposing the JobManager port which could also
> > be
> > > a
> > > >>>>>>> problem
> > > >>>>>>>>> on K8s.
> > > >>>>>>>>>
> > > >>>>>>>>> Another approach is to explicitly require a distributed
> storage
> > > to
> > > >>>> ship
> > > >>>>>>>>> files,
> > > >>>>>>>>> but we may need a new deployment executor for that.
> > > >>>>>>>>>
> > > >>>>>>>>> What do you think of these two approaches?
> > > >>>>>>>>>
> > > >>>>>>>>>> 3. Serialization of SessionState: in SessionState, there are
> > > some
> > > >>>>>>>>>> unserializable fields
> > > >>>>>>>>>> like
> > > >>>> org.apache.flink.table.resource.ResourceManager#userClassLoader.
> > > >>>>>>> It
> > > >>>>>>>>>> may be worthwhile to add more details about the
> serialization
> > > >> part.
> > > >>>>>>>>>
> > > >>>>>>>>> I agree. That’s a missing part. But if we use ExecNodeGraph
> as
> > > >>>> Shengkai
> > > >>>>>>>>> mentioned, do we eliminate the need for serialization of
> > > >>>> SessionState?
> > > >>>>>>>>>
> > > >>>>>>>>> Best,
> > > >>>>>>>>> Paul Lam
> > > >>>>>>>>>
> > > >>>>>>>>>> 2023年5月31日 13:07，Biao Geng <[email protected] <mailto:
> > > [email protected]> <mailto:
> > > >> [email protected] <mailto:[email protected]>>> 写道：
> > > >>>>>>>>>>
> > > >>>>>>>>>> Thanks Paul for the proposal!I believe it would be very
> useful
> > > for
> > > >>>>>>> flink
> > > >>>>>>>>>> users.
> > > >>>>>>>>>> After reading the FLIP, I have some questions:
> > > >>>>>>>>>> 1. Scope: is this FLIP only targeted for non-interactive
> Flink
> > > SQL
> > > >>>> jobs
> > > >>>>>>>>> in
> > > >>>>>>>>>> Application mode? More specifically, if we use SQL
> > > client/gateway
> > > >> to
> > > >>>>>>>>>> execute some interactive SQLs like a SELECT query, can we
> ask
> > > >> flink
> > > >>>> to
> > > >>>>>>>>> use
> > > >>>>>>>>>> Application mode to execute those queries after this FLIP?
> > > >>>>>>>>>> 2. Deployment: I believe in YARN mode, the implementation is
> > > >>>> trivial as
> > > >>>>>>>>> we
> > > >>>>>>>>>> can ship files via YARN's tool easily but for K8s, things
> can
> > be
> > > >>>> more
> > > >>>>>>>>>> complicated as Shengkai said. I have implemented a simple
> POC
> > > >>>>>>>>>> <
> > > >>>>>>>>>
> > > >>>>>>>
> > > >>>>
> > > >>
> > >
> >
> https://github.com/bgeng777/flink/commit/5b4338fe52ec343326927f0fc12f015dd22b1133
> > > <
> > >
> >
> https://github.com/bgeng777/flink/commit/5b4338fe52ec343326927f0fc12f015dd22b1133
> > > >
> > > >> <
> > > >>
> > >
> >
> https://github.com/bgeng777/flink/commit/5b4338fe52ec343326927f0fc12f015dd22b1133
> > > <
> > >
> >
> https://github.com/bgeng777/flink/commit/5b4338fe52ec343326927f0fc12f015dd22b1133
> > > >
> > > >>>
> > > >>>>>>>>>>
> > > >>>>>>>>>> based on SQL client before(i.e. consider the SQL client
> which
> > > >>>> supports
> > > >>>>>>>>>> executing a SQL file as the SQL driver in this FLIP). One
> > > problem
> > > >> I
> > > >>>>>>> have
> > > >>>>>>>>>> met is how do we ship SQL files ( or Job Graph) to the k8s
> > side.
> > > >>>>>>> Without
> > > >>>>>>>>>> such support, users have to modify the initContainer or
> > rebuild
> > > a
> > > >>>> new
> > > >>>>>>> K8s
> > > >>>>>>>>>> image every time to fetch the SQL file. Like the flink k8s
> > > >> operator,
> > > >>>>>>> one
> > > >>>>>>>>>> workaround is to utilize the flink config(transforming the
> SQL
> > > >> file
> > > >>>> to
> > > >>>>>>> a
> > > >>>>>>>>>> escaped string like Weihua mentioned) which will be
> converted
> > > to a
> > > >>>>>>>>>> ConfigMap but K8s has size limit of ConfigMaps(no larger
> than
> > > 1MB
> > > >>>>>>>>>> <
> https://kubernetes.io/docs/concepts/configuration/configmap/
> > <
> > > https://kubernetes.io/docs/concepts/configuration/configmap/> <
> > > >> https://kubernetes.io/docs/concepts/configuration/configmap/ <
> > > https://kubernetes.io/docs/concepts/configuration/configmap/>>>).
> > > >>>> Not
> > > >>>>>>>>> sure
> > > >>>>>>>>>> if we have better solutions.
> > > >>>>>>>>>> 3. Serialization of SessionState: in SessionState, there are
> > > some
> > > >>>>>>>>>> unserializable fields
> > > >>>>>>>>>> like
> > > >>>> org.apache.flink.table.resource.ResourceManager#userClassLoader.
> > > >>>>>>> It
> > > >>>>>>>>>> may be worthwhile to add more details about the
> serialization
> > > >> part.
> > > >>>>>>>>>>
> > > >>>>>>>>>> Best,
> > > >>>>>>>>>> Biao Geng
> > > >>>>>>>>>>
> > > >>>>>>>>>> Paul Lam <[email protected] <mailto:
> [email protected]
> > >
> > > <mailto:[email protected] <mailto:[email protected]>>>
> > > >> 于2023年5月31日周三 11:49写道：
> > > >>>>>>>>>>
> > > >>>>>>>>>>> Hi Weihua,
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> Thanks a lot for your input! Please see my comments inline.
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>> - Is SQLRunner the better name? We use this to run a SQL
> > Job.
> > > >> (Not
> > > >>>>>>>>>>> strong,
> > > >>>>>>>>>>>> the SQLDriver is fine for me)
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> I’ve thought about SQL Runner but picked SQL Driver for the
> > > >>>> following
> > > >>>>>>>>>>> reasons FYI:
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> 1. I have a PythonDriver doing the same job for PyFlink [1]
> > > >>>>>>>>>>> 2. Flink program's main class is sort of like Driver in
> JDBC
> > > >> which
> > > >>>>>>>>>>> translates SQLs into
> > > >>>>>>>>>>> databases specific languages.
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> In general, I’m +1 for SQL Driver and +0 for SQL Runner.
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>> - Could we run SQL jobs using SQL in strings? Otherwise,
> we
> > > need
> > > >>>> to
> > > >>>>>>>>>>> prepare
> > > >>>>>>>>>>>> a SQL file in an image for Kubernetes application mode,
> > which
> > > >> may
> > > >>>> be
> > > >>>>>>> a
> > > >>>>>>>>>>> bit
> > > >>>>>>>>>>>> cumbersome.
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> Do you mean a pass the SQL string a configuration or a
> > program
> > > >>>>>>> argument?
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> I thought it might be convenient for testing propose, but
> not
> > > >>>>>>>>> recommended
> > > >>>>>>>>>>> for production,
> > > >>>>>>>>>>> cause Flink SQLs could be complicated and involves lots of
> > > >>>> characters
> > > >>>>>>>>> that
> > > >>>>>>>>>>> need to escape.
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> WDYT?
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>> - I noticed that we don't specify the SQLDriver jar in the
> > > >>>>>>>>>>> "run-application"
> > > >>>>>>>>>>>> command. Does that mean we need to perform automatic
> > detection
> > > >> in
> > > >>>>>>>>> Flink?
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> Yes! It’s like running a PyFlink job with the following
> > > command:
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> ```
> > > >>>>>>>>>>> ./bin/flink run \
> > > >>>>>>>>>>>  --pyModule table.word_count \
> > > >>>>>>>>>>>  --pyFiles examples/python/table
> > > >>>>>>>>>>> ```
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> The CLI determines if it’s a SQL job, if yes apply the SQL
> > > Driver
> > > >>>>>>>>>>> automatically.
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> [1]
> > > >>>>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>
> > > >>>>
> > > >>
> > >
> >
> https://github.com/apache/flink/blob/master/flink-python/src/main/java/org/apache/flink/client/python/PythonDriver.java
> > > <
> > >
> >
> https://github.com/apache/flink/blob/master/flink-python/src/main/java/org/apache/flink/client/python/PythonDriver.java
> > > >
> > > >> <
> > > >>
> > >
> >
> https://github.com/apache/flink/blob/master/flink-python/src/main/java/org/apache/flink/client/python/PythonDriver.java
> > > <
> > >
> >
> https://github.com/apache/flink/blob/master/flink-python/src/main/java/org/apache/flink/client/python/PythonDriver.java
> > > >
> > > >>>
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> Best,
> > > >>>>>>>>>>> Paul Lam
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>> 2023年5月30日 21:56，Weihua Hu <[email protected]
> <mailto:
> > > [email protected]> <mailto:
> > > >> [email protected]>> 写道：
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> Thanks Paul for the proposal.
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> +1 for this. It is valuable in improving ease of use.
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> I have a few questions.
> > > >>>>>>>>>>>> - Is SQLRunner the better name? We use this to run a SQL
> > Job.
> > > >> (Not
> > > >>>>>>>>>>> strong,
> > > >>>>>>>>>>>> the SQLDriver is fine for me)
> > > >>>>>>>>>>>> - Could we run SQL jobs using SQL in strings? Otherwise,
> we
> > > need
> > > >>>> to
> > > >>>>>>>>>>> prepare
> > > >>>>>>>>>>>> a SQL file in an image for Kubernetes application mode,
> > which
> > > >> may
> > > >>>> be
> > > >>>>>>> a
> > > >>>>>>>>>>> bit
> > > >>>>>>>>>>>> cumbersome.
> > > >>>>>>>>>>>> - I noticed that we don't specify the SQLDriver jar in the
> > > >>>>>>>>>>> "run-application"
> > > >>>>>>>>>>>> command. Does that mean we need to perform automatic
> > detection
> > > >> in
> > > >>>>>>>>> Flink?
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> Best,
> > > >>>>>>>>>>>> Weihua
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> On Mon, May 29, 2023 at 7:24 PM Paul Lam <
> > > [email protected]
> > > >> <mailto:[email protected] <mailto:[email protected]>>>
> > > >>>>>>>>> wrote:
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>>> Hi team,
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> I’d like to start a discussion about FLIP-316 [1], which
> > > >>>> introduces
> > > >>>>>>> a
> > > >>>>>>>>>>> SQL
> > > >>>>>>>>>>>>> driver as the
> > > >>>>>>>>>>>>> default main class for Flink SQL jobs.
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> Currently, Flink SQL could be executed out of the box
> > either
> > > >> via
> > > >>>> SQL
> > > >>>>>>>>>>>>> Client/Gateway
> > > >>>>>>>>>>>>> or embedded in a Flink Java/Python program.
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> However, each one has its drawback:
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> - SQL Client/Gateway doesn’t support the application
> > > deployment
> > > >>>> mode
> > > >>>>>>>>> [2]
> > > >>>>>>>>>>>>> - Flink Java/Python program requires extra work to write
> a
> > > >>>> non-SQL
> > > >>>>>>>>>>> program
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> Therefore, I propose adding a SQL driver to act as the
> > > default
> > > >>>> main
> > > >>>>>>>>>>> class
> > > >>>>>>>>>>>>> for SQL jobs.
> > > >>>>>>>>>>>>> Please see the FLIP docs for details and feel free to
> > > comment.
> > > >>>>>>> Thanks!
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> [1]
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>
> > > >>>>
> > > >>
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-316%3A+Introduce+SQL+Driver
> > > >>>>>>>>>>>>> <
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>
> > > >>>>
> > > >>
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-316:+Introduce+SQL+Driver
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>> [2] https://issues.apache.org/jira/browse/FLINK-26541 <
> > > >>>>>>>>>>>>> https://issues.apache.org/jira/browse/FLINK-26541>
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> Best,
> > > >>>>>>>>>>>>> Paul Lam
> > >
> > >
> >
>

Re: [DISCUSS] FLIP-316: Introduce SQL Driver

Reply via email to