Re: [DISCUSS] FLIP-311: Support Call Stored Procedure

yuxia Thu, 01 Jun 2023 02:10:55 -0700

Hi, Benchao.
Thanks for your attention.

Initially, I also wanted to pass `TableEnvironment` to the procedure. But according
to my investigation and an offline discussion with Jingsong, what really matters
for procedure devs is the ability to build a Flink datastream. With only a
`TableEnvironment`, we can't get the `StreamExecutionEnvironment`, which is the
entry point for building a datastream. That is to say, we would lose the ability
to build a datastream if we just passed `TableEnvironment`.


Of course, we could also pass `TableEnvironment` along with
`StreamExecutionEnvironment` to the procedure. But I intend to be cautious about
exposing too much too early to procedure devs. If someday we find we need
`TableEnvironment` to customize a procedure, we can then add a method like
`getTableEnvironment()` to `ProcedureContext`.
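
To make the idea concrete, here is a minimal sketch in Java of the shape described above. It is only an illustration under assumptions, not the FLIP's final API: the nested `StreamExecutionEnvironment` is a placeholder so the sketch compiles without Flink on the classpath, and `CompactProcedure`, its `call` signature, and the job name are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class ProcedureContextSketch {

    // Placeholder standing in for Flink's StreamExecutionEnvironment.
    // It is NOT the real class; it only records submitted job names.
    static final class StreamExecutionEnvironment {
        private final List<String> submittedJobs = new ArrayList<>();

        void execute(String jobName) {
            submittedJobs.add(jobName);
        }

        List<String> submittedJobs() {
            return submittedJobs;
        }
    }

    // Context handed to a procedure. Only the execution environment is
    // exposed for now; a getTableEnvironment() accessor could be added
    // later if procedure devs turn out to need it.
    interface ProcedureContext {
        StreamExecutionEnvironment getExecutionEnvironment();
    }

    // Hypothetical procedure: name and call(...) signature are assumptions.
    static final class CompactProcedure {
        String[] call(ProcedureContext context, String tableName) {
            StreamExecutionEnvironment env = context.getExecutionEnvironment();
            env.execute("compact-" + tableName);
            // The result is handed back as the return value of call(...).
            return new String[] {"submitted compact-" + tableName};
        }
    }

    // Wires a context to a fresh environment and invokes the procedure.
    static String[] runExample() {
        StreamExecutionEnvironment env = new StreamExecutionEnvironment();
        ProcedureContext context = () -> env;
        return new CompactProcedure().call(context, "orders");
    }

    public static void main(String[] args) {
        System.out.println(String.join(",", runExample()));
    }
}
```

Since the context is an interface, adding `getTableEnvironment()` later would mean adding a method to it, without changing the signature of `call`.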

Best regards,
Yuxia

----- Original Message -----
From: "Benchao Li" <libenc...@apache.org>
To: "dev" <dev@flink.apache.org>
Sent: Thursday, June 1, 2023 12:58:08 PM
Subject: Re: [DISCUSS] FLIP-311: Support Call Stored Procedure

Thanks Yuxia for opening this discussion.

The general idea looks good to me. I only have one question, about
`ProcedureContext#getExecutionEnvironment`: why are you proposing to return
a `StreamExecutionEnvironment` instead of a `TableEnvironment`? Could you
elaborate a little more on this?

On Tue, May 30, 2023 at 5:58 PM Jingsong Li <jingsongl...@gmail.com> wrote:

> Thanks for your explanation.
>
> We can support Iterable in the future. The current design looks good to me.
>
> Best,
> Jingsong
>
> On Tue, May 30, 2023 at 4:56 PM yuxia <luoyu...@alumni.sjtu.edu.cn> wrote:
> >
> > Hi, Jingsong.
> > Thanks for your feedback.
> >
> > > Does this need to be a function call? Do you have some example?
> > I think it'll be useful to support function calls when users call a
> procedure.
> > The following example is from Iceberg:[1]
> > CALL catalog_name.system.migrate('spark_catalog.db.sample', map('foo',
> 'bar'));
> >
> > It allows users to use `map('foo', 'bar')` to pass map data to the
> procedure.
> >
> > Another case I can imagine is rolling back a table to the snapshot
> of one week ago.
> > Then, with function calls, a user could call `rollback(table_name, now() -
> INTERVAL '7' DAY)` to achieve this.
> >
> > Although the argument can be a function call, the parameter the
> procedure eventually receives will always be the evaluated literal.
> >
> >
> > > Procedure looks like a TableFunction, do you consider using Collector
> > something like TableFunction? (Supports large amount of data)
> >
> > Yes, I had considered it. But returning T[] is for simplicity.
> >
> > First, regarding how to return the calling result of a procedure, it
> looks more intuitive to me to use the return result of the `call` method
> instead of by calling something like collector#collect.
> > Introducing a collector would add unnecessary complexity.
> >
> > Second, regarding supporting large amounts of data, according to my
> investigation, I haven't seen a requirement for returning large
> amounts of data.
> > Iceberg also returns an array.[2] If you do think we should support large
> amounts of data, I think we can change the return type from T[] to Iterable<T>.
> >
> > [1]: https://iceberg.apache.org/docs/latest/spark-procedures/#migrate
> > [2]:
> https://github.com/apache/iceberg/blob/601c5af9b6abded79dabeba177331310d5487f43/spark/v3.2/spark/src/main/java/org/apache/spark/sql/connector/iceberg/catalog/Procedure.java#L44
> >
> > Best regards,
> > Yuxia
> >
> > ----- Original Message -----
> > From: "Jingsong Li" <jingsongl...@gmail.com>
> > To: "dev" <dev@flink.apache.org>
> > Sent: Monday, May 29, 2023 2:42:04 PM
> > Subject: Re: [DISCUSS] FLIP-311: Support Call Stored Procedure
> >
> > Thanks Yuxia for the proposal.
> >
> > > CALL [catalog_name.][database_name.]procedure_name ([ expression [,
> expression]* ] )
> >
> > The expression can be a function call. Does this need to be a function
> > call? Do you have some example?
> >
> > > Procedure returns T[]
> >
> > Procedure looks like a TableFunction. Did you consider using a Collector,
> > something like TableFunction does? (It supports large amounts of data.)
> >
> > Best,
> > Jingsong
> >
> > On Mon, May 29, 2023 at 2:33 PM yuxia <luoyu...@alumni.sjtu.edu.cn>
> wrote:
> > >
> > > Hi, everyone.
> > >
> > > I’d like to start a discussion about FLIP-311: Support Call Stored
> Procedure [1]
> > >
> > > > Stored procedures provide a convenient way to encapsulate complex
> logic to perform data manipulation or administrative tasks in external
> storage systems. They are widely used in traditional databases and popular
> compute engines like Trino for their convenience. Therefore, we propose
> adding support for calling stored procedures in Flink to enable better
> integration with external storage systems.
> > >
> > > > With this FLIP, Flink will allow connector developers to develop their
> own built-in stored procedures, and then enable users to call these
> predefined stored procedures.
> > >
> > > > Looking forward to your feedback.
> > >
> > > [1]:
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-311%3A+Support+Call+Stored+Procedure
> > >
> > > Best regards,
> > > Yuxia
>


-- 

Best,
Benchao Li
