Hi, everyone

First of all, thanks for the valuable suggestions on this FLIP.
After some discussion, it looks like all concerns have been addressed for now,
so I will start a vote on this FLIP in two or three days. In the meantime,
further feedback is still very welcome.

Best,

Ron


> -----Original Message-----
> From: "刘大龙" <ld...@zju.edu.cn>
> Sent: 2022-04-08 10:09:46 (Friday)
> To: dev@flink.apache.org
> Cc: 
> Subject: Re: Re: Re: Re: Re: Re: [DISCUSS] FLIP-214 Support Advanced Function DDL
> 
> Hi, Martijn
> 
> Do you have any further questions about this FLIP? Looking forward to more 
> of your feedback.
> 
> Best,
> 
> Ron
> 
> 
> > -----Original Message-----
> > From: "刘大龙" <ld...@zju.edu.cn>
> > Sent: 2022-03-29 19:33:58 (Tuesday)
> > To: dev@flink.apache.org
> > Cc: 
> > Subject: Re: Re: Re: Re: Re: [DISCUSS] FLIP-214 Support Advanced Function DDL
> > 
> > 
> > 
> > 
> > > -----Original Message-----
> > > From: "Martijn Visser" <mart...@ververica.com>
> > > Sent: 2022-03-24 16:18:14 (Thursday)
> > > To: dev <dev@flink.apache.org>
> > > Cc: 
> > > Subject: Re: Re: Re: Re: [DISCUSS] FLIP-214 Support Advanced Function DDL
> > > 
> > > Hi Ron,
> > > 
> > > Thanks for creating the FLIP. You're talking about both local and remote
> > > resources. With regards to remote resources, how do you see this working
> > > with Flink's filesystem abstraction? I did read in the FLIP that Hadoop
> > > dependencies are not packaged, but I would hope that we do that for all
> > > filesystem implementations. I don't think it's a good idea to have any tight
> > > coupling to file system implementations, especially since at some point we
> > > could also externalize file system implementations (like we're doing for
> > > connectors already). I think the FLIP would be better off not only
> > > referring to "Hadoop" as a remote resource provider, but using a more
> > > generic term, since there are more options than Hadoop.
> > > 
> > > I'm also thinking about security/operations implications: would it be
> > > possible for bad actor X to create a JAR that influences other running
> > > jobs, leaks data or credentials, or anything else? If so, I think it
> > > would also be good to have an option to disable this feature completely. I
> > > think there are roughly two types of companies who run Flink: those who
> > > open it up for everyone to use (here the feature would be welcomed) and
> > > those who need to follow certain minimum standards/have a more closed Flink
> > > ecosystem. The latter usually want to validate a JAR upfront before making
> > > it available, even at the expense of speed, because it gives them more
> > > control over what will be running in their environment.
> > > 
> > > Best regards,
> > > 
> > > Martijn Visser
> > > https://twitter.com/MartijnVisser82
> > > 
> > > 
> > > On Wed, 23 Mar 2022 at 16:47, 刘大龙 <ld...@zju.edu.cn> wrote:
> > > 
> > > >
> > > >
> > > >
> > > > > -----Original Message-----
> > > > > From: "Peter Huang" <huangzhenqiu0...@gmail.com>
> > > > > Sent: 2022-03-23 11:13:32 (Wednesday)
> > > > > To: dev <dev@flink.apache.org>
> > > > > Cc:
> > > > > Subject: Re: Re: [DISCUSS] FLIP-214 Support Advanced Function DDL
> > > > >
> > > > > Hi Ron,
> > > > >
> > > > > Thanks for reviving the discussion of the work. The design looks good. A
> > > > > small typo in the FLIP is that it is currently marked as released in 1.16.
> > > > >
> > > > >
> > > > > Best Regards
> > > > > Peter Huang
> > > > >
> > > > >
> > > > > On Tue, Mar 22, 2022 at 10:58 PM Mang Zhang <zhangma...@163.com> wrote:
> > > > >
> > > > > > Hi Yuxia,
> > > > > >
> > > > > >
> > > > > > Thanks for your reply. Your reminder is very important!
> > > > > >
> > > > > >
> > > > > > Since we download the file to a local temporary directory, remember to
> > > > > > clean it up when the Flink client exits.
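> > > > > >
> > > > > > For example (just a sketch of one possible approach, not the actual
> > > > > > implementation), a JVM shutdown hook on the client could remove the
> > > > > > temporary directory:
> > > > > > ""
> > > > > > import java.io.File;
> > > > > > import java.nio.file.Files;
> > > > > > import java.util.Comparator;
> > > > > >
> > > > > > // Hypothetical best-effort cleanup of the UUID-named temp directory
> > > > > > // when the Flink client JVM exits.
> > > > > > public final class TempDirCleanup {
> > > > > >     public static void registerCleanup(File tempDir) {
> > > > > >         Runtime.getRuntime().addShutdownHook(new Thread(() -> {
> > > > > >             try (java.util.stream.Stream<java.nio.file.Path> paths =
> > > > > >                     Files.walk(tempDir.toPath())) {
> > > > > >                 // Delete children before the directory itself.
> > > > > >                 paths.sorted(Comparator.reverseOrder())
> > > > > >                         .map(java.nio.file.Path::toFile)
> > > > > >                         .forEach(File::delete);
> > > > > >             } catch (Exception ignored) {
> > > > > >                 // Best-effort: ignore failures during shutdown.
> > > > > >             }
> > > > > >         }));
> > > > > >     }
> > > > > > }
> > > > > > ""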
> > > > > >
> > > > > > --
> > > > > >
> > > > > > Best regards,
> > > > > > Mang Zhang
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > At 2022-03-23 10:02:26, "罗宇侠(莫辞)"
> > > > > > <luoyuxia.luoyu...@alibaba-inc.com.INVALID> wrote:
> > > > > > >Hi Ron, thanks for starting this discussion, some Spark/Hive users will
> > > > > > >benefit from it. The FLIP looks good to me. I just have two minor questions:
> > > > > > >1. For the syntax explanation, I see it's "Create ... function as
> > > > > > >identifier ...". I think the word "identifier" may not be self-descriptive,
> > > > > > >for it's actually not a random name but the name of the class that provides
> > > > > > >the implementation for the function to be created.
> > > > > > >Maybe it'll be clearer to use "class_name" instead of "identifier", just
> > > > > > >like Hive [1] and Spark [2] do.
> > > > > > >
> > > > > > >2. >> If the resource used is a remote resource, it will first download
> > > > > > >the resource to a local temporary directory, which will be generated using
> > > > > > >UUID, and then register the local path to the user class loader.
> > > > > > >Given the above explanation in the FLIP, it seems that for a statement set
> > > > > > >like
> > > > > > >""
> > > > > > >Create function as org.apache.udf1 using jar 'hdfs://myudfs.jar';
> > > > > > >Create function as org.apache.udf2 using jar 'hdfs://myudfs.jar';
> > > > > > >""
> > > > > > >it'll download the resource 'hdfs://myudfs.jar' twice. So is it possible
> > > > > > >to provide some cache mechanism so that we won't need to download / store
> > > > > > >it twice?
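> > > > > > >
> > > > > > >For example, a session-level cache keyed by the resource URI could avoid
> > > > > > >the duplicate download; just a sketch, the class and method names here
> > > > > > >are hypothetical:
> > > > > > >""
> > > > > > >import java.io.File;
> > > > > > >import java.util.Map;
> > > > > > >import java.util.concurrent.ConcurrentHashMap;
> > > > > > >import java.util.function.Function;
> > > > > > >
> > > > > > >// Hypothetical cache: each remote URI is downloaded at most once per
> > > > > > >// session, no matter how many functions reference it.
> > > > > > >public final class ResourceCache {
> > > > > > >    private final Map<String, File> localCopies = new ConcurrentHashMap<>();
> > > > > > >    private final Function<String, File> downloader;
> > > > > > >
> > > > > > >    public ResourceCache(Function<String, File> downloader) {
> > > > > > >        this.downloader = downloader;
> > > > > > >    }
> > > > > > >
> > > > > > >    public File getOrDownload(String remoteUri) {
> > > > > > >        // computeIfAbsent runs the download at most once per URI,
> > > > > > >        // even when two CREATE FUNCTION statements share one jar.
> > > > > > >        return localCopies.computeIfAbsent(remoteUri, downloader);
> > > > > > >    }
> > > > > > >}
> > > > > > >""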
> > > > > > >
> > > > > > >
> > > > > > >Best regards,
> > > > > > >Yuxia
> > > > > > >[1] https://cwiki.apache.org/confluence/display/hive/languagemanual+ddl
> > > > > > >[2] https://spark.apache.org/docs/3.0.0-preview/sql-ref-syntax-ddl-create-function.html
> > > > > > >------------------------------------------------------------------
> > > > > > >From: Mang Zhang <zhangma...@163.com>
> > > > > > >Date: 2022-03-22 11:35:24
> > > > > > >To: <dev@flink.apache.org>
> > > > > > >Subject: Re: [DISCUSS] FLIP-214 Support Advanced Function DDL
> > > > > > >
> > > > > > >Hi Ron, thank you so much for this suggestion, this is so good.
> > > > > > >In our company, it is very inconvenient when users use custom UDFs: the
> > > > > > >UDF code needs to be packaged into the job jar, because a job cannot
> > > > > > >refer to an existing udf jar directly, or the jar reference has to be
> > > > > > >passed in the startup command.
> > > > > > >If we implement this feature, users can focus on their own business
> > > > > > >development.
> > > > > > >I can also contribute if needed.
> > > > > > >
> > > > > > >
> > > > > > >--
> > > > > > >
> > > > > > >Best regards,
> > > > > > >Mang Zhang
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >At 2022-03-21 14:57:32, "刘大龙" <ld...@zju.edu.cn> wrote:
> > > > > > >>Hi, everyone
> > > > > > >>
> > > > > > >>
> > > > > > >>
> > > > > > >>
> > > > > > >>I would like to open a discussion on supporting advanced Function DDL.
> > > > > > >>This proposal is a continuation of FLIP-79, in which the Flink Function
> > > > > > >>DDL was defined. Until now it has only been partially released, as
> > > > > > >>Function DDL with user-defined resources has not been clearly discussed
> > > > > > >>and implemented. It is an important feature: with support for
> > > > > > >>registering UDFs with custom jar resources, users can use UDFs much
> > > > > > >>more easily without having to put jars under the classpath in advance.
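> > > > > > >>
> > > > > > >>As a rough illustration of the intended usage (the function/class names
> > > > > > >>and the jar path below are made up; the exact grammar is defined in the
> > > > > > >>FLIP):
> > > > > > >>""
> > > > > > >>import org.apache.flink.table.api.EnvironmentSettings;
> > > > > > >>import org.apache.flink.table.api.TableEnvironment;
> > > > > > >>
> > > > > > >>public class AdvancedFunctionDdlExample {
> > > > > > >>    public static void main(String[] args) {
> > > > > > >>        TableEnvironment tEnv =
> > > > > > >>                TableEnvironment.create(EnvironmentSettings.inStreamingMode());
> > > > > > >>
> > > > > > >>        // Register a function whose implementation lives in a remote
> > > > > > >>        // jar, without putting the jar on the classpath in advance.
> > > > > > >>        tEnv.executeSql(
> > > > > > >>                "CREATE TEMPORARY FUNCTION my_lower AS 'org.example.MyLower' "
> > > > > > >>                        + "USING JAR 'hdfs://namenode:8020/udfs/my-udfs.jar'");
> > > > > > >>
> > > > > > >>        tEnv.executeSql("SELECT my_lower('HELLO')").print();
> > > > > > >>    }
> > > > > > >>}
> > > > > > >>""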
> > > > > > >>
> > > > > > >>Looking forward to your feedback.
> > > > > > >>
> > > > > > >>
> > > > > > >>
> > > > > > >>
> > > > > > >>[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-214+Support+Advanced+Function+DDL
> > > > > > >>
> > > > > > >>
> > > > > > >>
> > > > > > >>
> > > > > > >>Best,
> > > > > > >>
> > > > > > >>Ron
> > > > > > >>
> > > > > > >>
> > > > > > >
> > > > > >
> > > >
> > > > Hi Peter, thanks for your feedback. This work also builds on your effort,
> > > > thank you very much.
> > > >
> > 
> > Hi Martijn,
> > Thank you very much for the feedback; it is very useful to me.
> > 1. Filesystem abstraction: With regards to remote resources, I agree with 
> > you that we should use Flink's FileSystem abstraction to support all types 
> > of file systems, including HTTP, S3, HDFS, etc., rather than binding to a 
> > specific implementation. In the first version, we will give priority to 
> > supporting HDFS as a resource provider through Flink's FileSystem 
> > abstraction, since HDFS is very widely used.
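> > 
> > To make the idea concrete, here is a minimal sketch of how the download 
> > could go through the FileSystem abstraction (the class and method names 
> > are just illustrative, not the actual implementation):
> > ""
> > import org.apache.flink.core.fs.FSDataInputStream;
> > import org.apache.flink.core.fs.FileSystem;
> > import org.apache.flink.core.fs.Path;
> > 
> > import java.io.File;
> > import java.io.FileOutputStream;
> > import java.io.IOException;
> > import java.io.OutputStream;
> > import java.net.URI;
> > import java.util.UUID;
> > 
> > // Illustrative helper: copy a remote jar to a UUID-named local temp
> > // directory and return the local file.
> > public final class ResourceDownloader {
> > 
> >     public static File downloadJar(String remoteUri) throws IOException {
> >         Path remote = new Path(remoteUri);
> >         // FileSystem.get() resolves the scheme (hdfs://, s3://, ...) to
> >         // the matching pluggable implementation, so there is no coupling
> >         // to a specific file system.
> >         FileSystem fs = FileSystem.get(URI.create(remoteUri));
> >         File targetDir = new File(System.getProperty("java.io.tmpdir"),
> >                 UUID.randomUUID().toString());
> >         targetDir.mkdirs();
> >         File target = new File(targetDir, remote.getName());
> >         try (FSDataInputStream in = fs.open(remote);
> >                 OutputStream out = new FileOutputStream(target)) {
> >             byte[] buf = new byte[4096];
> >             int n;
> >             while ((n = in.read(buf)) != -1) {
> >                 out.write(buf, 0, n);
> >             }
> >         }
> >         return target;
> >     }
> > }
> > ""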
> > 
> > 2. Security/operations implications: The point you raise is a great one; 
> > security is an issue that needs to be considered. Your starting point is 
> > that a jar needs some verification before it is used, to avoid insecure 
> > behavior. However, IMO, the validation of a jar should be done by the 
> > platform side itself: the platform needs to ensure that users have 
> > permission to use the jar and that the jar is safe. A configuration option 
> > cannot disable the syntax completely, because the user can still re-enable 
> > it via the SET command. I think the right approach is for the platform to 
> > verify, rather than the engine side. In addition, the current 
> > Connector/UDF/DataStream programs also allow custom jars, and those jars 
> > can have the same security issues, yet Flink currently does not provide an 
> > option to prohibit their use. If a user uses a custom jar, it means the 
> > user has permission to do so, and then the user should be responsible for 
> > the security of the jar. If it was hacked, it means there are loopholes in 
> > the company's permissions/network, and those problems need to be fixed. 
> > All in all, I agree with you on this point, but a configuration option 
> > can't solve this problem.
> > 
> > Best,
> > 
> > Ron
