> -----Original Message-----
> From: "Martijn Visser" <mart...@ververica.com>
> Sent: 2022-03-24 16:18:14 (Thursday)
> To: dev <dev@flink.apache.org>
> Cc: 
> Subject: Re: Re: Re: Re: [DISCUSS] FLIP-214 Support Advanced Function DDL
> 
> Hi Ron,
> 
> Thanks for creating the FLIP. You're talking about both local and remote
> resources. With regards to remote resources, how do you see this work with
> Flink's filesystem abstraction? I did read in the FLIP that Hadoop
> dependencies are not packaged, but I would hope that we do that for all
> filesystem implementation. I don't think it's a good idea to have any tight
> coupling to file system implementations, especially if at some point we
> could also externalize file system implementations (like we're doing for
> connectors already). I think the FLIP would be better by not only
> referring to "Hadoop" as a remote resource provider, but a more generic
> term since there are more options than Hadoop.
> 
> I'm also thinking about security/operations implications: would it be
> possible for bad actor X to create a JAR that either influences other
> running jobs, leaks data or credentials or anything else? If so, I think it
> would also be good to have an option to disable this feature completely. I
> think there are roughly two types of companies who run Flink: those who
> open it up for everyone to use (here the feature would be welcomed) and
> those who need to follow certain minimum standards/have a more closed Flink
> ecosystem). They usually want to validate a JAR upfront before making it
> available, even at the expense of speed, because it gives them more control
> over what will be running in their environment.
> 
> Best regards,
> 
> Martijn Visser
> https://twitter.com/MartijnVisser82
> 
> 
> On Wed, 23 Mar 2022 at 16:47, 刘大龙 <ld...@zju.edu.cn> wrote:
> 
> >
> >
> >
> > > -----Original Message-----
> > > From: "Peter Huang" <huangzhenqiu0...@gmail.com>
> > > Sent: 2022-03-23 11:13:32 (Wednesday)
> > > To: dev <dev@flink.apache.org>
> > > Cc:
> > > Subject: Re: Re: Re: [DISCUSS] FLIP-214 Support Advanced Function DDL
> > >
> > > Hi Ron,
> > >
> > > Thanks for reviving the discussion of the work. The design looks good. A
> > > small typo in the FLIP is that currently it is marked as released in
> > 1.16.
> > >
> > >
> > > Best Regards
> > > Peter Huang
> > >
> > >
> > > On Tue, Mar 22, 2022 at 10:58 PM Mang Zhang <zhangma...@163.com> wrote:
> > >
> > > > hi Yuxia,
> > > >
> > > >
> > > > Thanks for your reply. Your reminder is very important !
> > > >
> > > >
> > > > Since we download the file to the local disk, remember to clean it up
> > > > when the Flink client exits.
> > > >
> > > > --
> > > >
> > > > Best regards,
> > > > Mang Zhang
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > At 2022-03-23 10:02:26, "罗宇侠(莫辞)"
> > > > <luoyuxia.luoyu...@alibaba-inc.com.INVALID> wrote:
> > > > >Hi Ron, Thanks for starting this discussion, some Spark/Hive users will
> > > > benefit from it. The FLIP looks good to me. I just have two minor
> > questions:
> > > > >1. For the syntax explanation, I see it's "Create .... function as
> > > > identifier....". I think the word "identifier" may not be
> > > > self-descriptive, for actually it's not a random name but the name of the
> > > > class that provides the implementation for the function to be created.
> > > > >Maybe it'll be clearer to use "class_name" instead of "identifier", just
> > > > like what Hive[1]/Spark[2] do.
> > > > >
> > > > >2.  >> If the resource used is a remote resource, it will first download
> > > > the resource to a local temporary directory, which will be generated using
> > > > UUID, and then register the local path to the user class loader.
> > > > >For the above explanation in this FLIP, it seems for such statement sets,
> > > > >""
> > > > >Create  function as org.apache.udf1 using jar 'hdfs://myudfs.jar';
> > > > >Create  function as org.apache.udf2 using jar 'hdfs://myudfs.jar';
> > > > >""
> > > > >it'll download the resource 'hdfs://myudfs.jar' twice. So is it
> > > > possible to provide some cache mechanism so that we won't need to
> > > > download / store it twice?
> > > > >
> > > > >
> > > > >Best regards,
> > > > >Yuxia
> > > > >[1]
> > https://cwiki.apache.org/confluence/display/hive/languagemanual+ddl
> > > > >[2]
> > > >
> > https://spark.apache.org/docs/3.0.0-preview/sql-ref-syntax-ddl-create-function.html
> > > > >------------------------------------------------------------------
> > > > >From: Mang Zhang<zhangma...@163.com>
> > > > >Date: 2022-03-22 11:35:24
> > > > >To: <dev@flink.apache.org>
> > > > >Subject: Re: [DISCUSS] FLIP-214 Support Advanced Function DDL
> > > > >
> > > > >Hi Ron, thank you so much for this suggestion, this is great.
> > > > >In our company, when users use a custom UDF it is very inconvenient: the
> > > > code needs to be packaged into the job jar, and it cannot refer to an
> > > > existing udf jar or pass in the jar reference in the startup command.
> > > > >If we implement this feature, users can focus on their own business
> > > > development.
> > > > >I can also contribute if needed.
> > > > >
> > > > >--
> > > > >
> > > > >Best regards,
> > > > >Mang Zhang
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >At 2022-03-21 14:57:32, "刘大龙" <ld...@zju.edu.cn> wrote:
> > > > >>Hi, everyone
> > > > >>
> > > > >>
> > > > >>
> > > > >>
> > > > >>I would like to open a discussion on supporting advanced function DDL.
> > > > This proposal is a continuation of FLIP-79, in which Flink function DDL is
> > > > defined. Until now it is only partially released, as the Flink function
> > DDL
> > > > with user-defined resources has not been clearly discussed and
> > implemented. It is
> > > > an important feature: with support for registering a UDF with a custom jar
> > resource,
> > > > users can use UDFs much more easily without having to put jars under the
> > > > classpath in advance.
> > > > >>
> > > > >>Looking forward to your feedback.
> > > > >>
> > > > >>
> > > > >>
> > > > >>
> > > > >>[1]
> > > >
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-214+Support+Advanced+Function+DDL
> > > > >>
> > > > >>
> > > > >>
> > > > >>
> > > > >>Best,
> > > > >>
> > > > >>Ron
> > > > >>
> > > > >>
> > > > >
> > > >
> >
> > Hi Peter, thanks for your feedback. This work also builds on your effort,
> > thank you very much.
> >

Hi Martijn,
Thank you very much for the feedback, it was very useful to me.
1. Filesystem abstraction: With regards to remote resources, I agree with you 
that we should use Flink's FileSystem abstraction to support all types of file 
systems, including HTTP, S3, HDFS, etc., rather than binding to a specific 
implementation. In the first version, we will give priority to supporting HDFS 
as a resource provider through Flink's FileSystem abstraction, since HDFS is 
very widely used.
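To make the idea concrete, here is a minimal sketch (JDK only, all names mine, 
not from the FLIP) of how a resource URI can be dispatched on its scheme instead 
of being bound to a Hadoop client; in Flink itself this lookup is essentially 
what `org.apache.flink.core.fs.FileSystem.get(uri)` already does:

```java
import java.net.URI;

// Illustrative sketch: derive the resource provider from the URI scheme
// rather than hard-coding a specific implementation such as Hadoop.
public class ResourceSchemes {

    /** Returns the scheme ("hdfs", "s3", "http", ...) or "file" for plain local paths. */
    public static String providerFor(String resourceUri) {
        String scheme = URI.create(resourceUri).getScheme();
        return scheme == null ? "file" : scheme;
    }

    /** A resource is remote whenever it is not addressed via the local file scheme. */
    public static boolean isRemote(String resourceUri) {
        return !"file".equals(providerFor(resourceUri));
    }
}
```

With this shape, adding a new remote provider is only a matter of registering a 
new filesystem for its scheme, not changing the DDL implementation.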

2. Security/operations implications: The point you raise is a good one; 
security does need to be considered. Your starting point is that a jar needs 
some verification before it is used, to avoid insecure behavior. However, IMO 
the validation of a jar should be done by the platform side itself: the 
platform needs to ensure both that users have permission to use the jar and 
that the jar is secure. An option cannot disable the syntax completely, 
because the user can still re-enable it with a SET command. I think the right 
approach is for the platform to verify rather than the engine side. In 
addition, current Connector/UDF/DataStream programs also use custom jars, and 
those jars carry the same security concerns, yet Flink does not provide an 
option to prohibit the use of custom jars there either. If a user uses a 
custom jar, it means the user has permission to do so, and then the user 
should be responsible for the security of that jar. If it was hacked, it means 
there are loopholes in the company's permissions/network, and they need to fix 
those problems. All in all, I agree with you on this point, but an option 
can't solve it.
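Regarding the duplicate-download concern raised earlier in the thread (two 
CREATE FUNCTION statements pointing at the same remote jar), here is a minimal 
sketch of the caching idea (JDK only, all names illustrative, not the actual 
FLIP implementation): each remote URI is fetched at most once into a randomly 
named temporary directory, and later statements reuse the local copy.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Illustrative download-once cache keyed by the remote resource URI.
public class JarCache {
    private final Map<String, Path> localCopies = new ConcurrentHashMap<>();
    private final Function<String, byte[]> downloader; // stands in for the FileSystem read

    public JarCache(Function<String, byte[]> downloader) {
        this.downloader = downloader;
    }

    /** Returns the local path for the jar, downloading it only on the first request. */
    public Path localPathFor(String remoteUri) {
        return localCopies.computeIfAbsent(remoteUri, uri -> {
            try {
                // Randomly named temp directory, analogous to the UUID directory in the FLIP.
                Path dir = Files.createTempDirectory("flink-udf-");
                Path jar = dir.resolve("resource.jar");
                Files.write(jar, downloader.apply(uri)); // the download happens only here
                return jar;
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
    }
}
```

Cleanup on client exit (as Mang noted) would then only have to delete these 
temp directories once, regardless of how many statements referenced each jar.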

Best,

Ron
