Hi Peter,

I’d like to share some thoughts from my side:
1. What's the syntax to distinguish the function language?
        +1 for using `[LANGUAGE JVM|PYTHON] USING JAR` (a usage sketch follows after item 4).
2. How to persist the function language in the backend catalog?
        +1 for a separate field in CatalogFunction (a possible field shape is 
sketched after item 4). But for a specific backend, we may need to persist it 
case by case; a special case is how HiveCatalog stores the kind of a 
CatalogFunction.
3. Do we really need to allow users to set a properties map for a UDF?
    There are certainly use cases that require passing external arguments to a 
UDF, but that need can also be met by passing arguments to `eval` when calling 
the UDF in SQL (an `eval`-based example is sketched after item 4).
IMO, there is not much need to support a properties map for a UDF.

4. Should a catalog implementation be able to decide whether it can take a 
properties map, and which language of a UDF it can persist?
IMO, it's necessary for a catalog implementation to provide such information 
(a possible shape is sketched below). But for the Flink 1.10 MVP goal, we can 
just skip this part.
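
To make item 1 concrete, here is a minimal sketch, assuming the option-two
syntax is adopted, of how it might be issued through the Table API. The
function name, class, and jar path are invented, and `sqlUpdate` does not
actually accept CREATE FUNCTION yet:

    import org.apache.flink.table.api.EnvironmentSettings;
    import org.apache.flink.table.api.TableEnvironment;

    public class CreateFunctionExample {
        public static void main(String[] args) {
            TableEnvironment tEnv = TableEnvironment.create(
                    EnvironmentSettings.newInstance().build());
            // Hypothetical: CREATE FUNCTION DDL is not supported yet; this
            // only illustrates the shape of the proposed option-two syntax.
            tEnv.sqlUpdate(
                    "CREATE FUNCTION my_lower AS 'com.example.MyLower' "
                            + "LANGUAGE JVM USING JAR 'hdfs:///libs/my_udfs.jar'");
        }
    }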
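
For item 2, a rough sketch of what a dedicated language field on
CatalogFunction could look like. The enum and the getter are invented names,
not the actual interface; the real change deserves its own discussion:

    import java.util.Map;

    // Illustrative only: the language is carried as a first-class field
    // instead of a k-v pair buried in the properties map.
    public interface CatalogFunction {
        enum FunctionLanguage { JVM, PYTHON }

        String getClassName();

        // invented getter: a dedicated field, not properties.get("language")
        FunctionLanguage getLanguage();

        Map<String, String> getProperties();

        CatalogFunction copy();
    }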
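
For item 3, what I mean by passing external arguments through `eval` instead
of a properties map is sketched below; the function and its threshold
parameter are made up for illustration:

    import org.apache.flink.table.functions.ScalarFunction;

    // The caller supplies the tuning parameter per invocation in SQL,
    // e.g. SELECT my_score(features, 0.75) FROM events,
    // rather than storing it in a per-function properties map.
    public class MyScore extends ScalarFunction {
        public double eval(String features, double threshold) {
            // invented logic, only here to show the extra argument flowing in
            return features.length() > 10 ? threshold : 0.0;
        }
    }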
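
For item 4, the information a catalog needs to expose could be as small as
two methods. Everything below is invented for illustration and, as said, can
wait past 1.10:

    // Hypothetical capability hooks so the framework can reject upfront what
    // a backend cannot persist (e.g. Hive metastore takes no properties map
    // and only persists JVM functions).
    public interface FunctionCatalogCapabilities {
        boolean supportsFunctionProperties();

        boolean supportsFunctionLanguage(String language);
    }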



Best,
Terry Wang



> On Oct 30, 2019, at 13:52, Peter Huang <huangzhenqiu0...@gmail.com> wrote:
> 
> Hi Bowen,
> 
> I can't agree more that we should first reach an agreement on the DDL
> syntax and focus on the MVP in the current phase.
> 
> 1) what's the syntax to distinguish function language
> Currently, there are two opinions:
> 
>   - USING 'python .....'
>   - [LANGUAGE JVM|PYTHON] USING JAR '...'
> 
> As we need to support multiple resources as HQL does, we shouldn't repeat
> the language symbol as a suffix of each resource.
> I would prefer option two, but definitely open to more comments.
> 
> 2) How to persist function language in backend catalog? as a k-v pair in
> properties map, or a dedicated field?
> Even though the language type is also a property, I think a separate field
> in CatalogFunction is a cleaner solution.
> 
> 3) do we really need to allow users to set a properties map for a udf?
> what needs to be stored there? what are they used for?
> 
> I am considering a type of use case that uses UDFs for real-time
> inference. The model is nested in the udf as a resource, but multiple
> parameters are customizable. In this way, users can use properties to
> define those parameters.
> 
> I only have answers to these questions. For questions about the catalog
> implementation, I hope we can collect more feedback from the community.
> 
> 
> Best Regards
> Peter Huang
> 
> On Tue, Oct 29, 2019 at 11:31 AM Bowen Li <bowenl...@gmail.com> wrote:
> 
>> Hi all,
>> 
>> Besides all the good questions raised above, we all seem to agree on
>> having an MVP for Flink 1.10: "to support users to create and persist a
>> java class-based udf that's already in the classpath (no extra resource
>> loading), and use it later in queries".
>> 
>> IIUIC, to achieve that in 1.10, the following are currently the core
>> issues/blockers we should figure out, and solve them as our **highest
>> priority**:
>> 
>> - what's the syntax to distinguish function language (java, scala,
>> python, etc)? we only need to implement the java one in 1.10 but have to
>> settle on the long-term solution
>> - how to persist the function language in the backend catalog? as a k-v
>> pair in the properties map, or a dedicated field?
>> - do we really need to allow users to set a properties map for a udf?
>> what needs to be stored there? what are they used for?
>> - should a catalog impl be able to decide whether it can take a properties
>> map (if we decide to have one), and which language of a udf it can persist?
>>   - E.g. Hive metastore, which backs Flink's HiveCatalog, cannot take a
>> properties map and is only able to persist java udf [1], unless we do
>> something hacky to it
>> 
>> I feel these questions are essential to Flink functions in the long run,
>> but most importantly, are also the minimum scope for Flink 1.10. Aspects
>> like resource loading security or compatibility with Hive syntax are
>> important too, however if we focus on them now, we may not be able to get
>> the MVP out in time.
>> 
>> [1]
>> - https://hive.apache.org/javadocs/r3.1.2/api/org/apache/hadoop/hive/metastore/api/Function.html
>> - https://hive.apache.org/javadocs/r3.1.2/api/org/apache/hadoop/hive/metastore/api/FunctionType.html
>> 
>> 
>> 
>> On Sun, Oct 27, 2019 at 8:22 PM Peter Huang <huangzhenqiu0...@gmail.com>
>> wrote:
>> 
>>> Hi Timo,
>>> 
>>> Thanks for the feedback. I replied and adjusted the design accordingly.
>>> For the concern of class loading, I think we need to distinguish the
>>> function class loading for temporary and permanent functions.
>>> 
>>> 1) For permanent functions, we can add them to the job graph so that we
>>> don't need to load them multiple times for different sessions.
>>> 2) For temporary functions, we can register the function with a session
>>> key and use different class loaders in the RuntimeContext implementation.
>>> 
>>> I added more description in the doc. Please review it again.
>>> 
>>> 
>>> Best Regards
>>> Peter Huang
>>> 
>>> 
>>> 
>>> 
>>> On Thu, Oct 24, 2019 at 2:14 AM Timo Walther <twal...@apache.org> wrote:
>>> 
>>>> Hi Peter,
>>>> 
>>>> Thanks for your proposal. I left some comments in the FLIP document. I
>>>> agree with Terry that we can have an MVP in Flink 1.10 but should
>>>> already discuss the bigger picture, as a DDL string cannot be changed
>>>> easily once released.
>>>> 
>>>> In particular, we should discuss how resources for functions are
>>>> loaded. If they are simply added to the JobGraph, they are available to
>>>> all functions and could potentially interfere with each other, right?
>>>> 
>>>> Thanks,
>>>> Timo
>>>> 
>>>> 
>>>> 
>>>> On 24.10.19 05:32, Terry Wang wrote:
>>>>> Hi Peter,
>>>>> 
>>>>> Sorry for the late reply. Thanks for your efforts on this; I just
>>>>> looked through your design.
>>>>> I left some comments in the doc about the alter function section and
>>>>> the function catalog interface.
>>>>> IMO, the overall design is OK and we can discuss some of the details
>>>>> further.
>>>>> I also think it’s necessary to have this awesome feature, limited to
>>>>> basic functions (of course better to have all :) ), in the 1.10
>>>>> release.
>>>>> 
>>>>> Best,
>>>>> Terry Wang
>>>>> 
>>>>> 
>>>>> 
>>>>>> On Oct 16, 2019, at 14:19, Peter Huang <huangzhenqiu0...@gmail.com> wrote:
>>>>>> 
>>>>>> Hi Xuefu,
>>>>>> 
>>>>>> Thank you for the feedback. I think you are pointing out a similar
>>>>>> concern to Bowen's. Let me describe how the catalog function and the
>>>>>> function factory will be changed in the implementation section.
>>>>>> Then, we can have more discussion in detail.
>>>>>> 
>>>>>> 
>>>>>> Best Regards
>>>>>> Peter Huang
>>>>>> 
>>>>>> On Tue, Oct 15, 2019 at 4:18 PM Xuefu Z <usxu...@gmail.com> wrote:
>>>>>> 
>>>>>>> Thanks to Peter for the proposal!
>>>>>>> 
>>>>>>> I left some comments in the google doc. Besides what Bowen pointed
>>>>>>> out, I'm unclear about how things work end to end from the document.
>>>>>>> For instance, SQL DDL-like function definition is mentioned. I guess
>>>>>>> just having a DDL for it doesn't explain how it's supported
>>>>>>> functionally. I think it's better to have some clarification on what
>>>>>>> is expected to work and what's for the future.
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> Xuefu
>>>>>>> 
>>>>>>> 
>>>>>>> On Tue, Oct 15, 2019 at 11:05 AM Bowen Li <bowenl...@gmail.com>
>>>>>>> wrote:
>>>>>>> 
>>>>>>>> Hi Zhenqiu,
>>>>>>>> 
>>>>>>>> Thanks for taking on this effort!
>>>>>>>> 
>>>>>>>> A couple questions:
>>>>>>>> - Though this FLIP is about function DDL, can we also think about
>>>>>>>> how the created functions can be mapped to CatalogFunction and see
>>>>>>>> if we need to modify the CatalogFunction interface? Syntax changes
>>>>>>>> need to be backed by the backend.
>>>>>>>> - Can we define a clearer, smaller scope targeting Flink 1.10 among
>>>>>>>> all the proposed changes? The current overall scope seems to be
>>>>>>>> quite wide, and it may be unrealistic to get everything in a single
>>>>>>>> release, or even a couple. However, I believe the most common user
>>>>>>>> story can be something as simple as "being able to create and
>>>>>>>> persist a java class-based udf and use it later in queries", which
>>>>>>>> will add great value for most Flink users and is achievable in 1.10.
>>>>>>>> 
>>>>>>>> Bowen
>>>>>>>> 
>>>>>>>> On Sun, Oct 13, 2019 at 10:46 PM Peter Huang
>>>>>>>> <huangzhenqiu0...@gmail.com> wrote:
>>>>>>>> 
>>>>>>>>> Dear Community,
>>>>>>>>> 
>>>>>>>>> FLIP-79 Flink Function DDL Support
>>>>>>>>> <https://docs.google.com/document/d/16kkHlis80s61ifnIahCj-0IEdy5NJ1z-vGEJd_JuLog/edit#>
>>>>>>>>> 
>>>>>>>>> This proposal aims to support function DDL with the consideration
>>>>>>>>> of SQL syntax, language compliance, and advanced external UDF lib
>>>>>>>>> registration.
>>>>>>>>> The Flink DDL was initialized and discussed in the design
>>>>>>>>> <https://docs.google.com/document/d/1TTP-GCC8wSsibJaSUyFZ_5NBAHYEB1FVmPpP7RgDGBA/edit#heading=h.wpsqidkaaoil>
>>>>>>>>> [1] by Shuyi Chen and Timo. As the initial discussion mainly
>>>>>>>>> focused on table, type, and view, FLIP-69 [2] extends it with a
>>>>>>>>> more detailed discussion of DDL for catalog, database, and
>>>>>>>>> function. Originally, the function DDL was under the scope of
>>>>>>>>> FLIP-69. After some discussion
>>>>>>>>> <https://issues.apache.org/jira/browse/FLINK-7151> with the
>>>>>>>>> community, we found that there are several ongoing efforts, such
>>>>>>>>> as FLIP-64 [3], FLIP-65 [4], and FLIP-78 [5]. As they will
>>>>>>>>> directly impact the SQL syntax of function DDL, this proposal
>>>>>>>>> wants to describe the problem clearly with consideration of
>>>>>>>>> existing work and make sure the design aligns with the efforts on
>>>>>>>>> the API change for temporary objects and type inference for UDFs
>>>>>>>>> defined in different languages.
>>>>>>>>> 
>>>>>>>>> The FLIP outlines the requirements from related works and
>>>>>>>>> proposes a SQL syntax to meet those requirements. The
>>>>>>>>> corresponding implementation is also discussed. Please kindly
>>>>>>>>> review and give feedback.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Best Regards
>>>>>>>>> Peter Huang
>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> --
>>>>>>> Xuefu Zhang
>>>>>>> 
>>>>>>> "In Honey We Trust!"
>>>>>>> 
>>>> 
>>>> 
>>> 
>> 
