Re:Re: Re:Re: [Discussion] - About an idea to add a UDF management module for StreamPark

tiger Sun, 09 Oct 2022 22:57:20 -0700

Hi fanrui,


The issue has been created; please see the link below. Contributions to improve it are welcome.

https://github.com/apache/incubator-streampark/issues/1782

At 2022-10-10 12:56:20, "Rui Fan" <[email protected]> wrote:
>Hi tiger:
>
>> The e-mail does not show the picture, reapply it as an attachment.
>
>Thanks for your discussion. Could you create an issue first? And
>add all background, motivation and solutions in that issue.
>
>Best,
>fanrui
>
>On Mon, Oct 10, 2022 at 12:32 PM tiger <[email protected]> wrote:
>
>> The e-mail does not show the picture, reapply it as an attachment.
>>
>>
>>
>> At 2022-10-10 12:23:48, "tiger" <[email protected]> wrote:
>>
>> Hi Huajie,
>>
>> Thanks for the reply. Here are my thoughts.
>>
>>
>>    - *Is the UDF management module just a simple CRUD module?*
>>
>> Personally, I think so: it provides CRUD functionality, but it is
>> designed primarily for user convenience.
>> Users may create many UDFs and, over time, forget details about them
>> (e.g., the function name, the class that implements the function, or
>> the storage path); this module keeps that information in one place. In
>> addition, when creating a job, users can choose which UDFs to use
>> (see the next point), which removes the separate upload step and is
>> more convenient.
>>
>>
>>    - *How does it work with the user's job?*
>>
>> The current plan mainly targets YARN application mode, so the
>> following example describes how a UDF is used in that mode.
>>
>>    1. When creating a job, select the required UDFs (e.g., from a drop-down
>>    box listing the UDFs available to the current user, each identified by
>>    a udfId);
>>    2. When starting the job, look up the storage paths of the selected
>>    udfIds (there can be more than one), join these paths into a single
>>    string, and pass it to yarn.provided.lib.dirs when submitting the job
>>    so that the UDFs are loaded dynamically.
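
The two steps above could be sketched roughly as follows. This is only an illustration, not StreamPark code: the class `UdfLibDirs`, the method `joinLibDirs`, and the udfId-to-directory map are hypothetical names, and the sketch assumes each UDF record stores the HDFS directory that holds its JAR. Flink documents yarn.provided.lib.dirs as a semicolon-separated list, so the paths are joined with ";".

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: turns the udfIds selected for a job into the value
// passed to Flink's yarn.provided.lib.dirs option.
final class UdfLibDirs {
    static final String PROVIDED_LIB_DIRS_KEY = "yarn.provided.lib.dirs";

    // dirByUdfId is an assumed lookup (udfId -> storage directory); in a real
    // system it would come from the UDF management module's storage.
    static String joinLibDirs(List<Long> selectedUdfIds, Map<Long, String> dirByUdfId) {
        // LinkedHashSet keeps selection order and drops duplicate directories.
        Set<String> dirs = new LinkedHashSet<>();
        for (Long udfId : selectedUdfIds) {
            String dir = dirByUdfId.get(udfId);
            if (dir == null) {
                throw new IllegalArgumentException("Unknown udfId: " + udfId);
            }
            dirs.add(dir);
        }
        // The option value is a semicolon-separated list of directories.
        return String.join(";", dirs);
    }
}
```

The resulting string would then be set under the key `yarn.provided.lib.dirs` in the job's Flink configuration at submission time.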
>>
>> *UDF Select Box UI Example:*
>> *Example of SQL using a UDF*:
>> see:
>> https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/dev/table/sql/create/#create-function
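
As a rough illustration of the SQL side, the registration statement for a selected UDF might be generated from its stored function name and class. The class `UdfSql` and the method `createFunctionSql` are hypothetical; the statement follows the CREATE FUNCTION syntax in the Flink documentation linked above, and the sketch assumes a Java UDF registered as a temporary function.

```java
// Hypothetical helper: builds a Flink SQL statement that registers a UDF
// under the function name and implementing class stored for it.
final class UdfSql {
    static String createFunctionSql(String functionName, String className) {
        // Accept only plain identifiers and qualified class names, so that
        // stored metadata cannot inject arbitrary SQL.
        if (!functionName.matches("[A-Za-z_][A-Za-z0-9_]*")) {
            throw new IllegalArgumentException("Invalid function name: " + functionName);
        }
        if (!className.matches("[A-Za-z_$][\\w$]*(\\.[A-Za-z_$][\\w$]*)*")) {
            throw new IllegalArgumentException("Invalid class name: " + className);
        }
        return "CREATE TEMPORARY FUNCTION IF NOT EXISTS " + functionName
                + " AS '" + className + "' LANGUAGE JAVA";
    }
}
```

The job's own SQL could then call the function by name, provided the JAR containing the class is on the job's classpath.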
>>
>> *yarn.provided.lib.dirs*:
>> see:
>> https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/deployment/config/#yarn-provided-lib-dirs
>>
>>
>>    - *About compatibility with other deployment models*
>>
>> For k8s, standalone, and other modes, frankly speaking, I don't have many
>> ideas yet. I plan to study them later, and I would very much welcome
>> working with everyone to improve these features.
>>
>>
>>
>>
>>
>>
>> At 2022-10-09 17:32:05, "Huajie Wang" <[email protected]> wrote:
>> >hi tiger:
>> >
>> >Thanks for starting a valuable discussion. If the UDF module is only a
>> >management module (CRUD), that's easy. The key question is how these UDFs
>> >work together with the user's job across multiple deployment modes (on
>> >yarn|k8s|standalone...). This is a difficult problem. Do you have any
>> >relevant ideas or designs for this?
>> >
>> >
>> >Best,
>> >Huajie Wang
>> >
>> >
>> >
>> >tiger <[email protected]> wrote on Sun, Oct 9, 2022 at 17:18:
>> >
>> >> Hello everyone
>> >>
>> >>
>> >> As previously discussed in the group, an issue has been created over here
>> >> and suggestions are welcome.
>> >>
>> >>
>> >> Regarding the development of specific features, as I don't have permission
>> >> to create a branch, could @Huajie help create a new branch based on the
>> >> 1.2.3-release branch (for example, udf-management) to facilitate
>> >> development?
>> >>
>> >>
>> >> At 2022-10-08 17:18:35, "功夫熊猫" <[email protected]> wrote:
>> >> >Hi all,
>> >> >
>> >> >Background: I've been using StreamPark for a while, and I've had a
>> >> >pretty good experience in terms of ease of use and stability. At
>> >> >present, StreamPark itself supports UDFs, but there seems to be no
>> >> >unified menu for managing them, so I would like to add a new UDF
>> >> >management menu.
>> >> >
>> >> >Main implementation ideas:
>> >> >Currently, we create UDFs through a RESTful API, select the UDFs when
>> >> >creating a job, and associate their UDF ids with the job (mainly to
>> >> >look up the UDF JAR storage paths later); dynamic loading is then
>> >> >achieved through the yarn.provided.lib.dirs parameter.
>> >> >Note: This feature is currently implemented only for SQL jobs in
>> >> >YARN Application mode; the JARs are stored on HDFS.
>> >> >
>> >> >
>> >> >Main APIs:
>> >> >Add UDF
>> >> >
>> >> >Query UDF (list)
>> >> >Edit UDF
>> >> >Delete UDF
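
The four operations listed above might map onto a small service interface. The following in-memory sketch is purely illustrative: `UdfInfo` and `InMemoryUdfStore` are hypothetical names, the fields are taken from the metadata mentioned elsewhere in this thread (function name, implementing class, storage path), and a real implementation would sit behind the REST API and persist to a database.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical UDF record; the fields are assumptions based on this thread.
final class UdfInfo {
    final long id;
    final String functionName;
    final String className;
    final String storagePath;

    UdfInfo(long id, String functionName, String className, String storagePath) {
        this.id = id;
        this.functionName = functionName;
        this.className = className;
        this.storagePath = storagePath;
    }
}

// Hypothetical in-memory stand-in for the Add / Query / Edit / Delete APIs.
final class InMemoryUdfStore {
    private final Map<Long, UdfInfo> udfs = new LinkedHashMap<>();
    private long nextId = 1;

    // Add UDF: assigns the next id and stores the record.
    UdfInfo add(String functionName, String className, String storagePath) {
        UdfInfo udf = new UdfInfo(nextId++, functionName, className, storagePath);
        udfs.put(udf.id, udf);
        return udf;
    }

    // Query UDF (list): returns a copy in insertion order.
    List<UdfInfo> list() {
        return new ArrayList<>(udfs.values());
    }

    // Edit UDF: replaces the record; returns false if the id is unknown.
    boolean edit(long id, String functionName, String className, String storagePath) {
        if (!udfs.containsKey(id)) {
            return false;
        }
        udfs.put(id, new UdfInfo(id, functionName, className, storagePath));
        return true;
    }

    // Delete UDF: returns false if the id is unknown.
    boolean delete(long id) {
        return udfs.remove(id) != null;
    }
}
```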
>> >> >
>> >> >
>> >> >Follow-up plan:
>> >> >Basic API-level functionality will be implemented first, followed by
>> >> >front-end UI development.
>> >> >
>> >> >
>> >> >Best wishes
>> >> >tiger
>> >>
>>
>>
