Re: Re:Re: [Discussion] - About an idea to add a UDF management module for StreamPark

Rui Fan Sun, 09 Oct 2022 21:56:46 -0700

Hi tiger:

> The e-mail does not show the picture, so I am resending it as an attachment.


Thanks for starting this discussion. Could you create an issue first and
add all of the background, motivation, and proposed solutions to it?

Best,
fanrui

On Mon, Oct 10, 2022 at 12:32 PM tiger <[email protected]> wrote:

> The e-mail does not show the picture, so I am resending it as an attachment.
>
>
>
> At 2022-10-10 12:23:48, "tiger" <[email protected]> wrote:
>
> hi huajie,
>
> Thanks for the reply. Here are my thoughts.
>
>
>    - *Is the UDF management module just a simple CRUD module?*
>
> Personally, I think so: it has CRUD functionality, but the UDF module is
> designed primarily to be user-friendly.
> Users may create many UDFs and, over time, forget details such as the
> function name, the class that implements the function, or the storage
> path; this module keeps that information in one place. In addition, when
> creating a job, users can choose which UDFs to use (see the next point),
> which removes the separate upload step and is more convenient.
>
>
>    - *How does it work with the user's job?*
>
> The current plan is mainly based on the YARN application mode, so the
> following example of how to use a UDF applies to that mode.
>
>    1. When creating a job, select the required UDFs (e.g., from a drop-down
>    box showing the UDFs available to the current user, each associated with
>    a udfId);
>    2. When starting a job, the module queries the storage paths of the
>    selected UDFs by udfId (there can be more than one), joins these
>    storage paths into a single string, and passes that string to
>    yarn.provided.lib.dirs when submitting the job to achieve dynamic
>    loading.
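
A minimal sketch of step 2 above, assuming the storage paths have already been looked up by udfId. The class name, method name, and HDFS paths are hypothetical and are not taken from StreamPark; the only Flink-specific fact relied on is that yarn.provided.lib.dirs is written as a semicolon-separated list.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of StreamPark: joins UDF storage paths
// into one value suitable for yarn.provided.lib.dirs.
final class UdfLibDirs {
    // Flink list-type options are written as semicolon-separated strings.
    static final String SEPARATOR = ";";

    static String join(List<String> udfPaths) {
        // LinkedHashSet drops duplicates while keeping first-seen order.
        Set<String> unique = new LinkedHashSet<>();
        for (String path : udfPaths) {
            if (path != null && !path.trim().isEmpty()) {
                unique.add(path.trim());
            }
        }
        return String.join(SEPARATOR, unique);
    }

    public static void main(String[] args) {
        // Example paths are made up for illustration.
        String value = join(Arrays.asList(
                "hdfs:///streampark/udf/a",
                "hdfs:///streampark/udf/b",
                "hdfs:///streampark/udf/a"));
        System.out.println("yarn.provided.lib.dirs=" + value);
    }
}
```

Blank entries are skipped and duplicates dropped so that selecting two UDFs stored in the same directory does not repeat that directory in the option value.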
>
> *UDF Select Box UI Example:*
> *Example of SQL using a UDF:*
> see:
> https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/dev/table/sql/create/#create-function
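
As an illustration of how a stored function name and class could be turned into the CREATE FUNCTION statement described at the link above, here is a rough sketch. The helper class, the function name, and the UDF class name are all hypothetical; the original picture is not reproduced here, and the exact statement StreamPark would issue is not specified in this thread.

```java
// Hypothetical helper, not part of StreamPark or Flink: builds a Flink SQL
// CREATE TEMPORARY FUNCTION statement from UDF metadata.
final class UdfSql {
    private static final String NAME = "[A-Za-z_][A-Za-z0-9_]*";
    private static final String CLASS_PART = "[A-Za-z_$][A-Za-z0-9_$]*";

    static String createTemporaryFunction(String functionName, String className) {
        // Reject anything that is not a plain identifier / fully qualified
        // class name, so the values can be concatenated safely.
        if (!functionName.matches(NAME)) {
            throw new IllegalArgumentException("bad function name: " + functionName);
        }
        if (!className.matches(CLASS_PART + "(\\." + CLASS_PART + ")*")) {
            throw new IllegalArgumentException("bad class name: " + className);
        }
        return "CREATE TEMPORARY FUNCTION IF NOT EXISTS " + functionName
                + " AS '" + className + "' LANGUAGE JAVA";
    }

    public static void main(String[] args) {
        // Both arguments are made-up example values.
        System.out.println(
                createTemporaryFunction("to_upper", "com.example.udf.ToUpper"));
    }
}
```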
>
> *yarn.provided.lib.dirs*:
> refer:
> https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/deployment/config/#yarn-provided-lib-dirs
>
>
>    - *About compatibility with other deployment models*
>
> For k8s, standalone, and other modes, frankly speaking, I don't have many
> ideas yet. I plan to study them later, and everyone is very welcome to
> work together on improving these features.
>
> At 2022-10-09 17:32:05, "Huajie Wang" <[email protected]> wrote:
> >hi tiger:
> >
> >Thanks for starting a valuable discussion. If the UDF module is only a
> >management module (CRUD), that's easy. The key question is how these
> >UDFs work together with the user's job across multiple deployment modes
> >(on yarn|k8s|standalone...). This is a difficult problem. Do you have any
> >relevant ideas or designs for this?
> >
> >
> >Best,
> >Huajie Wang
> >
> >
> >
> >On Sun, Oct 9, 2022 at 17:18, tiger <[email protected]> wrote:
> >
> >> Hello everyone
> >>
> >>
> >> As previously discussed in the group, an issue has been created over here
> >> and suggestions are welcome.
> >>
> >>
> >> Regarding the development of specific features, as I don't have permission
> >> to create a branch, could @Huajie help to create a new branch based on the
> >> 1.2.3-release branch? For example, a branch named udf-management would
> >> facilitate development.
> >>
> >> At 2022-10-08 17:18:35, "功夫熊猫" <[email protected]> wrote:
> >> >Hi all,
> >> >
> >> >Background: I've been using StreamPark for a while, and I've had
> >> a pretty good experience in terms of ease of use and stability. At present,
> >> StreamPark itself supports UDFs, but there seems to be no unified
> >> management menu for them, so I would like to add a new menu for UDF
> >> management.
> >> >
> >> >Main implementation ideas:
> >> >Currently, we create a UDF through a RESTful API, then select the UDF when
> >> creating the job and associate the UDF ids with it (mainly to look up the
> >> UDF JAR storage path later), and finally achieve dynamic loading through
> >> the yarn.provided.lib.dirs parameter.
> >> >Note: This feature is currently implemented only for SQL jobs in
> >> YARN Application mode; the JARs are stored on HDFS.
> >> >
> >> >
> >> >Main APIs:
> >> >Add UDF
> >> >
> >> >Query UDF (list)
> >> >Edit UDF
> >> >Delete UDF
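
The four operations listed above could look roughly like the following in-memory sketch. It only illustrates the add/query/edit/delete shape; the class, field names, and storage are assumptions, and the actual proposal exposes these operations through a RESTful API whose details are not given in this thread.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical in-memory registry, not StreamPark code.
final class UdfRegistry {
    // Immutable UDF record; the fields mirror the details mentioned in the
    // thread (function name, implementing class, storage path).
    static final class Udf {
        final long id;
        final String functionName;
        final String className;
        final String jarPath;

        Udf(long id, String functionName, String className, String jarPath) {
            this.id = id;
            this.functionName = functionName;
            this.className = className;
            this.jarPath = jarPath;
        }
    }

    private final Map<Long, Udf> store = new LinkedHashMap<>();
    private long nextId = 1;

    // Add UDF: assigns the next id and stores the record.
    Udf add(String functionName, String className, String jarPath) {
        Udf udf = new Udf(nextId++, functionName, className, jarPath);
        store.put(udf.id, udf);
        return udf;
    }

    // Query UDF (list): returns a copy in insertion order.
    List<Udf> list() {
        return new ArrayList<>(store.values());
    }

    // Edit UDF: replaces the record if the id exists.
    Optional<Udf> edit(long id, String functionName, String className, String jarPath) {
        if (!store.containsKey(id)) {
            return Optional.empty();
        }
        Udf updated = new Udf(id, functionName, className, jarPath);
        store.put(id, updated);
        return Optional.of(updated);
    }

    // Delete UDF: returns true only if something was removed.
    boolean delete(long id) {
        return store.remove(id) != null;
    }
}
```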
> >> >
> >> >
> >> >Follow-up plan:
> >> >API-level functionality will be implemented first,
> >> followed by front-end UI development.
> >> >
> >> >
> >> >Best wishes
> >> >tiger
> >>
>
>
