Re: [DISCUSS] FLIP-525: Model ML_PREDICT, ML_EVALUATE Implementation Design

Ron Liu Tue, 06 May 2025 22:57:21 -0700

> It's mainly used for model evaluation purposes for `ML_EVALUATE`. Different
> loss functions will be used and different metrics will be output for
> `ML_EVALUATE` based on the task option of the model. The task option is not
> necessary if the model is not used in `ML_EVALUATE`. `ML_EVALUATE` also has
> an overloaded method which can override the task type during evaluation.


From your explanation, I personally feel that it might be more appropriate
to replace `task` with a term better suited to the scenario. Of course, I
don't have a good alternative at the moment; it's just a suggestion.

Best,
Ron

Hao Li <[email protected]> wrote on Wed, May 7, 2025 at 11:24:

> Hi Yunfeng, Ron,
>
> Thanks for the feedback.
>
> > it might be better to change the configuration api_key to apikey
> Makes sense. I've updated the FLIP.
>
> > Why is it necessary to define the task option in the WITH clause of the
> Model DDL, and what is its purpose?
> It's mainly used for model evaluation purposes for `ML_EVALUATE`. Different
> loss functions will be used and different metrics will be output for
> `ML_EVALUATE` based on the task option of the model. The task option is not
> necessary if the model is not used in `ML_EVALUATE`. `ML_EVALUATE` also has
> an overloaded method which can override the task type during evaluation.
>
> Apart from evaluation, if model training is supported in Flink in the
> future, the task option could also specify how the model should be trained.
>
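A minimal sketch of the behavior described above, assuming a model whose WITH clause carries a `task` option and an `ML_EVALUATE` overload that accepts an explicit task. The `Task` enum, the metric names, and both `evaluate` methods are hypothetical illustrations, not the FLIP-525 API; the actual metrics per task are whatever the FLIP defines.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch, NOT the FLIP-525 API: task names and metric lists are illustrative assumptions.
public class TaskMetricsSketch {

    // Hypothetical task types a model DDL might declare via the 'task' option.
    enum Task { REGRESSION, CLASSIFICATION, CLUSTERING }

    // Illustrative metrics per task; the real ML_EVALUATE output is defined by the FLIP.
    static final Map<Task, List<String>> METRICS = Map.of(
        Task.REGRESSION, List.of("MAE", "RMSE"),
        Task.CLASSIFICATION, List.of("Accuracy", "Precision", "Recall"),
        Task.CLUSTERING, List.of("SilhouetteScore"));

    // Default variant: use the task declared in the model's WITH clause.
    static List<String> evaluate(Map<String, String> modelOptions) {
        return evaluate(modelOptions, Optional.empty());
    }

    // Overloaded variant: an explicit task overrides the model's declared task.
    static List<String> evaluate(Map<String, String> modelOptions, Optional<String> taskOverride) {
        String taskName = taskOverride.orElseGet(() -> modelOptions.get("task"));
        if (taskName == null) {
            // Without a declared task or an override, there is no way to pick metrics.
            throw new IllegalArgumentException("No task declared on the model and no override given");
        }
        Task task = Task.valueOf(taskName.toUpperCase(Locale.ROOT));
        return METRICS.get(task);
    }

    public static void main(String[] args) {
        Map<String, String> options = Map.of("task", "regression");
        System.out.println(evaluate(options));                                // [MAE, RMSE]
        System.out.println(evaluate(options, Optional.of("classification"))); // [Accuracy, Precision, Recall]
    }
}
```

The override path mirrors the point that the task option only matters for evaluation: a model used purely with `ML_PREDICT` never reaches this code.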
> > About the CatalogModel interface, why does it need `getInputSchema` and
> `getOutputSchema` methods? What is the role of Schema?
> The schema mainly specifies the input and output data types of the model
> when it's used for prediction. During prediction, `ML_PREDICT` takes columns
> from the input table that match the model's input schema types and outputs
> columns based on the model's output schema.
>
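A simplified sketch of the schema check described above, under stated assumptions: columns are matched by position, types are compared as plain strings, and the result row is the input table's columns followed by the model's output columns. `Column`, `validateInput`, and `resultColumns` are hypothetical helpers, not Flink's `CatalogModel` or planner code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: simplified stand-ins, not Flink's ResolvedSchema or CatalogModel.
public class ModelSchemaSketch {

    // A column is just a name plus a type name in this simplification.
    record Column(String name, String type) {}

    // Checks that the selected columns line up, position by position, with the model's input schema types.
    static void validateInput(List<Column> inputColumns, List<Column> modelInputSchema) {
        if (inputColumns.size() != modelInputSchema.size()) {
            throw new IllegalArgumentException(
                "Expected " + modelInputSchema.size() + " input columns but got " + inputColumns.size());
        }
        for (int i = 0; i < inputColumns.size(); i++) {
            String actual = inputColumns.get(i).type();
            String expected = modelInputSchema.get(i).type();
            if (!actual.equals(expected)) {
                throw new IllegalArgumentException("Column '" + inputColumns.get(i).name()
                    + "' has type " + actual + ", but the model expects " + expected);
            }
        }
    }

    // Result row type: the input table's columns followed by the model's output columns.
    static List<Column> resultColumns(List<Column> tableColumns, List<Column> modelOutputSchema) {
        List<Column> result = new ArrayList<>(tableColumns);
        result.addAll(modelOutputSchema);
        return result;
    }
}
```

A real planner would likely also consider implicit casts and nullability; exact string equality is only used here to keep the sketch short.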
> > Regarding the ModelProvider interface, what is the role of the copy
> method?
> I think it could be useful in the future if we need to copy the provider
> during the planning stage and apply mutations to it. It may not be used for
> now, though, so I'm also OK with removing it.
>
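One way to picture the planning-stage use case, similar in spirit to the `copy()` method that Flink's `DynamicTableSource` exposes to the planner. The `ModelProvider` interface shown here, `RemoteModelProvider`, and the `max-batch-size` option are hypothetical; the sketch only shows that mutating a copy leaves the original provider intact.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: not the FLIP-525 ModelProvider interface, only an illustration of copy() during planning.
public class ProviderCopySketch {

    interface ModelProvider {
        // Returns an independent instance so a planner can try mutations without touching the original.
        ModelProvider copy();
    }

    // Hypothetical provider holding runtime options that a planner rule might tweak.
    static final class RemoteModelProvider implements ModelProvider {
        final Map<String, String> runtimeOptions;

        RemoteModelProvider(Map<String, String> runtimeOptions) {
            // Defensive copy so each instance owns its own map.
            this.runtimeOptions = new HashMap<>(runtimeOptions);
        }

        @Override
        public RemoteModelProvider copy() {
            return new RemoteModelProvider(runtimeOptions);
        }
    }

    public static void main(String[] args) {
        RemoteModelProvider original = new RemoteModelProvider(Map.of("max-batch-size", "8"));
        // A planner rule explores an alternative plan on a copy...
        RemoteModelProvider candidate = original.copy();
        candidate.runtimeOptions.put("max-batch-size", "32");
        // ...while the original stays unchanged in case the alternative is rejected.
        System.out.println(original.runtimeOptions.get("max-batch-size"));  // 8
        System.out.println(candidate.runtimeOptions.get("max-batch-size")); // 32
    }
}
```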
>
> Hope this answers your questions.
>
> Thanks,
> Hao
>
>
> On Tue, May 6, 2025 at 7:49 PM Ron Liu <[email protected]> wrote:
>
> > Hi, Hao
> >
> > Thanks for starting this proposal, it's a great feature, +1.
> >
> > Since I was missing some context, I went through FLIP-437. Combining these
> > two FLIPs, I have the following three questions:
> > 1. Why is it necessary to define the task option in the WITH clause of the
> > Model DDL, and what is its purpose? I understand that one model can support
> > various types of tasks such as regression, classification, clustering,
> > etc. But the example you have given gives me the impression that a model
> > can only perform a specific type of task, which confuses me. I think the
> > task option is not needed.
> >
> > 2. About the CatalogModel interface, why does it need the `getInputSchema`
> > and `getOutputSchema` methods? What is the role of the schema?
> >
> > 3. Regarding the ModelProvider interface, what is the role of the copy
> > method? Since I don't know much about the implementation details, I'm
> > curious about the cases in which it needs to be copied.
> >
> >
> > Best,
> > Ron
> >
> > Yunfeng Zhou <[email protected]> wrote on Wed, May 7, 2025 at 09:33:
> >
> > > Hi Hao,
> > >
> > > Thanks for the FLIP! It provides a clearer guideline for developers to
> > > implement model functions.
> > >
> > > One minor comment: it might be better to rename the configuration option
> > > api_key to apikey, which matches an entry in
> > > GlobalConfiguration.SENSITIVE_KEYS. Otherwise, users' secrets might be
> > > exposed in logs and cause security risks.
> > >
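Why the rename matters can be sketched with a simplified re-implementation of the masking check. Roughly, Flink's `GlobalConfiguration.isSensitive` lower-cases the key and tests whether it contains any entry of `SENSITIVE_KEYS`, so `apikey` is hidden while `api_key` is not. The three-entry key list below is an assumed subset for illustration, not Flink's full list, and `render` is a hypothetical logging helper.

```java
import java.util.List;
import java.util.Locale;

// Simplified re-implementation for illustration only; Flink's real check lives in GlobalConfiguration.
public class SensitiveKeySketch {

    // Assumed subset of sensitive substrings, for illustration only.
    static final List<String> SENSITIVE_KEYS = List.of("password", "secret", "apikey");

    static final String HIDDEN_CONTENT = "******";

    // A key counts as sensitive if its lower-cased form contains any listed entry.
    static boolean isSensitive(String key) {
        String lower = key.toLowerCase(Locale.ROOT);
        return SENSITIVE_KEYS.stream().anyMatch(lower::contains);
    }

    // What a logging helper would print for one configuration entry.
    static String render(String key, String value) {
        return key + ": " + (isSensitive(key) ? HIDDEN_CONTENT : value);
    }

    public static void main(String[] args) {
        System.out.println(render("apikey", "sk-123"));  // apikey: ******
        System.out.println(render("api_key", "sk-123")); // api_key: sk-123  (secret leaks into the log)
    }
}
```

Because the check is a substring match, the underscore in `api_key` is enough to break it, which is the security risk described above.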
> > > Best,
> > > Yunfeng
> > >
> > >
> > > > On Apr 29, 2025, at 07:22, Hao Li <[email protected]> wrote:
> > > >
> > > > Hi All,
> > > >
> > > > I would like to start a discussion about FLIP-525 [1]: Model
> > > > ML_PREDICT, ML_EVALUATE Implementation Design. This FLIP is
> > > > co-authored with Shengkai Fang.
> > > >
> > > > This FLIP is a follow-up to FLIP-437 [2] and proposes the
> > > > implementation design for the ML_PREDICT and ML_EVALUATE functions,
> > > > which were introduced in FLIP-437.
> > > >
> > > > For more details, see FLIP-525 [1]. Looking forward to your feedback.
> > > >
> > > >
> > > > [1]
> > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-525%3A+Model+ML_PREDICT%2C+ML_EVALUATE+Implementation+Design
> > > > [2]
> > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-437%3A+Support+ML+Models+in+Flink+SQL
> > > >
> > > >
> > > > Thanks,
> > > > Hao
> > >
> > >
> >
>
