Re: [DISCUSS] FLIP-525: Model ML_PREDICT, ML_EVALUATE Implementation Design

Hao Li Tue, 06 May 2025 20:24:13 -0700

Hi Yunfeng, Ron,

Thanks for the feedback.


>  it might be better to change the configuration api_key to apikey
Makes sense. I updated the FLIP.
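
To illustrate why the key name matters, here is a minimal sketch of substring-based masking. It assumes sensitive keys are detected by checking whether the lower-cased key contains a known fragment. The class name, the fragment list, and the mask string are hypothetical and are not copied from Flink.

```java
import java.util.List;
import java.util.Locale;

/** Sketch of substring-based masking of sensitive configuration keys. */
final class SensitiveKeySketch {

    // Hypothetical fragment list for illustration only; not Flink's actual list.
    static final List<String> SENSITIVE_FRAGMENTS = List.of("password", "secret", "apikey");

    /** Returns true if the lower-cased key contains any sensitive fragment. */
    static boolean isSensitive(String key) {
        String lower = key.toLowerCase(Locale.ROOT);
        return SENSITIVE_FRAGMENTS.stream().anyMatch(lower::contains);
    }

    /** Returns the value to write to a log: masked for sensitive keys, raw otherwise. */
    static String valueForLog(String key, String value) {
        return isSensitive(key) ? "******" : value;
    }

    public static void main(String[] args) {
        System.out.println("api_key -> " + valueForLog("api_key", "sk-123"));
        System.out.println("apikey  -> " + valueForLog("apikey", "sk-123"));
    }
}
```

Under this assumption `api_key` is logged in plain text because it does not contain the fragment `apikey`, while `apikey` is masked.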

> Why is it necessary to define the task option in the WITH clause of the
> Model DDL, and what is its purpose?
It's mainly used for model evaluation with `ML_EVALUATE`. Depending on the
model's task option, `ML_EVALUATE` uses different loss functions and outputs
different metrics. The task option is not necessary if the model is not used
in `ML_EVALUATE`. `ML_EVALUATE` also has an overloaded variant that can
override the task type during evaluation.
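
As a rough sketch of that behavior (not the FLIP's implementation): the evaluation task is taken from the override if one is given, otherwise from the model's task option, and the task then selects the metrics. The class name, the task names, and the metric names below are hypothetical and only illustrate the idea.

```java
import java.util.List;
import java.util.Map;

/** Sketch of how a task option could select evaluation metrics. */
final class EvaluateTaskSketch {

    // Hypothetical task-to-metric mapping; the real metric names may differ.
    static final Map<String, List<String>> METRICS_BY_TASK = Map.of(
            "regression", List.of("mean_squared_error", "mean_absolute_error"),
            "classification", List.of("accuracy", "precision", "recall"),
            "clustering", List.of("davies_bouldin_index"));

    /** The override wins; otherwise fall back to the task from the model options. */
    static String resolveTask(String modelTask, String overrideTask) {
        if (overrideTask != null) {
            return overrideTask;
        }
        if (modelTask != null) {
            return modelTask;
        }
        throw new IllegalArgumentException("No task set on the model and no override given");
    }

    /** Returns the metrics reported for the resolved task. */
    static List<String> metricsFor(String modelTask, String overrideTask) {
        String task = resolveTask(modelTask, overrideTask);
        List<String> metrics = METRICS_BY_TASK.get(task);
        if (metrics == null) {
            throw new IllegalArgumentException("Unsupported task: " + task);
        }
        return metrics;
    }

    public static void main(String[] args) {
        System.out.println(metricsFor("regression", null));
        System.out.println(metricsFor("regression", "classification"));
    }
}
```

A model without a task option is still usable here as long as the caller passes an override, which mirrors the point that the option is only needed for evaluation.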

Apart from evaluation, if model training is supported in Flink in the
future, the task option can also determine how the model is trained.

> About the CatalogModel interface, why does it need `getInputSchema` and
> `getOutputSchema` methods? What is the role of Schema?
The schema mainly specifies the input and output data types of the model
when it's used in prediction. During prediction, `ML_PREDICT` takes columns
from the input table that match the model's input schema types and outputs
columns based on the model's output schema types.
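
A minimal sketch of that check, assuming columns are matched by position and that types must be equal (both are assumptions made for illustration). Types are plain strings here, and the class and method names are hypothetical rather than part of the FLIP.

```java
import java.util.List;

/** Sketch of validating ML_PREDICT arguments against a model's schemas. */
final class PredictSchemaSketch {

    /**
     * Checks the argument column types against the model's input types,
     * position by position, and returns the model's output types on success.
     */
    static List<String> resultTypes(
            List<String> argumentTypes,
            List<String> modelInputTypes,
            List<String> modelOutputTypes) {
        if (argumentTypes.size() != modelInputTypes.size()) {
            throw new IllegalArgumentException(
                    "Expected " + modelInputTypes.size()
                            + " input columns but got " + argumentTypes.size());
        }
        for (int i = 0; i < argumentTypes.size(); i++) {
            if (!argumentTypes.get(i).equals(modelInputTypes.get(i))) {
                throw new IllegalArgumentException(
                        "Column " + i + " has type " + argumentTypes.get(i)
                                + " but the model expects " + modelInputTypes.get(i));
            }
        }
        return modelOutputTypes;
    }

    public static void main(String[] args) {
        System.out.println(
                resultTypes(List.of("STRING"), List.of("STRING"), List.of("DOUBLE")));
    }
}
```

Without the input and output schemas on the model, a check like this could not run at planning time, and the result columns could not be typed.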

> Regarding the ModelProvider interface, what is the role of the copy
> method?
I think it can be useful in the future if we need to copy the provider
during the planning stage and apply mutations to the copy. It may not be
used for now, though, and I'm also OK with removing it.


Hope this answers your questions.

Thanks,
Hao


On Tue, May 6, 2025 at 7:49 PM Ron Liu <ron9....@gmail.com> wrote:

> Hi, Hao
>
> Thanks for starting this proposal, it's a great feature, +1.
>
> Since I was missing some context, I went to FLIP-437. Combining these two
> FLIPs, I have the following three questions:
> 1. Why is it necessary to define the task option in the WITH clause of the
> Model DDL, and what is its purpose? I understand that one model can support
> various types of tasks such as regression, classification, clustering,
> etc. But the example you have given gives me the impression that a model
> can only perform one specific type of task, which confuses me. I think the
> task option is not needed.
>
> 2. About the CatalogModel interface, why does it need `getInputSchema` and
> `getOutputSchema` methods? What is the role of Schema?
>
> 3. Regarding the ModelProvider interface, what is the role of the copy
> method? Since I don't know much about the implementation details, I'm
> curious about what cases need to be copied.
>
>
> Best,
> Ron
>
> On Wed, May 7, 2025 at 09:33, Yunfeng Zhou <flink.zhouyunf...@gmail.com> wrote:
>
> > Hi Hao,
> >
> > Thanks for the FLIP! It provides a clearer guideline for developers to
> > implement model functions.
> >
> > One minor comment: it might be better to change the configuration api_key
> > to apikey, which corresponds to GlobalConfiguration.SENSITIVE_KEYS.
> > Otherwise users’ secrets might be exposed in logs and cause security
> > risks.
> >
> > Best,
> > Yunfeng
> >
> >
> > > On Apr 29, 2025, at 07:22, Hao Li <h...@confluent.io.INVALID> wrote:
> > >
> > > Hi All,
> > >
> > > I would like to start a discussion about FLIP-525 [1]: Model
> > > ML_PREDICT, ML_EVALUATE Implementation Design. This FLIP is co-authored
> > > with Shengkai Fang.
> > >
> > > This FLIP is a follow-up to FLIP-437 [2] and proposes the implementation
> > > design for the ML_PREDICT and ML_EVALUATE functions, which were
> > > introduced in FLIP-437.
> > >
> > > For more details, see FLIP-525 [1]. Looking forward to your feedback.
> > >
> > >
> > > [1]
> > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-525%3A+Model+ML_PREDICT%2C+ML_EVALUATE+Implementation+Design
> > > [2]
> > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-437%3A+Support+ML+Models+in+Flink+SQL
> > >
> > >
> > > Thanks,
> > > Hao
> >
> >
>
