[ 
https://issues.apache.org/jira/browse/FLINK-26904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dong Lin updated FLINK-26904:
-----------------------------
    Fix Version/s: ml-2.1.0

> Update load(...) of all Stage subclasses to use StreamTableEnvironment
> ----------------------------------------------------------------------
>
>                 Key: FLINK-26904
>                 URL: https://issues.apache.org/jira/browse/FLINK-26904
>             Project: Flink
>          Issue Type: Improvement
>            Reporter: Dong Lin
>            Assignee: Yunfeng Zhou
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: ml-2.1.0
>
>
> Currently every Stage subclass uses static `load(StreamExecutionEnvironment, 
> String)` to load model data from the given path. Algorithm developers are 
> expected to use StreamExecutionEnvironment.create(env) to instantiate a new 
> StreamTableEnvironment and uses it to create Table instances for model data.
> This approach is problematic. Use KMeansModel as example. Users will use 
> KMeansModel::load(env, path) to instantiate the model and call 
> model.transform(inputDataTable) to do inference, where modelDataTable 
> (created from load(...)) and inputDataTable are created using different 
> StreamTableEnvironment instances. 
> Having multiple Table instances in the same job where instances are created 
> from different StreamTableEnvironment instances are in general error prone, 
> as they can not share information such as table catalog.
> In order to fix this problem, we will need to consistently use 
> StreamTableEnvironment for load(...) and similar public APIs in Flink ML.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

Reply via email to