[
https://issues.apache.org/jira/browse/FLINK-26904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated FLINK-26904:
-----------------------------------
Labels: pull-request-available (was: )
> Update load(...) of all Stage subclasses to use StreamTableEnvironment
> ----------------------------------------------------------------------
>
> Key: FLINK-26904
> URL: https://issues.apache.org/jira/browse/FLINK-26904
> Project: Flink
> Issue Type: Bug
> Reporter: Dong Lin
> Assignee: Yunfeng Zhou
> Priority: Major
> Labels: pull-request-available
>
> Currently every Stage subclass uses static `load(StreamExecutionEnvironment,
> String)` to load model data from the given path. Algorithm developers are
> expected to use StreamExecutionEnvironment.create(env) to instantiate a new
> StreamTableEnvironment and uses it to create Table instances for model data.
> This approach is problematic. Use KMeansModel as example. Users will use
> KMeansModel::load(env, path) to instantiate the model and call
> model.transform(inputDataTable) to do inference, where modelDataTable
> (created from load(...)) and inputDataTable are created using different
> StreamTableEnvironment instances.
> Having multiple Table instances in the same job where instances are created
> from different StreamTableEnvironment instances are in general error prone,
> as they can not share information such as table catalog.
> In order to fix this problem, we will need to consistently use
> StreamTableEnvironment for load(...) and similar public APIs in Flink ML.
--
This message was sent by Atlassian Jira
(v8.20.1#820001)