Dong Lin created FLINK-26904:
--------------------------------
Summary: Update load(...) of all Stage subclasses to use
StreamTableEnvironment
Key: FLINK-26904
URL: https://issues.apache.org/jira/browse/FLINK-26904
Project: Flink
Issue Type: Bug
Reporter: Dong Lin
Currently every Stage subclass uses a static `load(StreamExecutionEnvironment,
String)` to load model data from the given path. Algorithm developers are
expected to call StreamTableEnvironment.create(env) to instantiate a new
StreamTableEnvironment and use it to create Table instances for model data.
This approach is problematic. Take KMeansModel as an example. Users call
KMeansModel::load(env, path) to instantiate the model and then call
model.transform(inputDataTable) to do inference, where modelDataTable (created
inside load(...)) and inputDataTable are created using different
StreamTableEnvironment instances.
Having multiple Table instances in the same job that are created from
different StreamTableEnvironment instances is in general error prone, as
they cannot share information such as the table catalog.
In order to fix this problem, we will need to consistently use
StreamTableEnvironment for load(...) and similar public APIs in Flink ML.
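To make the difference concrete, here is a minimal sketch using plain Java stand-ins rather than the real Flink classes. ExecEnv, TableEnv, Table, Model, loadOld, loadNew and the path "/tmp/model" are all hypothetical names invented for illustration; the only assumption carried over from the description above is that each table environment owns its own catalog and that load(...) either creates a fresh one or reuses the caller's.

```java
import java.util.HashMap;
import java.util.Map;

public class SharedTableEnvSketch {

    /** Stand-in for StreamExecutionEnvironment (hypothetical, not the Flink class). */
    public static final class ExecEnv {}

    /** Stand-in for StreamTableEnvironment: every instance owns a separate catalog. */
    public static final class TableEnv {
        public final Map<String, Table> catalog = new HashMap<>();

        public static TableEnv create(ExecEnv env) {
            return new TableEnv();
        }

        /** Creates a table and registers it in this environment's catalog only. */
        public Table fromPath(String path) {
            Table table = new Table(this, path);
            catalog.put(path, table);
            return table;
        }
    }

    /** Stand-in for Table: remembers which environment created it. */
    public static final class Table {
        public final TableEnv owner;
        public final String name;

        Table(TableEnv owner, String name) {
            this.owner = owner;
            this.name = name;
        }
    }

    /** Stand-in for a Stage subclass such as a model. */
    public static final class Model {
        public final Table modelData;

        Model(Table modelData) {
            this.modelData = modelData;
        }

        /** Current style: load(...) silently creates its own table environment. */
        public static Model loadOld(ExecEnv env, String path) {
            TableEnv privateEnv = TableEnv.create(env);
            return new Model(privateEnv.fromPath(path));
        }

        /** Proposed style: load(...) reuses the table environment passed by the caller. */
        public static Model loadNew(TableEnv tEnv, String path) {
            return new Model(tEnv.fromPath(path));
        }

        /** True if the model data and the input table come from the same environment. */
        public boolean sharesEnvWith(Table input) {
            return modelData.owner == input.owner;
        }
    }

    public static void main(String[] args) {
        ExecEnv env = new ExecEnv();
        TableEnv tEnv = TableEnv.create(env);
        Table input = tEnv.fromPath("input");

        Model oldStyle = Model.loadOld(env, "/tmp/model");
        Model newStyle = Model.loadNew(tEnv, "/tmp/model");

        System.out.println(oldStyle.sharesEnvWith(input)); // false
        System.out.println(newStyle.sharesEnvWith(input)); // true
    }
}
```

With the old signature the model data lands in a catalog the caller never sees, while with the new one both tables are registered in the same catalog.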