[jira] [Updated] (FLINK-26904) Update load(...) of all Stage subclasses to use StreamTableEnvironment

Dong Lin (Jira) Mon, 28 Mar 2022 23:33:05 -0700


     [ 
https://issues.apache.org/jira/browse/FLINK-26904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Dong Lin updated FLINK-26904:
-----------------------------
    Description: 
Currently every Stage subclass uses static `load(StreamExecutionEnvironment, 
String)` to load model data from the given path. Algorithm developers are 
expected to use StreamExecutionEnvironment.create(env) to instantiate a new 
StreamTableEnvironment and uses it to create Table instances for model data.

This approach is problematic. Use KMeansModel as example. Users will use 
KMeansModel::load(env, path) to instantiate the model and call 
model.transform(inputDataTable) to do inference, where modelDataTable (created 
from load(...)) and inputDataTable are created using different 
StreamTableEnvironment instances. 

Having multiple Table instances in the same job where instances are created 
from different StreamTableEnvironment instances are in general error prone, as 
they can not share information such as table catalog.

In order to fix this problem, we will need to consistently use 
StreamTableEnvironment for load(...) and similar public APIs in Flink ML.


  was:
Currently every Stage subclass uses static `load(StreamExecutionEnvironment, 
String)` to load model data from the given path. Algorithm developers are 
expected to use StreamExecutionEnvironment.create(env) to instantiate a new 
StreamTableEnvironment and uses it to create Table instances for model data.

This approach is problematic. Use KMeansModel as example. Users will use 
KMeansModel::load(env, path) to instantiate the model and call 
model.transform(inputDataTable) to do inference, where modelDataTable (created 
from load(...)) and inputDataTable are created using different 
StreamTableEnvironment instances. 

Having multiple Table instances in the same job where instances are created 
from different StreamTableEnvironment instances are in general error prone, as 
they can not sure information such as table catalog.

In order to fix this problem, we will need to consistently use 
StreamTableEnvironment for load(...) and similar public APIs in Flink ML.



> Update load(...) of all Stage subclasses to use StreamTableEnvironment
> ----------------------------------------------------------------------
>
>                 Key: FLINK-26904
>                 URL: https://issues.apache.org/jira/browse/FLINK-26904
>             Project: Flink
>          Issue Type: Bug
>            Reporter: Dong Lin
>            Assignee: Yunfeng Zhou
>            Priority: Major
>
> Currently every Stage subclass uses static `load(StreamExecutionEnvironment, 
> String)` to load model data from the given path. Algorithm developers are 
> expected to use StreamExecutionEnvironment.create(env) to instantiate a new 
> StreamTableEnvironment and uses it to create Table instances for model data.
> This approach is problematic. Use KMeansModel as example. Users will use 
> KMeansModel::load(env, path) to instantiate the model and call 
> model.transform(inputDataTable) to do inference, where modelDataTable 
> (created from load(...)) and inputDataTable are created using different 
> StreamTableEnvironment instances. 
> Having multiple Table instances in the same job where instances are created 
> from different StreamTableEnvironment instances are in general error prone, 
> as they can not share information such as table catalog.
> In order to fix this problem, we will need to consistently use 
> StreamTableEnvironment for load(...) and similar public APIs in Flink ML.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[jira] [Updated] (FLINK-26904) Update load(...) of all Stage subclasses to use StreamTableEnvironment

Reply via email to