[ 
https://issues.apache.org/jira/browse/FLINK-14319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leo Zhang updated FLINK-14319:
------------------------------
      Docs Text:   (was: Design
- 1
- 2
- 3)
    Description: 
 I see that there are some use cases in which people want to implement their 
own SQL applications by loading external jars. The related API proposals were 
raised in FLINK-10232 (Add a SQL DDL), and the related sub-task FLINK-14055 is 
unresolved and still open.

I feel it's better to split FLINK-14055 into two goals: one for the DDL, and a 
new task for the _\{Stream}ExecutionEnvironment::registerUserJarFile()_ 
interface, which will be addressed in this issue.

Here is the plan.

*Design*
 * Add _void registerUserJarFile(String jarFile)_ to 
StreamExecutionEnvironment (in module flink-streaming-java). The affected 
classes are StreamGraph, StreamGraphGenerator, and StreamingJobGraphGenerator, 
which need to support getting and setting a list of user jars; all of them are 
in module flink-streaming-java.
 * Add _void registerUserJarFile(String jarFile)_ to ExecutionEnvironment 
(in module flink-java). The affected class is Plan (in module flink-core), 
which needs to support getting and setting a list of user jars.
 * Add _void addUserJars(List<Path> userJars, JobGraph jobGraph)_ to 
JobGraphGenerator and attach the user jars inside 
_compileJobGraph(OptimizedPlan program, JobID jobId)_ so that user jars are 
shipped with the user's program when it is submitted to the cluster (see the 
sketch after this list). JobGraphGenerator is in module flink-optimizer.
 * Add _void registerUserJarFile(String jarFile)_ to 
\{Stream}ExecutionEnvironment.scala (in modules flink-scala and 
flink-streaming-scala) and simply delegate to the wrapped javaEnv to perform 
the registration.

*Testing*
 * One test case for adding local user jars in both streaming and batch jobs. 
We need to package the test classes into a jar before testing; for this 
purpose, we can add a goal bound to process-test-classes in the pom file. The 
affected module is flink-tests. A sketch of the intended test shape follows 
this list.
 * Another test case for adding user jars from HDFS, following the same idea 
as the previous one. The affected module is flink-fs-tests.
 * Note that the Python API is not covered by this issue, just as with 
registering cached files. But we still need to modify some Python test cases 
to avoid build errors caused by methods that are declared in Java but missing 
from Python. The affected files are 
flink-python/pyflink/dataset/tests/test_execution_environment_completeness.py 
and 
flink-python/pyflink/datastream/tests/test_stream_execution_environment_completeness.py.
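
And a hedged sketch of the intended test shape, reusing the stand-in classes 
from the sketch above. The test name and the jar path are placeholders for 
whatever the process-test-classes goal would actually produce:

{code:java}
// Sketch of the test shape only; class names and the jar path are placeholders.
import static org.junit.Assert.assertTrue;

import org.junit.Test;

public class RegisterUserJarFileSketchTest {

    // Assumed to be produced by a process-test-classes goal in the pom.
    private static final String TEST_JAR = "target/test-user-jar/test-classes.jar";

    @Test
    public void localUserJarIsAttachedToJobGraph() {
        EnvironmentSketch env = new EnvironmentSketch();
        env.registerUserJarFile(TEST_JAR);
        // The registered jar must end up on the JobGraph so that it is
        // shipped to the cluster together with the user's program.
        assertTrue(env.compileJobGraph().getUserJars().toString().contains(TEST_JAR));
    }
}
{code}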

  was:
 I see that there are some use cases in which people want to implement their 
own SQL applications by loading external jars. The related API proposals were 
raised in FLINK-10232 (Add a SQL DDL), and the related sub-task FLINK-14055 is 
unresolved and still open.

I feel it's better to split FLINK-14055 into two goals: one for the DDL, and a 
new task for the _\{Stream\}ExecutionEnvironment::registerUserJarFile()_ 
interface.

 I have implemented the interfaces for both the Java and Scala APIs, and they 
are well tested. My implementation follows the design doc of FLINK-10232 and 
chooses the first option among the design alternatives, so I'd like to share 
my code if that's OK.

 


> Register user jar files in {Stream}ExecutionEnvironment 
> --------------------------------------------------------
>
>                 Key: FLINK-14319
>                 URL: https://issues.apache.org/jira/browse/FLINK-14319
>             Project: Flink
>          Issue Type: New Feature
>          Components: API / DataSet, API / DataStream
>            Reporter: Leo Zhang
>            Priority: Major
>             Fix For: 1.10.0
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
