[ 
https://issues.apache.org/jira/browse/FLINK-11421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timo Walther updated FLINK-11421:
---------------------------------
    Component/s:     (was: Core)
                 Table API & SQL

> Providing more compilation options for code-generated operators
> ---------------------------------------------------------------
>
>                 Key: FLINK-11421
>                 URL: https://issues.apache.org/jira/browse/FLINK-11421
>             Project: Flink
>          Issue Type: New Feature
>          Components: Table API & SQL
>            Reporter: Liya Fan
>            Assignee: Liya Fan
>            Priority: Major
>   Original Estimate: 240h
>  Remaining Estimate: 240h
>
> Flink supports some operators (like Calc, Hash Agg, Hash Join, etc.) by code 
> generation. That is, Flink generates their source code dynamically, and then 
> compile it into Java Byte Code, which is load and executed at runtime.
>  
> By default, Flink compiles the generated source code by Janino. This is fast, 
> as the compilation often finishes in hundreds of milliseconds. The generated 
> Java Byte Code, however, is of poor quality. To illustrate, we use Java 
> Compiler API (JCA) to compile the generated code. Experiments on TPC-H (1 TB) 
> queries show that the E2E time can be more than 10% shorter, when operators 
> are compiled by JCA, despite that it takes more time (a few seconds) to 
> compile with JCA.
>  
> Therefore, we believe it is beneficial to compile generated code by JCA in 
> the following scenarios: 1) For batch jobs, the E2E time is relatively long, 
> so it is worth of spending more time compiling and generating high quality 
> Java Byte Code. 2) For repeated stream jobs, the generated code will be 
> compiled once and run many times. Therefore, it pays to spend more time 
> compiling for the first time, and enjoy the high byte code qualities for 
> later runs.
>  
> According to the above observations, we want to provide a compilation option 
> (Janino, JCA, or dynamic) for Flink, so that the user can choose the one 
> suitable for their specific scenario and obtain better performance whenever 
> possible.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to