[ 
https://issues.apache.org/jira/browse/FLINK-11421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16833664#comment-16833664
 ] 

Liya Fan commented on FLINK-11421:
----------------------------------

Hi [~lzljs3620320], thanks a lot for your information. 
Can I restart the PR now? 

My comments inline:

1. Why is code compiled by the Java compiler faster than code compiled by 
Janino? Any technical details and evidence?

Generally speaking, the longer a compiler takes to compile the code, the 
higher the quality of the compilation result. This is because a compiler that 
takes longer usually applies more optimizations to the code. 

Native-language compilers behave similarly. gcc, for example, has different 
optimization levels: -O0, -O1, -O2, and -O3. A higher optimization level means 
a longer compilation time, but the generated machine code is of better 
quality.

2. Do some benchmark to measure how fast E2E runs after compiling with JCA?

We first found that JCA could improve E2E performance when we were trying to 
support vectorization of TPC-H Q1. JCA compilation improved performance by 
about 27% (from 27-28s to 20s). We also observed this in some other TPC-H 
queries, such as Q12 and Q18.

3. Do some benchmark to measure how slowly JCA compiles?

Good question. A JCA compilation task takes about 2s, which is more than 10 
times slower than compiling with Janino. To reduce the performance impact, we 
introduce 2 improvements:

1) Compilation by chain: the compilation time does not seem to increase much 
as the number of source files increases, so we compile the source code for all 
operators in a chain in a single batch.

2) Class cache: once a source file is compiled, we save the result in a cache, 
so other tasks in the same JVM can reuse the compilation results.
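
To make the two improvements above concrete, here is a minimal sketch of how 
they could be combined on top of the standard javax.tools API. It is not the 
code from the PR: the class name ChainCompilerSketch, the method compileChain, 
and the choice of the source text as the cache key are all assumptions made 
for illustration.

```java
import java.net.URI;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

/** Hypothetical helper for illustration only; not Flink's actual implementation. */
final class ChainCompilerSketch {

    /** JVM-wide cache keyed by source text (an assumption), shared by all tasks in this JVM. */
    private static final Map<String, Class<?>> CACHE = new ConcurrentHashMap<>();

    /** A Java source file held in memory instead of on disk. */
    private static final class StringSource extends SimpleJavaFileObject {
        private final String code;

        StringSource(String className, String code) {
            super(URI.create("string:///" + className.replace('.', '/') + Kind.SOURCE.extension),
                    Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    /**
     * Compiles every not-yet-cached source of one operator chain in a single
     * compiler task, then returns the classes by name (cached or freshly compiled).
     */
    static synchronized Map<String, Class<?>> compileChain(Map<String, String> sourcesByClassName)
            throws Exception {
        List<JavaFileObject> pending = new ArrayList<>();
        for (Map.Entry<String, String> e : sourcesByClassName.entrySet()) {
            if (!CACHE.containsKey(e.getValue())) {
                pending.add(new StringSource(e.getKey(), e.getValue()));
            }
        }
        if (!pending.isEmpty()) {
            // Returns null when running on a JRE that ships without a compiler.
            JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
            if (compiler == null) {
                throw new IllegalStateException("No system Java compiler available");
            }
            Path outDir = Files.createTempDirectory("jca-classes");
            List<String> options = Arrays.asList("-d", outDir.toString());
            // One task for the whole batch: the fixed compiler start-up cost is paid once.
            boolean ok = compiler.getTask(null, null, null, options, null, pending).call();
            if (!ok) {
                throw new IllegalStateException("Compilation failed");
            }
            URLClassLoader loader = new URLClassLoader(
                    new URL[] {outDir.toUri().toURL()}, ChainCompilerSketch.class.getClassLoader());
            for (Map.Entry<String, String> e : sourcesByClassName.entrySet()) {
                if (!CACHE.containsKey(e.getValue())) {
                    CACHE.put(e.getValue(), loader.loadClass(e.getKey()));
                }
            }
        }
        Map<String, Class<?>> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : sourcesByClassName.entrySet()) {
            result.put(e.getKey(), CACHE.get(e.getValue()));
        }
        return result;
    }
}
```

Calling compileChain a second time with the same sources returns the cached 
classes without starting the compiler again. The sketch ignores details a real 
implementation would need, such as cache eviction, the compile classpath for 
user code, and cleaning up the temporary output directory.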

4. Open the JCA compiler to run all tests of table-planner?

Sounds reasonable. The only drawback is that running the tests may take much 
longer, since compiling with JCA is much slower. 

> Add compilation options to allow compiling generated code with JDK compiler 
> ----------------------------------------------------------------------------
>
>                 Key: FLINK-11421
>                 URL: https://issues.apache.org/jira/browse/FLINK-11421
>             Project: Flink
>          Issue Type: New Feature
>          Components: Table SQL / Runtime
>            Reporter: Liya Fan
>            Assignee: Liya Fan
>            Priority: Major
>              Labels: pull-request-available
>   Original Estimate: 240h
>          Time Spent: 20m
>  Remaining Estimate: 239h 40m
>
> Flink supports some operators (like Calc, Hash Agg, Hash Join, etc.) by code 
> generation. That is, Flink generates their source code dynamically, and then 
> compiles it into Java bytecode, which is loaded and executed at runtime.
>  
> By default, Flink compiles the generated source code with Janino. This is 
> fast, as the compilation often finishes in hundreds of milliseconds. The 
> generated Java bytecode, however, is of poor quality. To illustrate, we used 
> the Java Compiler API (JCA) to compile the generated code. Experiments on 
> TPC-H (1 TB) queries show that the E2E time can be more than 10% shorter when 
> operators are compiled with JCA, even though compiling with JCA takes more 
> time (a few seconds).
>  
> Therefore, we believe it is beneficial to compile generated code with JCA in 
> the following scenarios: 1) For batch jobs, the E2E time is relatively long, 
> so it is worth spending more time compiling to generate high-quality Java 
> bytecode. 2) For repeated stream jobs, the generated code will be compiled 
> once and run many times. Therefore, it pays to spend more time compiling the 
> first time and benefit from the higher-quality bytecode in later runs.
>  
> Based on the above observations, we want to provide a compilation option 
> (Janino, JCA, or dynamic) for Flink, so that users can choose the one 
> suitable for their specific scenario and obtain better performance whenever 
> possible.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
