Hi Ron,

I think this FLIP would help improve performance. Looking forward to seeing it
completed in Flink!

Regarding the state compatibility section, it seems that checkpoint compatibility
would be broken, just as in [1]. Could FLIP-190 [2] still help with SQL version
upgrades in this case?


[1] https://docs.google.com/document/d/1qKVohV12qn-bM51cBZ8Hcgp31ntwClxjoiNBUOqVHsI/edit#heading=h.fri5rtpte0si
[2] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=191336489

Best
Yun Tang

________________________________
From: Lincoln Lee <lincoln.8...@gmail.com>
Sent: Monday, June 5, 2023 10:56
To: dev@flink.apache.org <dev@flink.apache.org>
Subject: Re: [DISCUSS] FLIP-315: Support Operator Fusion Codegen for Flink SQL

Hi Ron

OFCG looks like an exciting optimization; looking forward to its completion
in Flink!
A small question: have we considered adding a benchmark for each operator, so
that we can see intuitively how much each optimization contributes?
In addition, regarding the implementation plan: the FLIP mentions that 1.18
will support Calc, HashJoin and HashAgg. What will the next step be, and
which operators do we ultimately expect to cover (all of them or specific
ones)?

Best,
Lincoln Lee


liu ron <ron9....@gmail.com> wrote on Monday, June 5, 2023 at 09:40:

> Hi, Jark
>
> Thanks for your feedback. According to my initial assessment, the work
> effort is relatively large.
>
> Moreover, I will add the test results for all queries to the FLIP.
>
> Best,
> Ron
>
> Jark Wu <imj...@gmail.com> wrote on Thursday, June 1, 2023 at 20:45:
>
> > Hi Ron,
> >
> > Thanks a lot for the great proposal. The FLIP looks good to me in
> > general. It does not look like easy work, but the performance gains sound
> > promising, so I think it's worth doing.
> >
> > Besides, a complete chart of the test results for all TPC-DS queries would
> > make the effect of this FLIP more intuitive.
> >
> > Best,
> > Jark
> >
> >
> >
> > On Wed, 31 May 2023 at 14:27, liu ron <ron9....@gmail.com> wrote:
> >
> > > Hi, Jingsong
> > >
> > > Thanks for your valuable suggestions.
> > >
> > > Best,
> > > Ron
> > >
> > > Jingsong Li <jingsongl...@gmail.com> wrote on Tuesday, May 30, 2023 at 13:22:
> > >
> > > > Thanks, Ron, for the information.
> > > >
> > > > I suggest adding it to the Motivation section of the FLIP.
> > > >
> > > > Best,
> > > > Jingsong
> > > >
> > > > On Tue, May 30, 2023 at 9:57 AM liu ron <ron9....@gmail.com> wrote:
> > > > >
> > > > > Hi, Jingsong
> > > > >
> > > > > Thanks for your review. We have tested it on TPC-DS and got a 12%
> > > > > overall gain while supporting only the Calc, HashJoin and HashAgg
> > > > > operators. Some queries gained more than 30%, so it looks like an
> > > > > effective approach.
> > > > >
> > > > > Best,
> > > > > Ron
> > > > >
> > > > > Jingsong Li <jingsongl...@gmail.com> wrote on Monday, May 29, 2023 at 14:33:
> > > > >
> > > > > > Thanks Ron for the proposal.
> > > > > >
> > > > > > Do you have any benchmark results for the performance improvement? I
> > > > > > am more interested in the improvement on Flink than in the data from
> > > > > > other papers.
> > > > > >
> > > > > > Best,
> > > > > > Jingsong
> > > > > >
> > > > > > On Mon, May 29, 2023 at 2:16 PM liu ron <ron9....@gmail.com> wrote:
> > > > > > >
> > > > > > > Hi, dev
> > > > > > >
> > > > > > > I'd like to start a discussion about FLIP-315: Support Operator
> > > > > > > Fusion Codegen for Flink SQL [1].
> > > > > > >
> > > > > > > As main memory grows, query performance is increasingly determined
> > > > > > > by the raw CPU cost of query processing itself. Query processing
> > > > > > > techniques based on interpreted execution perform poorly on modern
> > > > > > > CPUs due to a lack of locality and frequent instruction
> > > > > > > mispredictions. The industry has therefore been researching how to
> > > > > > > improve engine performance by increasing operator execution
> > > > > > > efficiency. In addition, while optimizing Flink's performance on
> > > > > > > TPC-DS queries, we found that a significant amount of CPU time was
> > > > > > > spent on virtual function calls, framework collector calls, and
> > > > > > > unnecessary computations, all of which can be optimized to improve
> > > > > > > overall engine performance. After some investigation, we found that
> > > > > > > Operator Fusion Codegen, proposed by Thomas Neumann in [2], can
> > > > > > > address these problems. I have finished a PoC [3] to verify its
> > > > > > > feasibility and effectiveness.
> > > > > > >
> > > > > > > Looking forward to your feedback.
> > > > > > >
> > > > > > > [1]: https://cwiki.apache.org/confluence/display/FLINK/FLIP-315+Support+Operator+Fusion+Codegen+for+Flink+SQL
> > > > > > > [2]: http://www.vldb.org/pvldb/vol4/p539-neumann.pdf
> > > > > > > [3]: https://github.com/lsyldliu/flink/tree/OFCG
> > > > > > >
> > > > > > > Best,
> > > > > > > Ron
> > > > > >
> > > >
> > >
> >
>
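For readers less familiar with the technique, here is a minimal, hypothetical Java sketch (not code from the FLIP or the PoC branch) of the overhead described in the original proposal quoted above. In operator-at-a-time execution each row passes through a chain of virtual collect() calls; with operator fusion, the filter, projection and aggregate are inlined into a single loop. The RowCollector interface, the operator classes and the example query SELECT SUM(v * 2) FROM t WHERE v > 10 are assumptions for illustration only; RowCollector is a simplified stand-in, not Flink's org.apache.flink.util.Collector.

```java
/**
 * Hypothetical sketch contrasting operator-at-a-time execution with a fused
 * loop for the illustrative query: SELECT SUM(v * 2) FROM t WHERE v > 10.
 * RowCollector is a simplified stand-in, not Flink's Collector interface.
 */
public final class OperatorFusionSketch {

    /** Hypothetical downstream interface: one virtual call per row per operator. */
    interface RowCollector {
        void collect(long value);
    }

    /** Terminal aggregate operator: SUM. */
    static final class SumAgg implements RowCollector {
        long sum;

        @Override
        public void collect(long value) {
            sum += value;
        }
    }

    /** Projection operator: v * 2, forwarding each result to the next operator. */
    static final class Project implements RowCollector {
        private final RowCollector next;

        Project(RowCollector next) {
            this.next = next;
        }

        @Override
        public void collect(long value) {
            next.collect(value * 2);
        }
    }

    /** Filter operator: v > 10, forwarding only matching rows. */
    static final class Filter implements RowCollector {
        private final RowCollector next;

        Filter(RowCollector next) {
            this.next = next;
        }

        @Override
        public void collect(long value) {
            if (value > 10) {
                next.collect(value);
            }
        }
    }

    /** Operator-at-a-time: a matching row crosses three collect() calls. */
    static long interpreted(long[] input) {
        SumAgg agg = new SumAgg();
        RowCollector pipeline = new Filter(new Project(agg));
        for (long v : input) {
            pipeline.collect(v);
        }
        return agg.sum;
    }

    /** Fused: the same filter, projection and aggregate as one loop body. */
    static long fused(long[] input) {
        long sum = 0;
        for (long v : input) {
            if (v > 10) {
                sum += v * 2;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        long[] input = {5, 12, 20, 7, 11};
        // Both strategies must agree; only the execution shape differs.
        System.out.println(interpreted(input) + " " + fused(input));
    }
}
```

Running main prints "86 86": rows 12, 20 and 11 pass the filter and contribute 24 + 40 + 22. Note that a JIT such as HotSpot can often devirtualize a call site that only ever sees one implementation, so in a toy example like this the gap may be small; the overhead is more visible in a general-purpose engine, where many operator types share the same interface.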
