Re: New SQL execution engine

Roman Kondakov Fri, 27 Sep 2019 02:35:24 -0700

Hi Igor!

In my opinion, using Apache Calcite for distributed SQL query optimization and planning is a much more promising approach than using H2. H2 is not suitable for distributed query execution, and it has very limited abilities for query optimization. Apache Calcite, on the other hand, is an open source implementation of the Cascades/Volcano query optimization framework [1,2] (other implementations: MS SQL Server, Greenplum). The main advantage of this framework is its extensibility: we can change the optimizer behavior by simply adding or removing optimization rules. Calcite has a cost-based optimizer as well as a heuristic one, which can be useful in some situations.
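
To illustrate what "adding or removing optimization rules" means, here is a toy sketch in plain Java. It does not use Calcite's API; `Rel`, `Rule`, and both rules are made-up names that exist only for this example, and a real Cascades/Volcano planner explores alternatives by cost rather than rewriting greedily as this one does.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

public class ToyRuleOptimizer {
    /** Minimal plan node: an operator name plus at most one input (null for a leaf). */
    public static final class Rel {
        final String op;
        final Rel input;

        public Rel(String op, Rel input) {
            this.op = op;
            this.input = input;
        }

        @Override
        public String toString() {
            return input == null ? op : op + "(" + input + ")";
        }
    }

    /** A rule returns the rewritten node if it matches, or empty otherwise. */
    public interface Rule extends Function<Rel, Optional<Rel>> {}

    // Hypothetical rule: Filter(Project(x)) -> Project(Filter(x)).
    public static final Rule FILTER_BELOW_PROJECT = r -> {
        if (r.op.equals("Filter") && r.input != null && r.input.op.equals("Project")) {
            return Optional.of(new Rel("Project", new Rel("Filter", r.input.input)));
        }
        return Optional.empty();
    };

    // Hypothetical rule: Filter(Scan) -> FilteredScan.
    public static final Rule FILTER_INTO_SCAN = r -> {
        if (r.op.equals("Filter") && r.input != null && r.input.op.equals("Scan")) {
            return Optional.of(new Rel("FilteredScan", null));
        }
        return Optional.empty();
    };

    /** Rewrites the plan repeatedly until no rule changes it any more. */
    public static Rel optimize(Rel root, List<Rule> rules) {
        Rel current = root;
        while (true) {
            Rel next = rewriteOnce(current, rules);
            if (next.toString().equals(current.toString())) {
                return next;
            }
            current = next;
        }
    }

    /** Applies the first matching rule at this node, otherwise descends into the input. */
    static Rel rewriteOnce(Rel node, List<Rule> rules) {
        if (node == null) {
            return null;
        }
        for (Rule rule : rules) {
            Optional<Rel> rewritten = rule.apply(node);
            if (rewritten.isPresent()) {
                return rewritten.get();
            }
        }
        return new Rel(node.op, rewriteOnce(node.input, rules));
    }

    public static void main(String[] args) {
        Rel plan = new Rel("Filter", new Rel("Project", new Rel("Scan", null)));
        // The rule list is the only thing that differs between the two calls.
        System.out.println(optimize(plan, List.of(FILTER_BELOW_PROJECT)));
        System.out.println(optimize(plan, List.of(FILTER_BELOW_PROJECT, FILTER_INTO_SCAN)));
    }
}
```

The optimizer loop stays the same; only the rule list passed to `optimize` changes the resulting plan.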


The main challenges I see here:

1. Implementing distributed query planning for Apache Calcite (it was primarily developed for single-node query optimization). We can reuse the solution of the Apache Drill [3] guys here.

2. We need to implement a new distributed query execution engine. Apache Calcite is a query planning framework, not an execution one, although it has some abilities for executing queries in the single-node case.

3. Secondary indexes are not supported by Calcite, so we need to overcome this problem somehow. AFAIK the Apache Phoenix [4] guys implemented support for secondary indexes as sorted materialized views.

4. Apache Calcite is a cost-based optimizer, so we need to create our own cost model and gather statistics to be able to choose the most effective query execution plans.

5. What about deprecating our current query API, which has a number of drawbacks, like using shortcuts such as `List<?>` as a query result, or multiple redundant flags in `SqlFieldsQuery` (collocated, lazy, etc.) which are useless for the new query execution engine?
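
To make point 4 a bit more concrete, below is a rough sketch of how gathered statistics could feed a cost model that picks between two access paths. Everything in it is an assumption for illustration: the `TableStats` fields, the cost formulas, and the `INDEX_LOOKUP_PENALTY` constant are invented here and are not taken from Calcite or Ignite.

```java
public class ToyCostModel {
    /** Hypothetical per-table statistics the engine would have to gather. */
    public static final class TableStats {
        final long rowCount;
        final double selectivity; // estimated fraction of rows matching the predicate, in [0, 1]

        public TableStats(long rowCount, double selectivity) {
            this.rowCount = rowCount;
            this.selectivity = selectivity;
        }
    }

    // Assumed factor: a row fetched through an index costs more than a sequentially scanned row.
    static final double INDEX_LOOKUP_PENALTY = 4.0;

    /** Full scan touches every row once. */
    public static double fullScanCost(TableStats s) {
        return s.rowCount;
    }

    /** Index scan pays for a tree descent plus a penalized fetch per matching row. */
    public static double indexScanCost(TableStats s) {
        double treeDescent = Math.log(Math.max(2, s.rowCount)) / Math.log(2);
        return treeDescent + s.rowCount * s.selectivity * INDEX_LOOKUP_PENALTY;
    }

    /** Picks the cheaper of the two candidate plans under this toy model. */
    public static String cheapestPlan(TableStats s) {
        return indexScanCost(s) < fullScanCost(s) ? "IndexScan" : "FullScan";
    }

    public static void main(String[] args) {
        // Same table size, different predicate selectivity.
        System.out.println(cheapestPlan(new TableStats(1_000_000, 0.001)));
        System.out.println(cheapestPlan(new TableStats(1_000_000, 0.9)));
    }
}
```

With a highly selective predicate the index path wins, while a predicate matching most rows makes the full scan cheaper, which is why the plan choice depends on having reasonably fresh statistics.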

[1] https://www.cse.iitb.ac.in/infolab/Data/Courses/CS632/Papers/Cascades-graefe.pdf
[2] https://www.cse.iitb.ac.in/infolab/Data/Courses/CS632/Papers/Volcano-graefe.pdf

[3] https://drill.apache.org/
[4] https://phoenix.apache.org/
--
Kind Regards
Roman Kondakov

On 27.09.2019 11:44, Igor Seliverstov wrote:

Hi Igniters!

As you might know, we currently have many open issues related to the current
H2-based engine and its execution flow.

Some of them are critical (like the impossibility of executing particular queries),
some are major (like the impossibility of executing particular queries
without first preparing your data to be collocated), and many are minor.

Most of the issues cannot be solved without a redesign of the whole engine.

So, here is the proposal:
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=130028084

I'd appreciate it if you shared your thoughts on it.

Regards,
Igor
