Hi Igor!

In my opinion, using Apache Calcite for distributed SQL query optimization and planning is a much more promising approach than using H2. H2 is not suitable for distributed query execution, and its query optimization abilities are very limited. Apache Calcite, in contrast, is an open source implementation of the Cascades/Volcano query optimization framework [1,2] (other implementations: MS SQL Server, Greenplum). The main advantage of this framework is its extensibility: we can change the optimizer behavior simply by adding or removing optimization rules. Calcite has a cost-based optimizer as well as a heuristic one, which can be useful in some situations.
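
To illustrate the extensibility point, here is a toy sketch in plain Java. It does not use the Calcite API: the `Node` class, the `Rule` interface and both rules are hypothetical names made up for this example. The only thing it is meant to show is that the resulting plan depends on which rules are registered with the optimizer.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.UnaryOperator;

public class RuleBasedPlannerSketch {
    /** Hypothetical plan node: an operator name plus at most one input. */
    static final class Node {
        final String op;
        final Node input;

        Node(String op, Node input) {
            this.op = op;
            this.input = input;
        }

        @Override public String toString() {
            return input == null ? op : op + "(" + input + ")";
        }
    }

    /** A rule returns a rewritten node, or the same node when it does not match. */
    interface Rule extends UnaryOperator<Node> {}

    /** Filter(Filter(x)) -> Filter(x). */
    static final Rule MERGE_FILTERS = n ->
        n.op.equals("Filter") && n.input != null && n.input.op.equals("Filter")
            ? new Node("Filter", n.input.input) : n;

    /** Filter(Sort(x)) -> Sort(Filter(x)). */
    static final Rule FILTER_BELOW_SORT = n ->
        n.op.equals("Filter") && n.input != null && n.input.op.equals("Sort")
            ? new Node("Sort", new Node("Filter", n.input.input)) : n;

    /** Applies the rules bottom-up until the plan stops changing. */
    static Node optimize(Node root, List<Rule> rules) {
        Node cur = root;
        while (true) {
            Node next = rewrite(cur, rules);
            if (next.toString().equals(cur.toString()))
                return next;
            cur = next;
        }
    }

    /** One bottom-up pass: rewrite the input first, then try every rule on the node. */
    static Node rewrite(Node n, List<Rule> rules) {
        if (n == null)
            return null;
        Node res = new Node(n.op, rewrite(n.input, rules));
        for (Rule r : rules)
            res = r.apply(res);
        return res;
    }

    public static void main(String[] args) {
        Node plan = new Node("Filter",
            new Node("Sort", new Node("Filter", new Node("Scan", null))));

        // With both rules the filters are pushed down and merged.
        System.out.println(optimize(plan, Arrays.asList(MERGE_FILTERS, FILTER_BELOW_SORT)));
        // With a single rule only that rewrite is available.
        System.out.println(optimize(plan, Collections.singletonList(FILTER_BELOW_SORT)));
    }
}
```

With both rules registered the plan `Filter(Sort(Filter(Scan)))` becomes `Sort(Filter(Scan))`; with only `FILTER_BELOW_SORT` it becomes `Sort(Filter(Filter(Scan)))`. A real Cascades/Volcano optimizer additionally keeps alternative plans and picks one by cost, which this sketch leaves out.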

The main challenges I see here:

1. Implementing distributed query planning for Apache Calcite (it was primarily developed for single-node query optimization). We can reuse the Apache Drill [3] solution here.

2. We need to implement a new distributed query execution engine. Apache Calcite is a query planning framework, not an execution one, although it has some ability to execute queries in the single-node case.

3. Secondary indexes are not supported by Calcite, so we need to overcome this problem somehow. AFAIK the Apache Phoenix [4] guys implemented support for secondary indexes as sorted materialized views.

4. Calcite's optimizer is cost-based, so we need to create our own cost model and gather statistics to be able to choose the most efficient query execution plans.

5. What about deprecating our current query API, which has a number of drawbacks, such as using a bare `List<?>` as a query result or multiple redundant flags in `SqlFieldsQuery` (collocated, lazy, etc.) that are useless for the new query execution engine?
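
To make the `List<?>` drawback from point 5 concrete, here is a small sketch in plain Java with no Ignite dependency. The `PersonRow` class and the `toTyped` helper are hypothetical and exist only for illustration; they are not an existing or proposed API. It shows that an untyped row forces positional access and casts, so a type mismatch is only detected at runtime.

```java
import java.util.Arrays;
import java.util.List;

public class UntypedRowSketch {
    /** Hypothetical typed row; not part of any existing API. */
    static final class PersonRow {
        final long id;
        final String name;

        PersonRow(long id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    /**
     * Converts a positional, untyped row into the typed one.
     * Throws ClassCastException if a column has an unexpected type.
     */
    static PersonRow toTyped(List<?> row) {
        return new PersonRow((Long) row.get(0), (String) row.get(1));
    }

    public static void main(String[] args) {
        // The caller has to know the column order and types by heart.
        List<?> row = Arrays.asList(42L, "Igor");
        PersonRow person = toTyped(row);
        System.out.println(person.id + " " + person.name);

        // A wrong column type compiles fine and fails only when the row is read.
        List<?> badRow = Arrays.asList("42", "Igor");
        try {
            toTyped(badRow);
        } catch (ClassCastException e) {
            System.out.println("type mismatch detected only at runtime");
        }
    }
}
```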

[1] https://www.cse.iitb.ac.in/infolab/Data/Courses/CS632/Papers/Cascades-graefe.pdf
[2] https://www.cse.iitb.ac.in/infolab/Data/Courses/CS632/Papers/Volcano-graefe.pdf
[3] https://drill.apache.org/
[4] https://phoenix.apache.org/
--
Kind Regards
Roman Kondakov

On 27.09.2019 11:44, Igor Seliverstov wrote:
Hi Igniters!

As you might know, we currently have many open issues relating to the current
H2-based engine and its execution flow.

Some of them are critical (like the impossibility of executing particular queries),
some are major (like the impossibility of executing particular queries
without preparing your data in advance to be collocated), and many are minor.

Most of the issues cannot be solved without a whole engine redesign.

So, here is the proposal:
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=130028084

I'd appreciate it if you shared your thoughts on it.

Regards,
Igor
