Re: New SQL execution engine

Seliverstov Igor Fri, 27 Sep 2019 09:03:19 -0700

Nikolay,

> Which project hosts the Calcite-based engine?



Currently the prototype lives in my personal Ignite fork. I need an 
appropriate ticket before pushing it to the ASF git repository. 
But first, I think, we should discuss the idea in general.

> Personally, I'm against supporting two independent implementations of the 
> SQL engine for several releases.


I don’t like the idea of having two engines either. But even developing the 
engine on top of the Calcite library is still a big deal. 
I’m not sure it will be ready; no, I’m sure it WON’T be ready by the Ignite 3 
release. That is why I mentioned the option of having two engines at the same 
time.

> Let's start with the IEP clarification and replace the SQL engine with the 
> best one for the good of Ignite.

Of course, but it’s still worth getting familiar with the couple of examples it 
already describes and clarifying any additional questions the community may ask.

Regards,
Igor

> On 27 Sep 2019, at 18:22, Nikolay Izhikov <[email protected]> wrote:
> 
> Igor.
> 
>> There is no decision, here we should decide.
> 
> Great.
> 
>> Right now the Calcite-based engine is placed in a separate module
> 
> Which project hosts the Calcite-based engine?
> 
>> It’s possible to develop it as an experimental extension at first (not a 
>> replacement)
> 
> For me, Ignite 3 is the place where the new engine has to land.
> Personally, I'm against supporting two independent implementations of the 
> SQL engine for several releases.
> 
> Ignite has too many partially implemented features to include one more :)
> 
> Let's start with the IEP clarification and replace the SQL engine with the 
> best one for the good of Ignite.
> 
> 
> On Fri, 27/09/2019 at 18:08 +0300, Seliverstov Igor wrote:
>> Nikolay,
>> 
>> At last we have better questions.
>> 
>> There is no decision, here we should decide.
>> 
>> Doing nothing isn’t a decision; it’s just doing nothing.
>> 
>> Spark Catalyst is a good example, but under the hood it is built on exactly 
>> the same idea, only adapted to Spark. Calcite is the same, but 
>> general-purpose. That’s why it’s a better starting point.
>> 
>> Implementing an engine from scratch is really cool, but it looks like 
>> reinventing the wheel; I don’t think it makes sense. At least, I am against 
>> this option.
>> 
>> I added requirements to the IEP (as you asked). As you can see, it’s in 
>> DRAFT state and will be supplemented with details.
>> 
>> We have some thoughts on how to make the replacement smooth, but first we 
>> should decide what to replace and with what.
>> 
>> Right now the Calcite-based engine is placed in a separate module. We 
>> checked that it can build an execution graph for both local and distributed 
>> cases, and it has good extensibility. 
>> We talked to the Calcite community to identify possible future issues, and 
>> everything points to it being the best option. 
>> It’s possible to develop it as an experimental extension at first (not a 
>> replacement) until we make sure that it works as expected. This way there 
>> are no risks for anybody who uses Ignite in a production environment.
>> 
>> Regards,
>> Igor
>> 
>> 
>>> On 27 Sep 2019, at 17:25, Nikolay Izhikov <[email protected]> wrote:
>>> 
>>> Igor.
>>> 
>>>> The main issue - there is no *selection*.
>>> 
>>> 1. I don't remember a community decision about this.
>>> 
>>> 2. We should avoid making such a long-term decision so quickly.
>>> We made this kind of decision with H2 and have come to the point where we 
>>> have to review it.
>>> 
>>>> 1) Implementing white papers from scratch
>>>> 2) Adapting Calcite to our needs.
>>> 
>>> The third option doesn't fix the issues we have with H2.
>>> The fourth option I know of is using spark-catalyst.
>>> 
>>> What is wrong with writing an engine from scratch?
>>> 
>>> I ask you to start with engine requirements.
>>> Can we, please, discuss it?
>>> 
>>>> If you have an alternative - you're welcome, I'll gratefully listen to you.
>>> 
>>> We have an alternative for now: the H2-based engine.
>>> 
>>>> The main question isn't "WHAT" but "HOW" - that's the discussion topic 
>>>> from my point of view.
>>> 
>>> Once we make a decision about the engine, we can discuss a roadmap for the 
>>> replacement.
>>> One more time: replacing the SQL engine with a more customizable one makes 
>>> sense to me.
>>> But this kind of decision needs careful discussion.
>>> 
>>> On Fri, 27/09/2019 at 17:08 +0300, Seliverstov Igor wrote:
>>>> Nikolay,
>>>> 
>>>> The main issue - there is no *selection*.
>>>> 
>>>> There is a field of knowledge, relational algebra, which describes how to 
>>>> transform relational expressions while preserving their semantics, and 
>>>> there are a couple of implementations (Calcite is the only one written in 
>>>> Java).
>>>> 
>>>> There are only two alternatives:
>>>> 
>>>> 1) Implementing white papers from scratch
>>>> 2) Adapting Calcite to our needs.
>>>> 
>>>> The second way was chosen by several other projects. There is experience 
>>>> and there is a list of known issues (like using indexes), so almost 
>>>> everything is already done for us.
>>>> 
>>>> Implementing a planner is a big deal; I think everybody here understands 
>>>> that. That's why our proposal to reuse others' experience is the obvious 
>>>> one.
>>>> 
>>>> If you have an alternative - you're welcome, I'll gratefully listen to you.
>>>> 
>>>> The main question isn't "WHAT" but "HOW" - that's the discussion topic 
>>>> from my point of view.
>>>> 
>>>> Regards,
>>>> Igor
>>>> 
>>>>> On 27 Sep 2019, at 16:37, Nikolay Izhikov <[email protected]> wrote:
>>>>> 
>>>>> Roman.
>>>>> 
>>>>>> Nikolay, Maxim, I understand that our arguments may not be as obvious 
>>>>>> to you as they are to the SQL team. So, please phrase your questions in 
>>>>>> a more constructive way.
>>>>> 
>>>>> What is the SQL team?
>>>>> I only know the Ignite community :)
>>>>> 
>>>>> Please share your knowledge in the IEP.
>>>>> I want to join the process of engine *selection*.
>>>>> It should start with the requirements for such an engine.
>>>>> Can you write them in the IEP, please?
>>>>> 
>>>>> My point is very simple:
>>>>> 
>>>>> 1. We made the wrong decision with H2
>>>>> 2. We should make a well-thought decision about the new engine.
>>>>> 
>>>>>> How many tickets would satisfy you?
>>>>> 
>>>>> You write about "issueS" with H2.
>>>>> All I see is one open ticket.
>>>>> The IEP doesn't provide enough information.
>>>>> So it's not about the number of tickets, it's about
>>>>> 
>>>>>> These two points (single map-reduce execution and inflexible optimizer) 
>>>>>> are the main problems with the current engine.
>>>>> 
>>>>> We may come to the point where Calcite (or any other engine) brings us 
>>>>> a third and further "main problems".
>>>>> This is what happened with H2.
>>>>> 
>>>>> Let's start from what we want to get with the engine and move forward 
>>>>> from this base.
>>>>> What do you think?
>>>>> 
>>>>> 
>>>>> 
>>>>> On Fri, 27/09/2019 at 16:15 +0300, Roman Kondakov wrote:
>>>>>> Maxim, Nikolay,
>>>>>> 
>>>>>> I've listed two issues which show the ideological flaws of the current 
>>>>>> engine.
>>>>>> 
>>>>>> 1. IGNITE-11448 - Open. This ticket describes the impossibility of 
>>>>>> executing queries that do not fit into the hardcoded one-pass 
>>>>>> map-reduce paradigm.
>>>>>> 
>>>>>> 2. IGNITE-6085 - Closed (won't fix) - This ticket describes the second 
>>>>>> major problem with the current engine: the H2 query optimizer is very 
>>>>>> primitive and cannot perform many useful optimizations.
>>>>>> 
>>>>>> These two points (single map-reduce execution and inflexible optimizer) 
>>>>>> are the main problems with the current engine. It means that our engine 
>>>>>> is currently suitable for executing only a very limited subset of 
>>>>>> typical SQL queries. For example, it cannot even run most of the TPC-H 
>>>>>> benchmark queries because they don't fit the simple map-reduce 
>>>>>> paradigm.
>>>>>> 
>>>>>>> All I see is links to two tickets:
>>>>>> 
>>>>>> How many tickets would satisfy you? I named two. And it looks like that 
>>>>>> is not enough from your point of view. OK, so how many is enough? The 
>>>>>> set of problems caused by the tickets listed above is infinite, so I 
>>>>>> cannot create a ticket for each of them.
>>>>>> 
>>>>>>> Tech details also should be added.
>>>>>> 
>>>>>> Tech details are in the tickets.
>>>>>> 
>>>>>>> We can't discuss such a huge change as an execution engine replacement 
>>>>>>> with descriptions like:
>>>>>>> "No data co-location control, i.e. arbitrary data can be returned 
>>>>>>> silently" or
>>>>>>> "Low control on how query executes internally, as a result we have 
>>>>>>> limited possibility to implement improvements/fixes."
>>>>>> 
>>>>>> Why not? Don't you understand these problems? Or don't you think they 
>>>>>> are problems?
>>>>>> 
>>>>>>> Let's make these descriptions more specific.
>>>>>> 
>>>>>> What do you mean by "more specific"? What are the criteria for a 
>>>>>> specific description?
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> Nikolay, Maxim, I understand that our arguments may not be as obvious 
>>>>>> to you as they are to the SQL team. So, please phrase your questions in 
>>>>>> a more constructive way.
>>>>>> 
>>>>>> Thank you!
>>>> 
>>>> 
>> 
>>
