Re: Calcite based SQL query engine. Local queries

Stephen Darlington Mon, 04 Nov 2019 03:02:56 -0800

A common use case is when you want to work on many rows of data across the 
grid. You’d broadcast a closure that runs the same code on every node against 
just the local data. SQL doesn’t work in isolation there; it’s often used as a 
filter for further computations.
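The broadcast-plus-local-filter pattern can be sketched without any Ignite dependency. This is a toy model under stated assumptions: `SimNode`, `SimGrid`, `broadcast` and `sumLargeOrdersPerNode` are hypothetical names, and the per-node filter only stands in for a local SQL query. In real Ignite code the closest counterparts would presumably be `IgniteCompute.broadcast(...)` combined with a `SqlFieldsQuery` that has `setLocal(true)`, which is not shown here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical in-memory stand-in for a cluster: each node keeps only its own
// rows. Nothing here is an Ignite class; the names are illustrative.
class SimNode {
    final String name;
    final List<Integer> localRows; // e.g. order amounts held by this node

    SimNode(String name, List<Integer> localRows) {
        this.name = name;
        this.localRows = localRows;
    }
}

class SimGrid {
    final List<SimNode> nodes = new ArrayList<>();

    // Runs the same closure on every node against that node's local rows only
    // and returns one result per node, in node order.
    <R> List<R> broadcast(Function<List<Integer>, R> job) {
        List<R> results = new ArrayList<>();
        for (SimNode node : nodes) {
            results.add(job.apply(node.localRows));
        }
        return results;
    }
}

class BroadcastDemo {
    // The local "SQL" step is modelled as a filter (think WHERE amount > ?);
    // the follow-up computation is a per-node sum of the surviving rows.
    static List<Integer> sumLargeOrdersPerNode(SimGrid grid, int threshold) {
        return grid.broadcast(rows -> rows.stream()
                .filter(amount -> amount > threshold) // filter step
                .mapToInt(Integer::intValue)
                .sum());                              // further computation
    }
}
```

With node rows `[50, 150, 300]` and `[120, 80]` and a threshold of 100, the per-node results are `[450, 120]`, which the caller can then reduce to 570. Each node only ever touches its own data; the filter narrows it down before the real work.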


Regards,
Stephen

> On 1 Nov 2019, at 17:53, Ivan Pavlukhin <vololo...@gmail.com> wrote:
> 
> Denis,
> 
> I am mostly concerned about gathering use cases. It would be great to
> critically assess such cases to identify why they cannot be solved by
> using distributed SQL. Also, it sounds similar to some kind of "hints",
> but very limited and with all the drawbacks of hints (the inability to use
> the full strength of the CBO). We can provide better "hints" support with
> the new engine as well.
> 
> On Fri, 1 Nov 2019 at 20:14, Denis Magda <dma...@apache.org> wrote:
>> 
>> Ivan,
>> 
>> I was personally involved in a couple of such use cases, so that's not my
>> imagination ;) Moreover, as far as I remember, the primary reason we
>> improved our affinityRuns to ensure that no partition is purged from a node
>> until a task completes was that many users were running local SQL from
>> compute tasks and needed a guarantee that SQL would always return a correct
>> result set.
>> 
>> -
>> Denis
>> 
>> 
>> On Fri, Nov 1, 2019 at 10:01 AM Ivan Pavlukhin <vololo...@gmail.com> wrote:
>> 
>>> Denis,
>>> 
>>> It would be nice to see real use cases of the affinity call + local SQL
>>> combination. Generally, the new engine will be able to infer collocation
>>> and produce the same collocated execution automatically.
>>> 
>>> On Fri, 1 Nov 2019 at 19:11, Denis Magda <dma...@apache.org> wrote:
>>>> 
>>>> Hi Igor,
>>>> 
>>>> The local queries feature is broadly used together with affinity-based
>>>> compute tasks:
>>>>
>>>> https://apacheignite.readme.io/docs/collocate-compute-and-data#section-affinity-call-and-run-methods
>>>> 
>>>> The use case is as follows. The user knows that all the data needed for
>>>> the computation is collocated, and SQL is used as an advanced API for
>>>> retrieving data from the computation code. The affinity task ensures that
>>>> partitions won't be discarded from the node(s) if the topology changes
>>>> during task execution, and thus it's safe to run SQL locally, skipping the
>>>> distributed phases.
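The partition guarantee described here can be illustrated with a toy model of a single node. This is only a sketch under assumptions: `PartitionedNode`, `affinityRun`, `tryEvict` and `localScan` are hypothetical names, and Ignite's real partition-reservation logic is internal and not reproduced. The public entry point in Ignite is `IgniteCompute.affinityRun(...)`; the model only shows why reserving a partition for the duration of the job makes a local read safe.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical model of one node: it owns partitions and can be asked to
// evict them when the topology changes. Not Ignite's real implementation.
class PartitionedNode {
    private final Map<Integer, List<String>> partitions = new HashMap<>();
    private final Map<Integer, Integer> reservations = new HashMap<>();

    void put(int part, String row) {
        partitions.computeIfAbsent(part, p -> new ArrayList<>()).add(row);
    }

    // Rebalancing asks the node to drop a partition; reserved ones are kept.
    boolean tryEvict(int part) {
        if (reservations.getOrDefault(part, 0) > 0) {
            return false;
        }
        partitions.remove(part);
        return true;
    }

    // Stand-in for an affinity run: reserve the partition for the whole job,
    // then release it, even if the job throws.
    <R> R affinityRun(int part, Supplier<R> job) {
        if (!partitions.containsKey(part)) {
            throw new IllegalStateException("partition " + part + " is not local");
        }
        reservations.merge(part, 1, Integer::sum);
        try {
            return job.get();
        } finally {
            reservations.merge(part, -1, Integer::sum);
        }
    }

    // "Local query": reads only the rows this node currently holds.
    List<String> localScan(int part) {
        return new ArrayList<>(partitions.getOrDefault(part, List.of()));
    }
}
```

If an eviction request arrives while the job is running, it is refused, so a local scan inside the job still sees every row of the partition; once the job returns, the same eviction succeeds. Without the reservation, a local query could silently return a partial result set.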
>>>> 
>>>> The combination of affinity compute tasks with local SQL is a real and
>>>> valuable use case, and this is what we need to support with Calcite. Do
>>>> you see any challenges?
>>>> 
>>>> -
>>>> Denis
>>>> 
>>>> 
>>>> On Fri, Nov 1, 2019 at 8:46 AM Roman Kondakov <kondako...@mail.ru.invalid>
>>>> wrote:
>>>> 
>>>>> Hi Igor!
>>>>> 
>>>>> IMO we need to maintain backward compatibility between the old and new
>>>>> query engines as much as possible, and therefore we shouldn't change the
>>>>> behavior of local queries.
>>>>> 
>>>>> So, for local queries Calcite's planner shouldn't consider the
>>>>> distribution trait at all.
>>>>> 
>>>>> 
>>>>> --
>>>>> Kind Regards
>>>>> Roman Kondakov
>>>>> 
>>>>> On 01.11.2019 17:07, Seliverstov Igor wrote:
>>>>>> Hi Igniters,
>>>>>> 
>>>>>> While working on the new generation of Ignite SQL, I faced a question:
>>>>>> «Do we need local queries at all and, if so, what semantics should they
>>>>>> have?»
>>>>>> 
>>>>>> The current planning flow consists of the following steps:
>>>>>> 
>>>>>> 1) Parsing SQL into an AST
>>>>>> 2) Validating the AST (against the Schema)
>>>>>> 3) Optimizing (building the execution graph)
>>>>>> 4) Splitting (into query fragments that execute on target nodes)
>>>>>> 5) Mapping (query fragments to nodes/partitions)
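The five stages can be walked through with a deliberately tiny model. This is a sketch under heavy assumptions: `PlanningPipeline`, `parse`, `validate`, `split` and `map` are hypothetical names, not Calcite or Ignite APIs; the "AST" is reduced to the table list after `FROM`, optimization is skipped, and a schema maps each table to a distribution label.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy walk-through of the planning stages. All names are hypothetical;
// none of them are Calcite or Ignite APIs.
class PlanningPipeline {
    // 1) Parsing: the "AST" is reduced to the table names listed after FROM.
    static List<String> parse(String sql) {
        int from = sql.toUpperCase().indexOf("FROM");
        if (from < 0) {
            throw new IllegalArgumentException("no FROM clause: " + sql);
        }
        List<String> tables = new ArrayList<>();
        for (String t : sql.substring(from + 4).split(",")) {
            tables.add(t.trim().toUpperCase());
        }
        return tables;
    }

    // 2) Validation: every table must exist in the schema, which maps
    //    table name -> distribution (e.g. affinity function and key).
    static void validate(List<String> tables, Map<String, String> schema) {
        for (String t : tables) {
            if (!schema.containsKey(t)) {
                throw new IllegalArgumentException("unknown table " + t);
            }
        }
    }

    // 3) Optimizing is omitted: the table list doubles as the "plan".

    // 4) Splitting: tables sharing a distribution go into one fragment;
    //    each other distribution becomes a separate fragment, i.e. an
    //    Exchange would be needed between them.
    static Map<String, List<String>> split(List<String> plan, Map<String, String> schema) {
        Map<String, List<String>> fragments = new LinkedHashMap<>();
        for (String t : plan) {
            fragments.computeIfAbsent(schema.get(t), d -> new ArrayList<>()).add(t);
        }
        return fragments;
    }

    // 5) Mapping: each fragment goes to the nodes owning its distribution.
    static Map<List<String>, List<String>> map(Map<String, List<String>> fragments,
                                               Map<String, List<String>> owners) {
        Map<List<String>, List<String>> mapping = new LinkedHashMap<>();
        fragments.forEach((dist, tables) -> mapping.put(tables, owners.get(dist)));
        return mapping;
    }
}
```

For `SELECT * FROM orders, customers, products` where `ORDERS` and `CUSTOMERS` share one distribution and `PRODUCTS` has another, splitting yields two fragments, each mapped to its own owner nodes. A forced-local plan has no such split, which is exactly where the co-location problem appears.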
>>>>>> 
>>>>>> At the last step, we check that all Fragment sources (a table or a
>>>>>> result) have the same distribution (in other words, all sources have to
>>>>>> be co-located).
>>>>>> 
>>>>>> The Planner and Splitter guarantee that all caches in a Fragment are
>>>>>> co-located; otherwise, an Exchange is produced. But if we force local
>>>>>> execution, we cannot produce Exchanges, which means we may end up with two
>>>>>> non-co-located caches inside a single query fragment (the result of local
>>>>>> query planning is a single query fragment). So we cannot pass the check.
>>>>>> 
>>>>>> Should we throw an exception, omit the check for local query planning,
>>>>>> or prohibit local queries altogether?
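The three options can be modelled as policies applied to the single fragment that local planning produces. This is a sketch under assumptions: `Source`, `LocalPolicy` and `ColocationCheck` are hypothetical names, not Ignite or Calcite types, and a distribution is reduced to a string label. `SKIP_CHECK` corresponds roughly to Roman's suggestion of not considering the distribution trait for local queries.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of the mapping-stage check. Source, LocalPolicy and
// ColocationCheck are illustrative names, not Ignite or Calcite types.
enum LocalPolicy { THROW, SKIP_CHECK, PROHIBIT_LOCAL }

class Source {
    final String name;
    final String distribution; // e.g. the affinity function/key of a cache

    Source(String name, String distribution) {
        this.name = name;
        this.distribution = distribution;
    }
}

class ColocationCheck {
    // A fragment passes if all of its sources share one distribution.
    static boolean colocated(List<Source> sources) {
        Set<String> distributions = new LinkedHashSet<>();
        for (Source s : sources) {
            distributions.add(s.distribution);
        }
        return distributions.size() <= 1;
    }

    // A local query is planned as one fragment with no Exchanges, so
    // non-co-located sources cannot be split apart; the policy decides.
    static void checkLocalFragment(List<Source> sources, LocalPolicy policy) {
        switch (policy) {
            case PROHIBIT_LOCAL:
                throw new UnsupportedOperationException("local queries are not supported");
            case SKIP_CHECK:
                return; // ignore distribution; each node reads whatever it holds
            case THROW:
                if (!colocated(sources)) {
                    throw new IllegalStateException("fragment sources are not co-located");
                }
                return;
            default:
                throw new AssertionError("unexpected policy " + policy);
        }
    }
}
```

A fragment over two caches with the same distribution passes under `THROW`; a mixed fragment passes only under `SKIP_CHECK`, and in that case the result is simply whatever each node holds locally. That is only meaningful when the user already knows the data is collocated, as in the affinity-task use case.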
>>>>>> 
>>>>>> Your thoughts?
>>>>>> 
>>>>>> Regards,
>>>>>> Igor
>>>>> 
>>> 
>>> 
>>> 
>>> --
>>> Best regards,
>>> Ivan Pavlukhin
>>> 
> 
> 
> 
> -- 
> Best regards,
> Ivan Pavlukhin
