Re: Calcite based SQL query engine. Local queries

Andrey Mashenkov Thu, 07 Nov 2019 01:21:33 -0800

+1 to Alexey's concerns.

Local SQL query mode is error-prone, as a query executes over an
unpredictable set of partitions.
Using local mode without a deep understanding of the SQL execution model
will lead to inconsistent results.


Just imagine if we added a note to the documentation saying that "in case of
local SQL, user results can depend on topology (partition distribution)".
This definitely does not look like something we'd like to provide to end users.
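
To make the concern concrete, here is a minimal toy model in Python. It is
not Ignite code: the modulo partition mapping, the round-robin assignment and
all function names are assumptions made up for illustration. It only shows
that a "local" count on the same node changes when the topology changes,
while the same count over an explicit partition set does not.

```python
# Toy model only, not Ignite code. The partition mapping and the
# round-robin assignment are hypothetical stand-ins for a real
# affinity function.

PARTITIONS = 8


def partition_of(key):
    """Map a key to a partition (hypothetical: simple modulo)."""
    return key % PARTITIONS


def assign(nodes):
    """Assign every partition to exactly one node, round-robin."""
    return {p: nodes[p % len(nodes)] for p in range(PARTITIONS)}


def local_count(rows, assignment, node):
    """Emulate COUNT(*) in local mode: only rows whose partition
    is currently owned by `node` are visible."""
    return sum(1 for key in rows if assignment[partition_of(key)] == node)


def partition_count(rows, partitions):
    """Emulate the same COUNT(*) over an explicit set of partitions."""
    return sum(1 for key in rows if partition_of(key) in partitions)


rows = list(range(100))

two_nodes = assign(["A", "B"])         # topology before a node joins
three_nodes = assign(["A", "B", "C"])  # topology after node C joins

local_before = local_count(rows, two_nodes, "A")
local_after = local_count(rows, three_nodes, "A")
pinned = partition_count(rows, {0, 2, 4, 6})

print(local_before, local_after, pinned)  # 50 38 50
```

The local result on node A drops from 50 to 38 rows only because a node
joined, whereas the query over partitions {0, 2, 4, 6} keeps a well-defined
answer regardless of where those partitions live.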

On Thu, Nov 7, 2019 at 11:31 AM Alexey Goncharuk <alexey.goncha...@gmail.com>
wrote:

> Denis, Stephen,
>
> Running a local query in a broadcast closure won't work on changing
> topology. We specifically added an affinityCall method to the compute API
> in order to pin a partition to prevent its moving and eviction throughout
> the task execution. Therefore, the query inside an affinityCall is always
> executed against some partitions (otherwise the query may give incorrect
> results when topology is changed).
>
> I support Igor's question and think that the 'local' flag for the query
> should be deprecated and eventually removed. A 'local' query can always be
> expressed as a query against a set of partitions. If those partitions are
> located on the same node - good, we get fast and correct results. If not -
> we may either raise an exception and ask the user to remap the query, or
> fall back to a distributed query execution.
>
> Given that the Calcite prototype is in its early stages, it's likely its
> first version will be available in 3.x, and it's a good chance to get rid
> of wrong API pieces.
>
> --AG
>
> Mon, 4 Nov 2019 at 14:02, Stephen Darlington <
> stephen.darling...@gridgain.com>:
>
> > A common use case is where you want to work on many rows of data across
> > the grid. You’d broadcast a closure, running the same code on every node
> > with just the local data. SQL doesn’t work in isolation — it’s often used
> > as a filter for future computations.
> >
> > Regards,
> > Stephen
> >
> > > On 1 Nov 2019, at 17:53, Ivan Pavlukhin <vololo...@gmail.com> wrote:
> > >
> > > Denis,
> > >
> > > I am mostly concerned about gathering use cases. It would be great to
> > > critically assess such cases to identify why they cannot be solved by
> > > using distributed SQL. Also, it sounds similar to some kind of "hints",
> > > but very limited and with all the drawbacks of hints (impossibility to
> > > use the full strength of the CBO). We can provide better "hints" support
> > > with the new engine as well.
> > >
> > > Fri, 1 Nov 2019 at 20:14, Denis Magda <dma...@apache.org>:
> > >>
> > >> Ivan,
> > >>
> > >> I was involved in a couple of such use cases personally, so that's not
> > >> my imagination ;) Moreover, as far as I remember, the primary reason
> > >> we improved our affinityRuns to ensure that no partition is purged from
> > >> a node until a task completes is that many users were running local SQL
> > >> from compute tasks and needed a guarantee that SQL would always return
> > >> a correct result set.
> > >>
> > >> -
> > >> Denis
> > >>
> > >>
> > >> On Fri, Nov 1, 2019 at 10:01 AM Ivan Pavlukhin <vololo...@gmail.com> wrote:
> > >>
> > >>> Denis,
> > >>>
> > >>> It would be nice to see real use cases of the affinity call + local
> > >>> SQL combination. Generally, the new engine will be able to infer
> > >>> collocation, resulting in the same collocated execution automatically.
> > >>>
> > >>> Fri, 1 Nov 2019 at 19:11, Denis Magda <dma...@apache.org>:
> > >>>>
> > >>>> Hi Igor,
> > >>>>
> > >>>> The local queries feature is broadly used together with
> > >>>> affinity-based compute tasks:
> > >>>>
> > >>>> https://apacheignite.readme.io/docs/collocate-compute-and-data#section-affinity-call-and-run-methods
> > >>>>
> > >>>> The use case is as follows. The user knows that all data needed for
> > >>>> the computation is collocated, and SQL is used as an advanced API for
> > >>>> data retrieval from the computation code. The affinity task ensures
> > >>>> that partitions won't be discarded from the node(s) if the topology
> > >>>> changes during the task execution and, thus, it's safe to run SQL
> > >>>> locally, skipping distributed phases.
> > >>>>
> > >>>> The combination of affinity compute tasks with local SQL is a real and
> > >>>> valuable use case, and this is what we need to support with Calcite.
> > >>>> Do you see any challenges?
> > >>>>
> > >>>> -
> > >>>> Denis
> > >>>>
> > >>>>
> > >>>> On Fri, Nov 1, 2019 at 8:46 AM Roman Kondakov <kondako...@mail.ru.invalid> wrote:
> > >>>>
> > >>>>> Hi Igor!
> > >>>>>
> > >>>>> IMO we need to maintain backward compatibility between the old and
> > >>>>> new query engines as much as possible. Therefore, we shouldn't change
> > >>>>> the behavior of local queries.
> > >>>>>
> > >>>>> So, for local queries Calcite's planner shouldn't consider the
> > >>>>> distribution trait at all.
> > >>>>>
> > >>>>>
> > >>>>> --
> > >>>>> Kind Regards
> > >>>>> Roman Kondakov
> > >>>>>
> > >>>>> On 01.11.2019 17:07, Seliverstov Igor wrote:
> > >>>>>> Hi Igniters,
> > >>>>>>
> > >>>>>> Working on the new generation of Ignite SQL, I faced a question: «Do
> > >>>>>> we need local queries at all and, if so, what semantics should they
> > >>>>>> have?».
> > >>>>>>
> > >>>>>> The current planning flow consists of the following steps:
> > >>>>>>
> > >>>>>> 1) Parsing SQL to AST
> > >>>>>> 2) Validating AST (against Schema)
> > >>>>>> 3) Optimizing (Building execution graph)
> > >>>>>> 4) Splitting (into query fragments which execute on target nodes)
> > >>>>>> 5) Mapping (query fragments to nodes/partitions)
> > >>>>>>
> > >>>>>> At the last step we check that all Fragment sources (a table or
> > >>>>>> result) have the same distribution (in other words, all sources have
> > >>>>>> to be co-located).
> > >>>>>>
> > >>>>>> Planner and Splitter guarantee that all caches in a Fragment are
> > >>>>>> co-located; an Exchange is produced otherwise. But if we force local
> > >>>>>> execution we cannot produce Exchanges, which means we may face two
> > >>>>>> non-co-located caches inside a single query fragment (the result of
> > >>>>>> local query planning is a single query fragment). So, we cannot pass
> > >>>>>> the check.
> > >>>>>>
> > >>>>>> Should we throw an exception, omit the check for local query
> > >>>>>> planning, or prohibit local queries altogether?
> > >>>>>>
> > >>>>>> Your thoughts?
> > >>>>>>
> > >>>>>> Regards,
> > >>>>>> Igor
> > >>>>>
> > >>>
> > >>>
> > >>>
> > >>> --
> > >>> Best regards,
> > >>> Ivan Pavlukhin
> > >>>
> > >
> > >
> > >
> > > --
> > > Best regards,
> > > Ivan Pavlukhin
> >
> >
> >
>


-- 
Best regards,
Andrey V. Mashenkov
