+1 to Alexey's concerns. Local SQL query mode is error prone, as a query executes over non-predicted set of partitions. Using local mode with deep SQL execution model understanding will lead to inconsistent result.
Just imagine if we add a note to documentation that "in case of local SQL user results can depends on topology (partition distribution)". This definitely not looks like the thing we'd like to provide to end-user. On Thu, Nov 7, 2019 at 11:31 AM Alexey Goncharuk <alexey.goncha...@gmail.com> wrote: > Denis, Stephen, > > Running a local query in a broadcast closure won't work on changing > topology. We specifically added an affinityCall method to the compute API > in order to pin a partition to prevent its moving and eviction throughout > the task execution. Therefore, the query inside an affinityCall is always > executed against some partitions (otherwise the query may give incorrect > results when topology is changed). > > I support Igor's question and think that the 'local' flag for the query > should be deprecated and eventually removed. A 'local' query can always be > expressed as a query agains a set of partitions. If those partitions are > located on the same node - good, we get fast and correct results. If not - > we may either raise an exception and ask user to remap the query, or > fallback to a distributed query execution. > > Given that the Calcite prototype is in its early stages, it's likely its > first version will be available in 3.x, and it's a good chance to get rid > of wrong API pieces. > > --AG > > пн, 4 нояб. 2019 г. в 14:02, Stephen Darlington < > stephen.darling...@gridgain.com>: > > > A common use case is where you want to work on many rows of data across > > the grid. You’d broadcast a closure, running the same code on every node > > with just the local data. SQL doesn’t work in isolation — it’s often used > > as a filter for future computations. > > > > Regards, > > Stephen > > > > > On 1 Nov 2019, at 17:53, Ivan Pavlukhin <vololo...@gmail.com> wrote: > > > > > > Denis, > > > > > > I am mostly concerned about gathering use cases. It would be great to > > > critically assess such cases to identify why it cannot be solved by > > > using distributed SQL. Also it sounds similar to some kind of "hints", > > > but very limited and with all hints drawbacks (impossibility to use > > > full strength of CBO). We can provide better "hints" support with new > > > engine as well. > > > > > > пт, 1 нояб. 2019 г. в 20:14, Denis Magda <dma...@apache.org>: > > >> > > >> Ivan, > > >> > > >> I was involved in a couple of such use cases personally, so, that's > not > > my > > >> imagination ;) Even more, as far as I remember, the primary reason why > > we > > >> improved our affinityRuns ensuring no partition is purged from a node > > until > > >> a task is completed is because many users were running local SQL from > > >> compute tasks and needed a guarantee that SQL will always return a > > correct > > >> result set. > > >> > > >> - > > >> Denis > > >> > > >> > > >> On Fri, Nov 1, 2019 at 10:01 AM Ivan Pavlukhin <vololo...@gmail.com> > > wrote: > > >> > > >>> Denis, > > >>> > > >>> Would be nice to see real use-cases of affinity call + local SQL > > >>> combination. Generally, new engine will be able to infer collocation > > >>> resulting in the same collocated execution automatically. > > >>> > > >>> пт, 1 нояб. 2019 г. в 19:11, Denis Magda <dma...@apache.org>: > > >>>> > > >>>> Hi Igor, > > >>>> > > >>>> Local queries feature is broadly used together with affinity-based > > >>> compute > > >>>> tasks: > > >>>> > > >>> > > > https://apacheignite.readme.io/docs/collocate-compute-and-data#section-affinity-call-and-run-methods > > >>>> > > >>>> The use case is as follows. The user knows that all required data > > needed > > >>>> for computation is collocated, and SQL is used as an advanced API > for > > >>> data > > >>>> retrieval from the computation code. The affinity task ensures that > > >>>> partitions won't be discarded from the node(s) if the topology > changes > > >>>> during the task execution and, thus, it's safe to run SQL locally > > >>> skipping > > >>>> distributed phases. > > >>>> > > >>>> The combination of affinity compute tasks with local SQL is a real > and > > >>>> valuable use case, and this is what we need to support with Calcite. > > Do > > >>> you > > >>>> see any challenges? > > >>>> > > >>>> - > > >>>> Denis > > >>>> > > >>>> > > >>>> On Fri, Nov 1, 2019 at 8:46 AM Roman Kondakov > > <kondako...@mail.ru.invalid > > >>>> > > >>>> wrote: > > >>>> > > >>>>> Hi Igor! > > >>>>> > > >>>>> IMO we need to maintain the backward compatibility between old and > > new > > >>>>> query engines as much as possible. And therefore we shouldn't > change > > >>> the > > >>>>> behavior of local queries. > > >>>>> > > >>>>> So, for local queries Calcite's planner shouldn't consider the > > >>>>> distribution trait at all. > > >>>>> > > >>>>> > > >>>>> -- > > >>>>> Kind Regards > > >>>>> Roman Kondakov > > >>>>> > > >>>>> On 01.11.2019 17:07, Seliverstov Igor wrote: > > >>>>>> Hi Igniters, > > >>>>>> > > >>>>>> Working on new generation of Ignite SQL I faced a question: «Do we > > >>> need > > >>>>> local queries at all and, if so, what semantic they should have?». > > >>>>>> > > >>>>>> Current planing flow consists of next steps: > > >>>>>> > > >>>>>> 1) Parsing SQL to AST > > >>>>>> 2) Validating AST (against Schema) > > >>>>>> 3) Optimizing (Building execution graph) > > >>>>>> 4) Splitting (into query fragments which executes on target nodes) > > >>>>>> 5) Mapping (query fragments to nodes/partitions) > > >>>>>> > > >>>>>> At last step we check that all Fragment sources (a table or > result) > > >>> have > > >>>>> the same distribution (in other words all sources have to be > > >>> co-located) > > >>>>>> > > >>>>>> Planner and Splitter guarantee that all caches in a Fragment are > > >>>>> co-located, an Exchange is produced otherwise. But if we force > local > > >>>>> execution we cannot produce Exchanges, that means we may face two > > >>>>> non-co-located caches inside a single query fragment (result of > local > > >>> query > > >>>>> planning is a single query fragment). So, we cannot pass the check. > > >>>>>> > > >>>>>> Should we throw an exception or omit the check for local query > > >>> planning > > >>>>> or prohibit local queries at all? > > >>>>>> > > >>>>>> Your thoughts? > > >>>>>> > > >>>>>> Regards, > > >>>>>> Igor > > >>>>> > > >>> > > >>> > > >>> > > >>> -- > > >>> Best regards, > > >>> Ivan Pavlukhin > > >>> > > > > > > > > > > > > -- > > > Best regards, > > > Ivan Pavlukhin > > > > > > > -- Best regards, Andrey V. Mashenkov