Stephen, In my understanding we need to do a better job to realize use-cases of Compute + LocalSQL ourselves.
Ideally smart optimizer should do the best job of query deployment. чт, 7 нояб. 2019 г. в 13:04, Stephen Darlington <stephen.darling...@gridgain.com>: > > I made a (bad) assumption that this would also affect queries against > partitions. If “setLocal()” goes away but “setPartitions()” remains I’m happy. > > What I would say is that the “broadcast / local” method is one I see fairly > often. Do we need to do a better job educating people of the “correct” way? > > Regards, > Stephen > > > On 7 Nov 2019, at 08:30, Alexey Goncharuk <alexey.goncha...@gmail.com> > > wrote: > > > > Denis, Stephen, > > > > Running a local query in a broadcast closure won't work on changing > > topology. We specifically added an affinityCall method to the compute API > > in order to pin a partition to prevent its moving and eviction throughout > > the task execution. Therefore, the query inside an affinityCall is always > > executed against some partitions (otherwise the query may give incorrect > > results when topology is changed). > > > > I support Igor's question and think that the 'local' flag for the query > > should be deprecated and eventually removed. A 'local' query can always be > > expressed as a query agains a set of partitions. If those partitions are > > located on the same node - good, we get fast and correct results. If not - > > we may either raise an exception and ask user to remap the query, or > > fallback to a distributed query execution. > > > > Given that the Calcite prototype is in its early stages, it's likely its > > first version will be available in 3.x, and it's a good chance to get rid > > of wrong API pieces. > > > > --AG > > > > пн, 4 нояб. 2019 г. в 14:02, Stephen Darlington < > > stephen.darling...@gridgain.com>: > > > >> A common use case is where you want to work on many rows of data across > >> the grid. You’d broadcast a closure, running the same code on every node > >> with just the local data. SQL doesn’t work in isolation — it’s often used > >> as a filter for future computations. > >> > >> Regards, > >> Stephen > >> > >>> On 1 Nov 2019, at 17:53, Ivan Pavlukhin <vololo...@gmail.com> wrote: > >>> > >>> Denis, > >>> > >>> I am mostly concerned about gathering use cases. It would be great to > >>> critically assess such cases to identify why it cannot be solved by > >>> using distributed SQL. Also it sounds similar to some kind of "hints", > >>> but very limited and with all hints drawbacks (impossibility to use > >>> full strength of CBO). We can provide better "hints" support with new > >>> engine as well. > >>> > >>> пт, 1 нояб. 2019 г. в 20:14, Denis Magda <dma...@apache.org>: > >>>> > >>>> Ivan, > >>>> > >>>> I was involved in a couple of such use cases personally, so, that's not > >> my > >>>> imagination ;) Even more, as far as I remember, the primary reason why > >> we > >>>> improved our affinityRuns ensuring no partition is purged from a node > >> until > >>>> a task is completed is because many users were running local SQL from > >>>> compute tasks and needed a guarantee that SQL will always return a > >> correct > >>>> result set. > >>>> > >>>> - > >>>> Denis > >>>> > >>>> > >>>> On Fri, Nov 1, 2019 at 10:01 AM Ivan Pavlukhin <vololo...@gmail.com> > >> wrote: > >>>> > >>>>> Denis, > >>>>> > >>>>> Would be nice to see real use-cases of affinity call + local SQL > >>>>> combination. Generally, new engine will be able to infer collocation > >>>>> resulting in the same collocated execution automatically. > >>>>> > >>>>> пт, 1 нояб. 2019 г. в 19:11, Denis Magda <dma...@apache.org>: > >>>>>> > >>>>>> Hi Igor, > >>>>>> > >>>>>> Local queries feature is broadly used together with affinity-based > >>>>> compute > >>>>>> tasks: > >>>>>> > >>>>> > >> https://apacheignite.readme.io/docs/collocate-compute-and-data#section-affinity-call-and-run-methods > >>>>>> > >>>>>> The use case is as follows. The user knows that all required data > >> needed > >>>>>> for computation is collocated, and SQL is used as an advanced API for > >>>>> data > >>>>>> retrieval from the computation code. The affinity task ensures that > >>>>>> partitions won't be discarded from the node(s) if the topology changes > >>>>>> during the task execution and, thus, it's safe to run SQL locally > >>>>> skipping > >>>>>> distributed phases. > >>>>>> > >>>>>> The combination of affinity compute tasks with local SQL is a real and > >>>>>> valuable use case, and this is what we need to support with Calcite. > >> Do > >>>>> you > >>>>>> see any challenges? > >>>>>> > >>>>>> - > >>>>>> Denis > >>>>>> > >>>>>> > >>>>>> On Fri, Nov 1, 2019 at 8:46 AM Roman Kondakov > >> <kondako...@mail.ru.invalid > >>>>>> > >>>>>> wrote: > >>>>>> > >>>>>>> Hi Igor! > >>>>>>> > >>>>>>> IMO we need to maintain the backward compatibility between old and > >> new > >>>>>>> query engines as much as possible. And therefore we shouldn't change > >>>>> the > >>>>>>> behavior of local queries. > >>>>>>> > >>>>>>> So, for local queries Calcite's planner shouldn't consider the > >>>>>>> distribution trait at all. > >>>>>>> > >>>>>>> > >>>>>>> -- > >>>>>>> Kind Regards > >>>>>>> Roman Kondakov > >>>>>>> > >>>>>>> On 01.11.2019 17:07, Seliverstov Igor wrote: > >>>>>>>> Hi Igniters, > >>>>>>>> > >>>>>>>> Working on new generation of Ignite SQL I faced a question: «Do we > >>>>> need > >>>>>>> local queries at all and, if so, what semantic they should have?». > >>>>>>>> > >>>>>>>> Current planing flow consists of next steps: > >>>>>>>> > >>>>>>>> 1) Parsing SQL to AST > >>>>>>>> 2) Validating AST (against Schema) > >>>>>>>> 3) Optimizing (Building execution graph) > >>>>>>>> 4) Splitting (into query fragments which executes on target nodes) > >>>>>>>> 5) Mapping (query fragments to nodes/partitions) > >>>>>>>> > >>>>>>>> At last step we check that all Fragment sources (a table or result) > >>>>> have > >>>>>>> the same distribution (in other words all sources have to be > >>>>> co-located) > >>>>>>>> > >>>>>>>> Planner and Splitter guarantee that all caches in a Fragment are > >>>>>>> co-located, an Exchange is produced otherwise. But if we force local > >>>>>>> execution we cannot produce Exchanges, that means we may face two > >>>>>>> non-co-located caches inside a single query fragment (result of local > >>>>> query > >>>>>>> planning is a single query fragment). So, we cannot pass the check. > >>>>>>>> > >>>>>>>> Should we throw an exception or omit the check for local query > >>>>> planning > >>>>>>> or prohibit local queries at all? > >>>>>>>> > >>>>>>>> Your thoughts? > >>>>>>>> > >>>>>>>> Regards, > >>>>>>>> Igor > >>>>>>> > >>>>> > >>>>> > >>>>> > >>>>> -- > >>>>> Best regards, > >>>>> Ivan Pavlukhin > >>>>> > >>> > >>> > >>> > >>> -- > >>> Best regards, > >>> Ivan Pavlukhin > >> > >> > >> > > -- Best regards, Ivan Pavlukhin