Re: Allow distributed SQL query execution over explicit set of partitions

Alexei Scherbakov Thu, 19 Jan 2017 06:35:56 -0800

Vladimir, do you have any idea how explicit partition settings should work
for queries over replicated caches, where every node holds the full data
set?

Will the query be parallelized only on the nodes that own the passed
partitions, or something else?
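
To make the second option concrete, here is a minimal sketch in plain Java,
with no Ignite classes. It assumes each partition has exactly one primary
owner (REPLICATED caches still split partitions into primaries and backups)
and groups the requested partitions by that owner. The class name, the route
method and the primaryOwner array are all hypothetical, made up for
illustration; this is not how the current branch routes queries.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical illustration only; none of these names exist in Ignite. */
class PartitionRoutingSketch {
    /**
     * Groups the requested partitions by the node assumed to be their primary owner.
     *
     * @param primaryOwner primaryOwner[p] is the (assumed) primary node name for partition p.
     * @param parts Explicit partitions passed with the query.
     * @return Node name mapped to the sorted partitions that node would scan.
     */
    static Map<String, Set<Integer>> route(String[] primaryOwner, int... parts) {
        Map<String, Set<Integer>> plan = new TreeMap<>();

        for (int p : parts) {
            if (p < 0 || p >= primaryOwner.length)
                throw new IllegalArgumentException("Partition out of range: " + p);

            // Only nodes owning at least one requested partition appear in the plan.
            plan.computeIfAbsent(primaryOwner[p], n -> new TreeSet<>()).add(p);
        }

        return plan;
    }

    public static void main(String[] args) {
        // Six partitions spread round-robin over three nodes (assumed layout).
        String[] owners = {"A", "B", "C", "A", "B", "C"};

        System.out.println(route(owners, 0, 3, 4));
    }
}
```

With this layout, a query over partitions 0, 3 and 4 would touch only nodes A
and B; node C is skipped even though it holds a full copy of the data.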



2017-01-19 17:27 GMT+03:00 Alexei Scherbakov <[email protected]>:

> 1. OK.
>
> 2. Agreed. In the future we might split query execution between nodes, but
> for now the query is routed to a random node in the grid.
>
> 3. OK, let's mark getter/setter as deprecated.
>
> 4. The query must be executed locally only for the defined partitions.
> Currently this setting is ignored for local queries.
>
> 5. I have the same understanding. Distributed joins will ignore the
> setting. This is not implemented yet.
>
>
> 2017-01-19 15:39 GMT+03:00 Sergi Vladykin <[email protected]>:
>
>> Agreed, let's remove everything related to partition ranges. It looks like
>> an unnecessary complication.
>>
>> Sergi
>>
>> 2017-01-19 10:01 GMT+03:00 Vladimir Ozerov <[email protected]>:
>>
>> > Several side notes about API.
>> >
>> > 1) I would avoid ranges even in this form, for the sake of simplicity.
>> > Ignite does not have any notion of "partition range" in the affinity API,
>> > so I do not understand how users are going to work with ranges unless they
>> > have some very special custom affinity function, which is a rather
>> > unlikely case.
>> >
>> > 2) The fact that this property is ignored in a REPLICATED cache is
>> > confusing. A REPLICATED cache still divides partitions into primaries and
>> > backups. If I have a very large data set and want to execute some query, I
>> > would definitely expect Ignite to take advantage of distributed computing
>> > and spread the load between nodes. I understand that currently SQL queries
>> > do not work this way, but this is a clear disadvantage for certain
>> > scenarios, which we may improve in the future. I would remove this
>> > paragraph from the docs.
>> >
>> > 3) We already have the ScanQuery.partition getter/setter. We need to make
>> > sure that they are "merged" somehow. For instance, we may deprecate the
>> > two methods in the ScanQuery class and advise users to use
>> > Query.partitions, with a clarification: only a single partition is
>> > supported for ScanQuery at the moment.
>> >
>> > 4) What should happen if "partitions" are defined and the "local" flag is
>> > set?
>> >
>> > As for distributed joins: how are we going to execute them when
>> > partitions are set explicitly? As far as I understand, partitions should
>> > apply only to the map step and only for the cache the query was created
>> > from. This way distributed join execution should effectively ignore
>> > partitions?
>> >
>> > Vladimir.
>> >
>> >
>> > On Thu, Jan 19, 2017 at 1:04 AM, Alexei Scherbakov <
>> > [email protected]> wrote:
>> >
>> > > I mean distributed joins.
>> > >
>> > > 2017-01-19 0:10 GMT+03:00 Alexei Scherbakov <[email protected]>:
>> > >
>> > > > Guys,
>> > > >
>> > > > I've finished adding the API changes and implemented proper node
>> > > > routing.
>> > > >
>> > > > Currently it doesn't work with distributed queries. But I think this
>> > > > feature should be compatible with them.
>> > > >
>> > > > Could anyone take a look at the current branch state while I'm looking
>> > > > deeper into the distributed queries code?
>> > > >
>> > > > Issue: https://issues.apache.org/jira/browse/IGNITE-4523
>> > > > PR: https://github.com/apache/ignite/pull/1418
>> > > >
>> > > >
>> > > >
>> > > > 2017-01-13 15:55 GMT+03:00 Alexei Scherbakov <[email protected]>:
>> > > >
>> > > >> OK, let's do it this way.
>> > > >>
>> > > >>
>> > > >>
>> > > >>
>> > > >>
>> > > >> 2017-01-13 13:27 GMT+03:00 Sergi Vladykin <[email protected]>:
>> > > >>
>> > > >>> Internally we still use int[] when we send partitions (see
>> > > >>> GridH2QueryRequest.parts). It looks like we only do more work with
>> > > >>> PartitionSet.
>> > > >>>
>> > > >>> I like the idea of a bitset for partitions, but:
>> > > >>>
>> > > >>> 1. We have to change the internals first to use it, otherwise the
>> > > >>> optimization makes no sense.
>> > > >>> 2. We will need a method SqlQuery.setPartitions(int... parts) for
>> > > >>> usability reasons anyway.
>> > > >>>
>> > > >>> Thus I suggest we go the straightforward way with int[] for now and
>> > > >>> create a separate ticket describing the bitset optimization.
>> > > >>>
>> > > >>> Sergi
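
The trade-off described above can be sketched in plain Java. This is only an
illustration under stated assumptions: PartitionsArgSketch, normalize,
toBitSet and toArray are hypothetical names, not Ignite API, and
java.util.BitSet is an uncompressed stand-in for the compressed bitmap
mentioned in the thread. It shows a varargs int[] argument being validated,
then converted to a bitset and back, which is the extra work that only pays
off once the internals consume the bitset directly.

```java
import java.util.Arrays;
import java.util.BitSet;

/** Hypothetical illustration only; none of these names exist in Ignite. */
class PartitionsArgSketch {
    /** Validates a varargs partition list and returns it sorted without duplicates. */
    static int[] normalize(int... parts) {
        if (parts == null || parts.length == 0)
            throw new IllegalArgumentException("At least one partition is required");

        // Copy before sorting so the caller's array is left untouched.
        int[] sorted = parts.clone();

        Arrays.sort(sorted);

        if (sorted[0] < 0)
            throw new IllegalArgumentException("Negative partition: " + sorted[0]);

        return Arrays.stream(sorted).distinct().toArray();
    }

    /** Bitset form: one bit per partition, cheap containment checks. */
    static BitSet toBitSet(int... parts) {
        BitSet bits = new BitSet();

        for (int p : normalize(parts))
            bits.set(p);

        return bits;
    }

    /** Back to int[], the form assumed to be sent over the wire today. */
    static int[] toArray(BitSet bits) {
        // BitSet.stream() yields the indices of set bits in ascending order.
        return bits.stream().toArray();
    }

    public static void main(String[] args) {
        int[] parts = normalize(5, 1, 5, 3);

        System.out.println(Arrays.toString(parts));
        System.out.println(toBitSet(parts).get(3));
    }
}
```

As long as the request still carries an int[], the bitset round trip adds a
conversion on every call without saving anything.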
>> > > >>>
>> > > >>> 2017-01-13 13:06 GMT+03:00 Alexei Scherbakov <
>> > > >>> [email protected]>:
>> > > >>>
>> > > >>> > PartitionSet hides the internal implementation of the int array.
>> > > >>> >
>> > > >>> > This allows us to efficiently represent a contiguous range of
>> > > >>> > partitions and defines a clear API for ordered iteration over
>> > > >>> > partitions and containment checks.
>> > > >>> >
>> > > >>> > Even better would be to go with a compressed bitmap, as I
>> > > >>> > mentioned in the ticket comment. This will allow us to minimize
>> > > >>> > the heap footprint of this object.
>> > > >>> >
>> > > >>> > Moreover, it would be useful to create a reusable compressed
>> > > >>> > bitmap implementation in Ignite and use it in other cases, for
>> > > >>> > example, for replacing H2's IntArray and Set<Integer>.
>> > > >>> >
>> > > >>> > Should I create a ticket for this?
>> > > >>> >
>> > > >>> > .
>> > > >>> >
>> > > >>> > 2017-01-13 1:01 GMT+03:00 Dmitriy Setrakyan <[email protected]>:
>> > > >>> >
>> > > >>> > > On Thu, Jan 12, 2017 at 6:12 AM, Sergi Vladykin
>> > > >>> > > <[email protected]> wrote:
>> > > >>> > >
>> > > >>> > > > I looked at the code. The PartitionSet concept looks
>> > > >>> > > > overengineered to me. Why wouldn't we just go with int[]?
>> > > >>> > > >
>> > > >>> > >
>> > > >>> > > Agree.
>> > > >>> > >
>> > > >>> > >
>> > > >>> > > >
>> > > >>> > > > Sergi
>> > > >>> > > >
>> > > >>> > > > 2017-01-12 15:18 GMT+03:00 Alexei Scherbakov <[email protected]>:
>> > > >>> > > >
>> > > >>> > > > > Done.
>> > > >>> > > > >
>> > > >>> > > > > 2017-01-11 20:39 GMT+03:00 Dmitriy Setrakyan <[email protected]>:
>> > > >>> > > > >
>> > > >>> > > > > > Alexey,
>> > > >>> > > > > >
>> > > >>> > > > > > I am not sure I am seeing the API changes documented in
>> > > >>> > > > > > the ticket. Can you please either document them or add
>> > > >>> > > > > > Git links for the new classes?
>> > > >>> > > > > >
>> > > >>> > > > > > D.
>> > > >>> > > > > >
>> > > >>> > > > > > On Wed, Jan 11, 2017 at 9:29 AM, Alexei Scherbakov <
>> > > >>> > > > > > [email protected]> wrote:
>> > > >>> > > > > >
>> > > >>> > > > > > > Guys,
>> > > >>> > > > > > >
>> > > >>> > > > > > > I've just submitted a PR for
>> > > >>> > > > > > > https://issues.apache.org/jira/browse/IGNITE-4523.
>> > > >>> > > > > > >
>> > > >>> > > > > > > Please review the API changes while waiting for TC
>> > > >>> > > > > > > results.
>> > > >>> > > > > > >
>> > > >>> > > > > > > --
>> > > >>> > > > > > >
>> > > >>> > > > > > > Best regards,
>> > > >>> > > > > > > Alexei Scherbakov
>> > > >>> > > > > > >
>> > > >>> > > > > >
>> > > >>> > > > >
>> > > >>> > > > >
>> > > >>> > > > >
>> > > >>> > > > > --
>> > > >>> > > > >
>> > > >>> > > > > Best regards,
>> > > >>> > > > > Alexei Scherbakov
>> > > >>> > > > >
>> > > >>> > > >
>> > > >>> > >
>> > > >>> >
>> > > >>> >
>> > > >>> >
>> > > >>> > --
>> > > >>> >
>> > > >>> > Best regards,
>> > > >>> > Alexei Scherbakov
>> > > >>> >
>> > > >>>
>> > > >>
>> > > >>
>> > > >>
>> > > >> --
>> > > >>
>> > > >> Best regards,
>> > > >> Alexei Scherbakov
>> > > >>
>> > > >
>> > > >
>> > > >
>> > > > --
>> > > >
>> > > > Best regards,
>> > > > Alexei Scherbakov
>> > > >
>> > >
>> > >
>> > >
>> > > --
>> > >
>> > > Best regards,
>> > > Alexei Scherbakov
>> > >
>> >
>>
>
>
>
> --
>
> Best regards,
> Alexei Scherbakov
>



-- 

Best regards,
Alexei Scherbakov
