Re: [DISCUSSION] Reserve partitions for CacheQueries

Ivan Pavlukhin Mon, 13 Dec 2021 00:00:20 -0800

> Then, if there are no objections, the short-term plan is:

Sounds ok for me.


> AFAIR Scan and SQL queries implementations are totally different. Could you 
> tell me how Index queries fit there?

I suppose my questions was misleading. Actually I would like to know
how code is organized today. AFAIR SQL and scan queries has their own
codebases (messages, merging results from different nodes and etc).
Does IndexQuery has separate codebase? Does it share some code with
ScanQuery on a "query" layer (higher than BTree layer)?

2021-12-10 13:25 GMT+03:00, Maksim Timonin <[email protected]>:
> Hi!
>
>> Existing users may already have some logic in place to handle
> inconsistencies
>
> Pavel, I'm not aware of such users but your comment makes sense. So, I' OK
> with adding an option for ScanQuery. Naming of the option is debatable. The
> name "reservePartitions" looks good, but we actually already reserve a
> partition when it is explicitly specified in ScanQuery. Then, it can be a
> bit misleading in the case of explicitly setting this param to `false` with
> a specified partition. But we can just mention it in javadocs, that the
> setting affects only full scan.
>
>> AFAIR Scan and SQL queries implementations are totally different. Could
> you tell me how Index queries fit there?
>
> IndexQuery scans indexes like SQL does with few optimizations of traversing
> BPlusTree, also there are some differences in query processing, also
> IndexQuery provides sorted results by default. But I don't expect
> inconsistency in results for SQL / IndexQuery for similar queries.
> Actually, I should add tests for that and fix failures if any.
>
>> In an ideal world it would be great to have only one API
>
> I think it's possible to teach SQL to switch to cache queries for some
> cases, or provide such an opportunity with hints or explicit functions. And
> these queries might be parts of an SQL query plan. Or we can go even
> deeper, maybe it could be like Apache Spark stages, when we can build our
> plan with different types of queries, and the same type provides to
> users the opportunity to run only specific stages: DataFrame.sql() or
> DataFrame.filter(inline=true/false).
>
>> Perhaps IndexQuery should also cover regular entry iteration cases
>
> It's possible too. IndexQuery provides an opportunity to scan the PK index,
> then it can start ScanQuery under the hood for some cases.
>
> But anyway, to make them run through the single API, we should provide the
> same guarantees.
>
> Then, if there are no objections, the short-term plan is:
> 1. Implement partition reservation for IndexQuery.
> 2. Add the option for ScanQuery.
> 3. Make tests that show that IndexQuery / SQL / ScanQuery is replaceable
> for some types of queries in terms of result consistency.
> 4. Then discuss again, how we can integrate them together.
>
>
> On Fri, Dec 10, 2021 at 11:41 AM Ivan Pavlukhin <[email protected]>
> wrote:
>
>> Actually I brought a point about using SQL queries instead of scan
>> queries because I worry about inconsistency between different
>> implementations. AFAIR Scan and SQL queries implementations are
>> totally different. Could you tell me how Index queries fit there?
>>
>> My general ideas are as follows:
>> 1. In an ideal world it would be great to have only one API (to rule
>> them all) and implementation to cover all cases (i.e. scan, index,
>> SQL). Also, I wonder how other vendors tackle this?
>> 2. New IndexQuery might not mimic ScanQuery behavior but instead have
>> a correct/expected one (including partition reservation). Perhaps
>> IndexQuery should also cover regular entry iteration (scan) cases and
>> become a new ground for generalized scan/index queries.
>>
>> 2021-12-09 17:08 GMT+03:00, Pavel Tupitsyn <[email protected]>:
>> > IndexQuery is experimental, so we can indeed make it reserve partitions
>> by
>> > default. No objections here.
>> >
>> > But with ScanQuery, I think it's better to add a "reservePartitions"
>> > property so that default behavior is not affected.
>> > Existing users may already have some logic in place to handle
>> > inconsistencies. So if we enable reservation by default,
>> > they'll get an unnecessary performance degradation.
>> >
>> > On Thu, Dec 9, 2021 at 4:42 PM Maksim Timonin <[email protected]>
>> > wrote:
>> >
>> >> Hi, Ivan, Pavel! Thanks for your responses.
>> >>
>> >> > But is there a real need/benefit from using scan queries over
>> primitive
>> >> SQL queries today
>> >>
>> >> In addition to Pavel's response, I'd like to add about IndexQuery.
>> >> This
>> >> query is in experimental status, but it shows great performance and
>> beats
>> >> SQL for *some* type of queries. For IndexQuery we can leverage on
>> >> understanding that we use only index and apply some optimizations:
>> >> skip
>> >> boundaries check (h2 Select.isConditionMet), apply filtering by inline
>> IO
>> >> instead of extracting fields from BinaryObjects, also we skip some H2
>> >> related stuff. Also it's not implemented yet, but for IndexQuery we
>> >> can
>> >> access data pages sequentially instead of randomly (by iterating with
>> >> IndexRow cursor by batches), and I expect it may make performance
>> >> better
>> >> in
>> >> some highload cases.
>> >>
>> >> > it is desirable that successful (but may be strange) scenarios
>> continue
>> >> to work after fix (and not fail e.g. due to some introduced exception)
>> >>
>> >> > Partition reservation should be opt-in if we decide to proceed.
>> >>
>> >> With the approach I proposed we can reach it. Without specified query
>> >> timeout (and cache queries don't have a public API for setting
>> >> timeout),
>> >> we
>> >> can retry our attempts to reserve partitions while we do not succeed.
>> So,
>> >> users may get some increase in query time execution under unstable
>> >> topology. But in exchange users will get more stable results, and
>> >> avoid
>> >> exceptions like in the ticket I showed [1].
>> >>
>> >> We can implement this logic for IndexQuery only (as it's only
>> >> experimental
>> >> since 2.12), and extend it on ScanQuery later after stabilizing with
>> >> IndexQuery. WDYT?
>> >>
>> >>
>> >> [1] https://issues.apache.org/jira/browse/IGNITE-12591
>> >>
>> >> On Thu, Dec 9, 2021 at 10:39 AM Pavel Tupitsyn <[email protected]>
>> >> wrote:
>> >>
>> >> > Agree with Ivan regarding compatibility.
>> >> > Partition reservation should be opt-in if we decide to proceed.
>> >> >
>> >> > > is there a real need/benefit from using
>> >> > > scan queries over primitive SQL queries today
>> >> >
>> >> > SQL requires additional configuration (QueryEntity) and involves
>> memory
>> >> and
>> >> > CPU overhead to maintain the indexes.
>> >> > Especially if the query is used rarely (maybe some maintenance
>> >> > tasks),
>> >> scan
>> >> > is a better option.
>> >> >
>> >> > On Thu, Dec 9, 2021 at 10:12 AM Ivan Pavlukhin <[email protected]>
>> >> > wrote:
>> >> >
>> >> > > Hi Maksim,
>> >> > >
>> >> > > Thank you for looking into this. While fixing wrong/surprising
>> >> > > behavior is very important, I also have some concerns, let's say,
>> >> > > from
>> >> > > different angle of view:
>> >> > > 1. From a first glance it seems that similar behavior of scan and
>> SQL
>> >> > > queries is a good idea. But is there a real need/benefit from
>> >> > > using
>> >> > > scan queries over primitive SQL queries today (e.g. SELECT * FROM
>> >> > > table1)?
>> >> > > 2. Also we need to carefully think about compatibility. It is
>> >> > > desirable that successful (but may be strange) scenarios continue
>> >> > > to
>> >> > > work after fix (and not fail e.g. due to some introduced
>> >> > > exception).
>> >> > >
>> >> > > Regarding partition reservation for long-time open cursors I guess
>> >> > > some ideas might be found in LAZY mode for SQL queries [1].
>> >> > >
>> >> > > [1] https://issues.apache.org/jira/browse/IGNITE-9171
>> >> > >
>> >> > > 2021-12-08 20:11 GMT+03:00, Maksim Timonin
>> >> > > <[email protected]
>> >:
>> >> > > > Hi, Igniters!
>> >> > > >
>> >> > > > There is a known issue that ScanQuery on unstable topology
>> >> > > > returns
>> >> > > > incorrect results: duplicates data [1] or fails with an
>> >> > > > exception
>> >> [2].
>> >> > > The
>> >> > > > reason is ScanQuery doesn't reserve partitions.
>> >> > > >
>> >> > > > IndexQuery shares the same query processing as ScanQuery, and
>> >> > > > then
>> >> > > > it
>> >> > is
>> >> > > > also affected by unstable topology. I want to fix it for
>> >> > > > IndexQuery.
>> >> > > > IndexQuery should provide the same stability as SQL queries do -
>> no
>> >> > > > occasional failures or data duplication.
>> >> > > >
>> >> > > > I dived into the SQL query processing and found that we can
>> >> > > > unify
>> >> logic
>> >> > > for
>> >> > > > SQL queries and cache queries (ScanQuery, IndexQuery,
>> >> > > > TextQuery):
>> >> > > >
>> >> > > > 1. Currently, cache queries use `GridCacheQueryAdapter#nodes` to
>> >> > > > find
>> >> > > nodes
>> >> > > > to run a query. From the other side, SQL processing uses for the
>> >> > > > same
>> >> > > goal
>> >> > > > `ReducePartitionMapper#nodesForPartitions`. It looks like those
>> >> methods
>> >> > > > have some slight differences, but in the common, they do the
>> >> > > > same.
>> >> So,
>> >> > I
>> >> > > > propose to make a single method for both cases.
>> >> > > >
>> >> > > > 2. SQL processing uses `PartitionReservationManager` for reserve
>> >> > > partitions
>> >> > > > on the map side. I propose to move it to the ignite-core module
>> and
>> >> > start
>> >> > > > using it for the cache queries.
>> >> > > >
>> >> > > > 3. Implement retries for the cache queries in case we failed to
>> >> reserve
>> >> > > > partitions on the map side.
>> >> > > >
>> >> > > > Currently, I see a downside of reserving partitions for cache
>> >> queries:
>> >> > > > cache queries are lazy. And the time of partition reservation
>> >> > > > depends
>> >> > on
>> >> > > a
>> >> > > > user's application code (how fast a cursor is iterated and
>> closed).
>> >> > > AFAIU,
>> >> > > > it's not very good to have a partition in reserve too long.
>> Please,
>> >> > > correct
>> >> > > > me if I'm wrong here.
>> >> > > >
>> >> > > > But from the other side, Ignite reserves partitions for
>> >> > > > ScanQuery
>> >> when
>> >> > a
>> >> > > > partition has been specified as a ScanQuery parameter, and
>> >> > > > Ignite
>> >> > > reserves
>> >> > > > partitions for SQL with the flag lazy=true. Also:
>> >> > > > - IndexQuery: I expect simple queries that will return a
>> relatively
>> >> > small
>> >> > > > amount of data. Then partitions wouldn't be reserved too much
>> time.
>> >> > > > - the same is for TextQuery - it returns a limited amount of
>> >> > > > data
>> >> (due
>> >> > to
>> >> > > > the Lucene logic).
>> >> > > > - full ScanQuery it's not in use much, AFAIK. So, it's by
>> >> > > > default
>> a
>> >> > > pretty
>> >> > > > heavy operation.
>> >> > > >
>> >> > > > So, I think it's safe to reserve partitions in any case. But
>> >> > > > there
>> >> > could
>> >> > > be
>> >> > > > an alternative optimistic approach, smth like that:
>> >> > > > 1. Check that topology is stable and waiting while it's not
>> >> stabilized.
>> >> > > > 2. Run a query on a stable cluster.
>> >> > > > 3. In cases when a cluster becomes unstable during query
>> >> > > > execution
>> >> > > > -
>> >> > try
>> >> > > to
>> >> > > > reserve partitions at runtime (query should be aware of topology
>> >> > changes)
>> >> > > > and fail in case of reservation failure (if a user already
>> >> > > > fetched
>> >> some
>> >> > > > data).
>> >> > > >
>> >> > > > I don't like the idea of this optimistic approach because in
>> >> > > > case
>> a
>> >> > user
>> >> > > > got some data, we don't have a better solution than to fail a
>> query
>> >> in
>> >> > > case
>> >> > > > of cluster instability and reservation failure.
>> >> > > >
>> >> > > > Igniters, WDYT?
>> >> > > >
>> >> > > > [1] https://issues.apache.org/jira/browse/IGNITE-12591
>> >> > > > [2] https://issues.apache.org/jira/browse/IGNITE-16031
>> >> > > >
>> >> > >
>> >> > >
>> >> > > --
>> >> > >
>> >> > > Best regards,
>> >> > > Ivan Pavlukhin
>> >> > >
>> >> >
>> >>
>> >
>>
>>
>> --
>>
>> Best regards,
>> Ivan Pavlukhin
>>
>


-- 

Best regards,
Ivan Pavlukhin

Re: [DISCUSSION] Reserve partitions for CacheQueries

Reply via email to