Re: Continuous queries and duplicates

Vladimir Ozerov Thu, 13 Dec 2018 12:40:47 -0800

[1]
http://apache-ignite-developers.2346864.n4.nabble.com/Continuous-queries-and-MVCC-td33972.html


On Thu, Dec 13, 2018 at 11:38 PM Vladimir Ozerov <voze...@gridgain.com>
wrote:

> Denis,
>
> Not really. They are used to ensure that the ordering of notifications is
> consistent with the ordering of updates, so that when a key K is updated to
> V1, then V2, then V3, you never observe V1 -> V3 -> V2. They also solve the
> duplicate notification problem in case of node failures, when the same
> update is delivered twice.
>
> However, partition counters are unable to solve the duplicates problem in
> general. Essentially, the question is how to get a consistent view of some
> data plus all notifications which happened afterwards. There are only two
> ways to achieve this - either lock entries during the initial query, or take
> a kind of consistent data snapshot. The former was never implemented in
> Ignite - our Scan and SQL queries do not use locking. The latter is
> achievable in theory with MVCC. I raised that question earlier [1] (see
> p.2), and we came to the conclusion that it might be a good feature for the
> product. It is not implemented that way for MVCC now, but most probably it
> is not extraordinarily difficult to implement.
>
> Vladimir.
>
> [1]
> http://apache-ignite-developers.2346864.n4.nabble.com/Continuous-queries-and-MVCC-td33972.html#a33998
>
> On Thu, Dec 13, 2018 at 11:17 PM Denis Magda <dma...@apache.org> wrote:
>
>> Vladimir,
>>
>> The partition counter is supposed to be used internally to solve the
>> duplication issue. Does that sound like the right approach, then?
>>
>> What would be an approach for SQL queries? Not sure the partition counter
>> is applicable.
>>
>> --
>> Denis
>>
>> On Thu, Dec 13, 2018 at 11:16 AM Vladimir Ozerov <voze...@gridgain.com>
>> wrote:
>>
>> > The partition counter is an internal implementation detail, which has no
>> > sensible meaning to end users. It should not be exposed through the
>> > public API.
>> >
>> > On Thu, Dec 13, 2018 at 10:14 PM Denis Magda <dma...@apache.org> wrote:
>> >
>> > > Hello Piotr,
>> > >
>> > > That's a known problem and I thought a JIRA ticket already existed.
>> > > However, I failed to locate it. The ticket for the improvement should
>> > > be created as a result of this conversation.
>> > >
>> > > Speaking of the initial query type, I would differentiate between
>> > > ScanQueries and SqlQueries. For the former, it sounds reasonable to
>> > > apply the partitionCounter logic. As for the latter, Vladimir Ozerov,
>> > > will it be addressed as part of the MVCC/Transactional SQL activities?
>> > >
>> > > Btw, Piotr, what's your initial query type?
>> > >
>> > > --
>> > > Denis
>> > >
>> > > On Thu, Dec 13, 2018 at 3:28 AM Piotr Romański <piotr.roman...@gmail.com>
>> > > wrote:
>> > >
>> > > > Hi, as suggested by Ilya here:
>> > > >
>> > > > http://apache-ignite-users.70518.x6.nabble.com/Continuous-queries-and-duplicates-td25314.html
>> > > >
>> > > > I'm resending it to the developers list.
>> > > >
>> > > > From that thread we know that there might be duplicates between
>> > > > initial query results and listener entries received as part of a
>> > > > continuous query. That means that users need to dedupe data manually.
>> > > >
>> > > > In my opinion, manual deduplication may, in some use cases, lead to
>> > > > memory problems on the client side. In order to remove duplicated
>> > > > notifications received in the local listener, we need to keep all
>> > > > initial query results in memory (or at least their unique ids).
>> > > > Unfortunately, there is no way (is there?) to find a point in time
>> > > > after which we can be sure that no more dups will arrive. That means
>> > > > we need to keep that data indefinitely and check it every time a new
>> > > > notification arrives. With multiple continuous queries run from a
>> > > > single JVM, this might eventually become a memory or performance
>> > > > problem. I can see the following possible improvements to Ignite:
>> > > >
>> > > > 1. The deduplication between the initial query and incoming
>> > > > notifications could be done fully in Ignite. As far as I know, there
>> > > > is already an updateCounter and a partition id for every object, so
>> > > > they could be used internally.
>> > > >
>> > > > 2. Add a guarantee that notifications arriving in the local listener
>> > > > after the query() method returns are not duplicates. This kind of
>> > > > functionality would require specific synchronization inside Ignite.
>> > > > It would also mean that the query() method cannot return before all
>> > > > potential duplicates are processed by the local listener, which looks
>> > > > wrong.
>> > > >
>> > > > 3. Notify users that, starting from a given notification, they can be
>> > > > sure they will not receive any more duplicates. This could be an
>> > > > additional boolean flag in CacheQueryEntryEvent.
>> > > >
>> > > > 4. CacheQueryEntryEvent already exposes the partitionUpdateCounter.
>> > > > Unfortunately, we don't have this information for initial query
>> > > > results. If we had it, a client could deduplicate notifications
>> > > > manually and get rid of the initial query results for a given
>> > > > partition once newer notifications arrive. It would also be very
>> > > > convenient to expose the partition id, but for now we can figure it
>> > > > out using the affinity service. The assumption here is that
>> > > > notifications are ordered by partitionUpdateCounter (is that true?).
>> > > >
>> > > > Please correct me if I'm missing anything.
>> > > >
>> > > > What do you think?
>> > > >
>> > > > Piotr
>> > > >
>> > >
>> >
>>
>
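Vladimir's point that partition counters suppress an update delivered twice (for example after a node failure) can be illustrated with a minimal sketch. It is a simplification, not Ignite's internal code: the class CounterFilter and its method shouldDeliver are hypothetical, and the sketch simply drops any notification whose per-partition update counter is not newer than the last one delivered. A real implementation would presumably buffer out-of-order updates rather than discard them, so the V1 -> V3 -> V2 case is only hinted at here.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical illustration (not Ignite's implementation): remembers the last
 * update counter delivered per partition and suppresses any notification whose
 * counter is not newer, e.g. an update re-sent after a node failure.
 */
final class CounterFilter {
    // partition id -> highest update counter already passed to the listener
    private final Map<Integer, Long> lastDelivered = new HashMap<>();

    /** Returns true if the notification should be passed to the listener. */
    boolean shouldDeliver(int partition, long updateCounter) {
        Long last = lastDelivered.get(partition);
        if (last != null && updateCounter <= last) {
            return false; // duplicate or stale re-delivery: drop it
        }
        lastDelivered.put(partition, updateCounter);
        return true;
    }
}
```

Counters are tracked per partition because, as discussed above, the update counter is a per-partition sequence; comparing counters across partitions would be meaningless.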

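Piotr's proposal 4 can be sketched as client-side code. The sketch assumes, as the proposal does, that every initial query row comes with its partition id and update counter (Ignite does not currently provide this), and that notifications for one partition arrive in ascending partitionUpdateCounter order (the open question in the proposal). PartitionDeduper, addInitial, accept and trackedPartitions are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical client-side deduplicator for proposal 4. Assumes initial query
 * rows carry (partition, updateCounter) and that notifications within one
 * partition arrive in ascending counter order.
 */
final class PartitionDeduper<K> {
    // partition -> (key -> update counter seen in the initial query)
    private final Map<Integer, Map<K, Long>> initial = new HashMap<>();
    // partition -> highest counter among that partition's initial rows
    private final Map<Integer, Long> maxInitial = new HashMap<>();

    /** Records one initial query row. */
    void addInitial(int partition, K key, long counter) {
        initial.computeIfAbsent(partition, p -> new HashMap<>()).put(key, counter);
        maxInitial.merge(partition, counter, Long::max);
    }

    /** Returns true if the notification is new, false if the initial query already covered it. */
    boolean accept(int partition, K key, long counter) {
        Long max = maxInitial.get(partition);
        if (max == null) {
            return true; // no initial rows left to compare against
        }
        if (counter > max) {
            // Under the ordering assumption, every later notification for this
            // partition is newer than all initial rows, so release the bookkeeping.
            initial.remove(partition);
            maxInitial.remove(partition);
            return true;
        }
        Long seen = initial.get(partition).get(key);
        return seen == null || counter > seen;
    }

    /** Number of partitions whose initial rows are still held in memory. */
    int trackedPartitions() {
        return maxInitial.size();
    }
}
```

The memory saving comes from the release step: once a partition delivers a counter above everything the initial query returned for it, the per-key counters kept for that partition are no longer needed, which is exactly the point in time Piotr says is missing today.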