Re: Continuous queries and duplicates

Denis Magda Thu, 13 Dec 2018 12:17:21 -0800

Vladimir,

The partition counter is supposed to be used internally to solve the
duplication issue. Does that sound like the right approach, then?


What would be the approach for SQL queries? I'm not sure the partition
counter is applicable there.

--
Denis

On Thu, Dec 13, 2018 at 11:16 AM Vladimir Ozerov <[email protected]>
wrote:

> The partition counter is an internal implementation detail, which has no
> sensible meaning to end users. It should not be exposed through public API.
>
> On Thu, Dec 13, 2018 at 10:14 PM Denis Magda <[email protected]> wrote:
>
> > Hello Piotr,
> >
> > That's a known problem and I thought a JIRA ticket already exists. However,
> > I failed to locate it. A ticket for the improvement should be created as a
> > result of this conversation.
> >
> > Speaking of the initial query type, I would differentiate between
> > ScanQueries and SqlQueries. For the former, it sounds reasonable to apply
> > the partitionCounter logic. As for the latter, Vladimir Ozerov, will it be
> > addressed as part of MVCC/Transactional SQL activities?
> >
> > Btw, Piotr, what's your initial query type?
> >
> > --
> > Denis
> >
> > On Thu, Dec 13, 2018 at 3:28 AM Piotr Romański <[email protected]>
> > wrote:
> >
> > > Hi, as suggested by Ilya here:
> > > http://apache-ignite-users.70518.x6.nabble.com/Continuous-queries-and-duplicates-td25314.html
> > > I'm resending it to the developers list.
> > >
> > > From that thread we know that there might be duplicates between initial
> > > query results and listener entries received as part of a continuous
> > > query. That means that users need to manually dedupe data.
> > >
> > > In my opinion, manual deduplication may lead to memory problems on the
> > > client side in some use cases. In order to remove duplicated
> > > notifications which we are receiving in the local listener, we need to
> > > keep all initial query results in memory (or at least their unique ids).
> > > Unfortunately, there is no way (is there?) to find a point in time when
> > > we can be sure that no dups will arrive anymore. That would mean that we
> > > need to keep that data indefinitely and use it every time a new
> > > notification arrives. In case of multiple continuous queries run from a
> > > single JVM, this might eventually become a memory or performance
> > > problem. I can see the following possible improvements to Ignite:
> > >
> > > 1. The deduplication between the initial query and incoming
> > > notifications could be done fully in Ignite. As far as I know, there is
> > > already an updateCounter and a partition id for all the objects, so they
> > > could be used internally.
> > >
> > > 2. Add a guarantee that notifications arriving in the local listener
> > > after the query() method returns are not duplicates. This kind of
> > > functionality would require specific synchronization inside Ignite. It
> > > would also mean that the query() method cannot return before all
> > > potential duplicates are processed by a local listener, which looks
> > > wrong.
> > >
> > > 3. Notify users that starting from a given notification they can be
> > > sure they will not receive any duplicates anymore. This could be an
> > > additional boolean flag in the CacheQueryEntryEvent.
> > >
> > > 4. CacheQueryEntryEvent already exposes the partitionUpdateCounter.
> > > Unfortunately, we don't have this information for initial query
> > > results. If we had, a client could manually deduplicate notifications
> > > and get rid of initial query results for a given partition after newer
> > > notifications arrive. It would also be very convenient to expose the
> > > partition id, but for now we can figure it out using the affinity
> > > service. The assumption here is that notifications are ordered by
> > > partitionUpdateCounter (is it true?).
> > >
> > > Please correct me if I'm missing anything.
> > >
> > > What do you think?
> > >
> > > Piotr
> > >
> >
>
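
For reference, the client-side deduplication described in point 4 of Piotr's
message could look roughly like the sketch below. It is only an illustration
under the assumptions stated in that point: initial query results would carry
a partition id and an update counter (they do not today), and notifications
within a partition arrive in counter order (the open question in the thread).
The class and method names are hypothetical and are not part of the Ignite
API. The sketch also assumes that all initial results are registered before
notifications are processed, and it is not thread-safe.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical helper: drops notifications already covered by the initial
 * query and frees per-partition state once a newer notification arrives.
 */
class InitialQueryDeduper<K> {
    /** State kept for a partition only while duplicates are still possible. */
    private static final class PartitionState<K> {
        final Map<K, Long> counterByKey = new HashMap<>();
        long maxCounter = Long.MIN_VALUE;
    }

    private final Map<Integer, PartitionState<K>> pending = new HashMap<>();

    /** Registers one initial query result (assumed to expose its counter). */
    void onInitialResult(int partition, K key, long updateCounter) {
        PartitionState<K> state =
            pending.computeIfAbsent(partition, p -> new PartitionState<>());
        state.counterByKey.merge(key, updateCounter, Math::max);
        state.maxCounter = Math.max(state.maxCounter, updateCounter);
    }

    /** Returns true if the notification is new, false if it is a duplicate. */
    boolean onNotification(int partition, K key, long updateCounter) {
        PartitionState<K> state = pending.get(partition);
        if (state == null)
            return true; // nothing from the initial query left to compare with

        if (updateCounter > state.maxCounter) {
            // Newer than every initial result of this partition. With ordered
            // notifications no further duplicates can follow, so free the state.
            pending.remove(partition);
            return true;
        }

        Long seen = state.counterByKey.get(key);
        return seen == null || updateCounter > seen;
    }

    /** Number of partitions whose initial results are still held in memory. */
    int pendingPartitions() {
        return pending.size();
    }
}
```

Memory is held per partition only until the first notification newer than all
of that partition's initial results arrives, which is the bounded-memory
behaviour point 4 asks for. If the ordering assumption does not hold, the
early release of partition state is not safe.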
