[1] http://apache-ignite-developers.2346864.n4.nabble.com/Continuous-queries-and-MVCC-td33972.html
On Thu, Dec 13, 2018 at 11:38 PM Vladimir Ozerov <voze...@gridgain.com> wrote:

> Denis,
>
> Not really. They are used to ensure that ordering of notifications is
> consistent with ordering of updates, so that when a key K is updated to V1,
> then V2, then V3, you never observe V1 -> V3 -> V2. It also solves the
> duplicate notification problem in case of node failures, when the same
> update is delivered twice.
>
> However, partition counters are unable to solve the duplicates problem in
> general. Essentially, the question is how to get a consistent view on some
> data plus all notifications which happened afterwards. There are only two
> ways to achieve this - either lock entries during the initial query, or take
> a kind of consistent data snapshot. The former was never implemented in
> Ignite - our Scan and SQL queries do not use locking. The latter is
> achievable in theory with MVCC. I raised that question earlier [1] (see
> p.2), and we came to the conclusion that it might be a good feature for the
> product. It is not implemented that way for MVCC now, but most probably is
> not extraordinarily difficult to implement.
>
> Vladimir.
>
> [1]
> http://apache-ignite-developers.2346864.n4.nabble.com/Continuous-queries-and-MVCC-td33972.html#a33998
>
> On Thu, Dec 13, 2018 at 11:17 PM Denis Magda <dma...@apache.org> wrote:
>
>> Vladimir,
>>
>> The partition counter is supposed to be used internally to solve the
>> duplication issue. Does it sound like a right approach then?
>>
>> What would be an approach for SQL queries? Not sure the partition counter
>> is applicable.
>>
>> --
>> Denis
>>
>> On Thu, Dec 13, 2018 at 11:16 AM Vladimir Ozerov <voze...@gridgain.com>
>> wrote:
>>
>> > Partition counter is an internal implementation detail, which has no
>> > sensible meaning to end users. It should not be exposed through public
>> > API.
>> >
>> > On Thu, Dec 13, 2018 at 10:14 PM Denis Magda <dma...@apache.org> wrote:
>> >
>> > > Hello Piotr,
>> > >
>> > > That's a known problem and I thought a JIRA ticket already exists.
>> > > However, I failed to locate it. The ticket for the improvement should
>> > > be created as a result of this conversation.
>> > >
>> > > Speaking of an initial query type, I would differentiate between
>> > > ScanQueries and SqlQueries. For the former, it sounds reasonable to
>> > > apply the partitionCounter logic. As for the latter, Vladimir Ozerov,
>> > > will it be addressed as part of MVCC/Transactional SQL activities?
>> > >
>> > > Btw, Piotr, what's your initial query type?
>> > >
>> > > --
>> > > Denis
>> > >
>> > > On Thu, Dec 13, 2018 at 3:28 AM Piotr Romański
>> > > <piotr.roman...@gmail.com> wrote:
>> > >
>> > > > Hi, as suggested by Ilya here:
>> > > > http://apache-ignite-users.70518.x6.nabble.com/Continuous-queries-and-duplicates-td25314.html
>> > > > I'm resending it to the developers list.
>> > > >
>> > > > From that thread we know that there might be duplicates between
>> > > > initial query results and listener entries received as part of
>> > > > continuous query. That means that users need to manually dedupe data.
>> > > >
>> > > > In my opinion the manual deduplication in some use cases may lead to
>> > > > possible memory problems on the client side. In order to remove
>> > > > duplicated notifications which we are receiving in the local
>> > > > listener, we need to keep all initial query results in memory (or at
>> > > > least their unique ids). Unfortunately, there is no way (is there?)
>> > > > to find a point in time when we can be sure that no dups will arrive
>> > > > anymore. That would mean that we need to keep that data indefinitely
>> > > > and use it every time a new notification arrives.
>> > > > In case of multiple continuous queries run from a single JVM, this
>> > > > might eventually become a memory or performance problem. I can see
>> > > > the following possible improvements to Ignite:
>> > > >
>> > > > 1. The deduplication between initial query and incoming notification
>> > > > could be done fully in Ignite. As far as I know there is already the
>> > > > updateCounter and partition id for all the objects so it could be
>> > > > used internally.
>> > > >
>> > > > 2. Add a guarantee that notifications arriving in the local listener
>> > > > after query() method returns are not duplicates. This kind of
>> > > > functionality would require a specific synchronization inside
>> > > > Ignite. It would also mean that the query() method cannot return
>> > > > before all potential duplicates are processed by a local listener,
>> > > > which looks wrong.
>> > > >
>> > > > 3. Notify users that starting from a given notification they can be
>> > > > sure they will not receive any duplicates anymore. This could be an
>> > > > additional boolean flag in the CacheQueryEntryEvent.
>> > > >
>> > > > 4. CacheQueryEntryEvent already exposes the partitionUpdateCounter.
>> > > > Unfortunately we don't have this information for initial query
>> > > > results. If we had, a client could manually deduplicate
>> > > > notifications and get rid of initial query results for a given
>> > > > partition after newer notifications arrive. Also it would be very
>> > > > convenient to expose partition id as well but now we can figure it
>> > > > out using the affinity service. The assumption here is that
>> > > > notifications are ordered by partitionUpdateCounter (is it true?).
>> > > >
>> > > > Please correct me if I'm missing anything.
>> > > >
>> > > > What do you think?
>> > > >
>> > > > Piotr
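
Piotr's proposal 4 can be illustrated with a small client-side sketch. It is only an illustration under stated assumptions, not Ignite code: the class `PartitionCounterDeduper` and its methods `seed` and `accept` are hypothetical names, and the sketch assumes that initial query rows carried an update counter (which, per the thread, they currently do not) and that notifications within a partition arrive ordered by partitionUpdateCounter (the open question in the thread). As Vladimir points out, partition counters do not solve the duplicates problem in general, so this shows only the bookkeeping a client would need, namely one counter per partition instead of all initial query results.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical client-side helper (not part of Ignite): drops notifications
 * whose per-partition update counter has already been seen.
 */
final class PartitionCounterDeduper {
    /** Highest update counter observed so far, keyed by partition id. */
    private final Map<Integer, Long> highestSeen = new HashMap<>();

    /**
     * Records the counter of an initial-query row. Assumption: initial
     * query results expose a partition id and an update counter.
     */
    public synchronized void seed(int partition, long updateCounter) {
        highestSeen.merge(partition, updateCounter, Long::max);
    }

    /**
     * Returns true if the notification is newer than anything seen for its
     * partition and should be processed; false if it is a duplicate.
     * Assumption: counters grow monotonically within a partition.
     */
    public synchronized boolean accept(int partition, long updateCounter) {
        Long last = highestSeen.get(partition);
        if (last != null && updateCounter <= last)
            return false;
        highestSeen.put(partition, updateCounter);
        return true;
    }

    public static void main(String[] args) {
        PartitionCounterDeduper d = new PartitionCounterDeduper();
        d.seed(3, 10L);                        // initial query saw counter 10 in partition 3
        System.out.println(d.accept(3, 10L));  // false: already covered by the initial query
        System.out.println(d.accept(3, 11L));  // true: newer update
        System.out.println(d.accept(3, 11L));  // false: same update delivered twice
        System.out.println(d.accept(7, 1L));   // true: first event for partition 7
    }
}
```

In a local listener, the partition id would have to come from the affinity service and the counter from the event, as described in proposal 4; the memory held by this helper is bounded by the number of partitions rather than by the size of the initial query result.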