Hi Denis,

I had a short chat with Alex G.

You're right, it may be a bug. I'll prepare my reproducer and add it as a
test. I will also raise a ticket if count(*) gives an incorrect result.

Sincerely,
Dmitry Pavlov

Fri, Oct 27, 2017, 1:48 Denis Magda <dma...@apache.org>:

> Dmitriy,
>
> I don’t see why the result of a simple query such as “select count(*) from
> t;” should differ while rebalancing is in progress or after a cluster
> restart. Ignite’s SQL engine claims that it is fault-tolerant and returns
> a consistent result set at all times unless a partition loss happened.
> Here we don’t have a partition loss, so it seems we caught a bug.
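>
> For reference, a hedged sketch (not from this mail) of how partition loss
> handling is configured per cache; with a *_SAFE policy, queries against
> lost partitions fail instead of silently returning partial results. Names
> are placeholders, assuming Ignite 2.x APIs:
>
> import org.apache.ignite.cache.PartitionLossPolicy;
> import org.apache.ignite.configuration.CacheConfiguration;
>
> CacheConfiguration<Integer, Object> ccfg = new CacheConfiguration<>("CACHE");
> // Reads stay allowed, but queries touching lost partitions throw an
> // exception rather than return an incomplete result set.
> ccfg.setPartitionLossPolicy(PartitionLossPolicy.READ_ONLY_SAFE);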
>
> Vladimir O., please chime in.
>
> —
> Denis
>
> On Oct 26, 2017, at 3:34 PM, Dmitry Pavlov <dpavlov....@gmail.com> wrote:
>
> Hi Denis,
>
> It seems to me that this is not a bug for my scenario, because the data
> was not loaded within a single transaction using a transactional cache. In
> that case it is OK that the cache data is rebalanced according to partition
> update counters, isn't it?
>
> I suppose the data was not lost in this case; it just was not completely
> transferred to the second node.
>
> Sincerely,
>
> Thu, Oct 26, 2017, 21:09 Denis Magda <dma...@apache.org>:
>
>> + dev list
>>
>> This scenario has to be handled automatically by Ignite. Seems like a
>> bug. Please refer to the initial description of the issue. Alex G, please
>> have a look:
>>
>> To reproduce:
>> 1. Create a replicated cache with multiple indexed types and some indexes.
>> 2. Start the first server node.
>> 3. Insert data into the cache (1,000,000 entries).
>> 4. Start the second server node.
>>
>> At this point everything seems OK: judging by SQL queries (count(*)), the
>> data is apparently rebalanced successfully. A sketch of these steps
>> follows below.
>>
>> 5. Stop both server nodes.
>> 6. Restart the server nodes.
>> 7. SQL queries (count(*)) now return less data.
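>>
>> A minimal sketch of these steps (an illustration only, not the actual
>> code from this thread; it assumes Ignite 2.x native-persistence APIs and
>> uses placeholder names such as "CACHE" and Person):
>>
>> import org.apache.ignite.Ignite;
>> import org.apache.ignite.IgniteDataStreamer;
>> import org.apache.ignite.Ignition;
>> import org.apache.ignite.cache.CacheMode;
>> import org.apache.ignite.cache.query.SqlFieldsQuery;
>> import org.apache.ignite.cache.query.annotations.QuerySqlField;
>> import org.apache.ignite.configuration.CacheConfiguration;
>> import org.apache.ignite.configuration.DataRegionConfiguration;
>> import org.apache.ignite.configuration.DataStorageConfiguration;
>> import org.apache.ignite.configuration.IgniteConfiguration;
>>
>> public class RebalanceRestartReproducer {
>>     static class Person {
>>         @QuerySqlField(index = true)
>>         int id;
>>
>>         Person(int id) { this.id = id; }
>>     }
>>
>>     // Step 1: replicated, persistent cache with an indexed query type.
>>     static IgniteConfiguration config(String name) {
>>         return new IgniteConfiguration()
>>             .setIgniteInstanceName(name)
>>             .setDataStorageConfiguration(new DataStorageConfiguration()
>>                 .setDefaultDataRegionConfiguration(
>>                     new DataRegionConfiguration().setPersistenceEnabled(true)))
>>             .setCacheConfiguration(
>>                 new CacheConfiguration<Integer, Person>("CACHE")
>>                     .setCacheMode(CacheMode.REPLICATED)
>>                     .setIndexedTypes(Integer.class, Person.class));
>>     }
>>
>>     static long countAll(Ignite ignite) {
>>         return (Long)ignite.cache("CACHE")
>>             .query(new SqlFieldsQuery("select count(*) from Person"))
>>             .getAll().get(0).get(0);
>>     }
>>
>>     public static void main(String[] args) {
>>         Ignite ignite1 = Ignition.start(config("node1"));  // step 2
>>         ignite1.active(true);  // a persistent cluster must be activated
>>
>>         // Step 3: load 1,000,000 entries.
>>         try (IgniteDataStreamer<Integer, Person> st =
>>                  ignite1.dataStreamer("CACHE")) {
>>             for (int i = 0; i < 1_000_000; i++)
>>                 st.addData(i, new Person(i));
>>         }
>>
>>         Ignite ignite2 = Ignition.start(config("node2"));  // step 4
>>
>>         // Looks fine here even if rebalancing is still in progress.
>>         System.out.println("before restart: " + countAll(ignite2));
>>
>>         // Steps 5-6: stop and restart both server nodes.
>>         Ignition.stop("node2", false);
>>         Ignition.stop("node1", false);
>>         ignite1 = Ignition.start(config("node1"));
>>         ignite2 = Ignition.start(config("node2"));
>>         ignite1.active(true);
>>
>>         // Step 7: may now print less than 1,000,000.
>>         System.out.println("after restart: " + countAll(ignite2));
>>     }
>> }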
>>
>> —
>> Denis
>>
>> > On Oct 23, 2017, at 5:11 AM, Dmitry Pavlov <dpavlov....@gmail.com>
>> wrote:
>> >
>> > Hi,
>> >
>> > I tried to write code that executes the described scenario. The
>> > results are as follows:
>> > If I do not give the cluster enough time to completely rebalance
>> > partitions, the newly launched node will not have enough data for
>> > count(*).
>> > If I do not wait long enough for the data to be distributed across the
>> > grid, the query returns a smaller number: the number of records that
>> > have been uploaded to the node so far. I guess GridDhtPartitionDemandMessage
>> > entries can be found in the Ignite debug log at this moment.
>> >
>> > If I wait for a sufficient amount of time, or explicitly wait on the
>> > newly joined node via
>> > ignite2.cache(CACHE).rebalance().get();
>> > then all results are correct.
>> >
>> > About your question: what happens if one cluster node crashes in the
>> > middle of the rebalance process?
>> > In this case the normal failover scenario starts and data is rebalanced
>> > within the cluster. If the nodes hold enough WAL records covering the
>> > history from the crash point, only the recent changes (the delta) are
>> > sent over the network. If there is not enough history to rebalance with
>> > the most recent changes, the partition is rebalanced from scratch to the
>> > new node.
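>> >
>> > A hedged sketch (not from the original mail) of the knob this delta
>> > rebalance depends on: how many checkpoints of WAL history to retain,
>> > assuming the Ignite 2.x DataStorageConfiguration API:
>> >
>> > import org.apache.ignite.configuration.DataStorageConfiguration;
>> > import org.apache.ignite.configuration.IgniteConfiguration;
>> >
>> > DataStorageConfiguration storageCfg = new DataStorageConfiguration();
>> > // Keep more checkpoint history in the WAL so that a rejoining node can
>> > // be caught up with a delta instead of a full partition rebalance.
>> > storageCfg.setWalHistorySize(100);
>> >
>> > IgniteConfiguration cfg = new IgniteConfiguration()
>> >     .setDataStorageConfiguration(storageCfg);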
>> >
>> > Sincerely,
>> > Pavlov Dmitry
>> >
>> >
>> > Sat, Oct 21, 2017 at 2:07, Manu <maxn...@hotmail.com>:
>> > Hi,
>> >
>> > after restart the data seems not to be consistent.
>> >
>> > We waited until rebalance was fully completed before restarting the
>> > cluster, to check whether durable memory data rebalance works correctly
>> > and SQL queries still work.
>> > Another question (it's not this case): what happens if one cluster node
>> > crashes in the middle of the rebalance process?
>> >
>> > Thanks!
>> >
>> >
>> >
>> > --
>> > Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>
>
>
