Re: New definition for affinity node (issues with baseline)

Ivan Rakov Tue, 24 Apr 2018 10:40:40 -0700

- for in-memory caches, affinity would be calculated with SAT/BLAT in the
first step, so collocation would work between in-memory and persistent
caches;
- in the next step, if there are offline nodes, we would spread their
partitions among alive nodes. This would save us from data loss.

+1 to this approach.

I can't estimate how hard it is to implement, but it seems to solve both the collocation and the data loss issues.


Best Regards,
Ivan Rakov

On 24.04.2018 20:29, Eduard Shangareev wrote:

Igniters,

I have introduced DAT as a counterpart to BLAT (SAT) because they reflect
how Ignite works.

But I actually have concerns about whether such a separation is necessary.

DAT exists only because we don't want to lose any data in in-memory caches.

But there are alternatives. Besides BLAT auto-change policies, I would draw
attention to the following approach:
- for in-memory caches, affinity would be calculated with SAT/BLAT in the
first step, so collocation would work between in-memory and persistent
caches;
- in the next step, if there are offline nodes, we would spread their
partitions among alive nodes. This would save us from data loss.

I don't want to propose any changes until we reach consensus.
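A minimal Java sketch of the two-step idea, under loudly stated
assumptions: node IDs are plain strings, step 1 uses a toy modulo mapping
over the static (baseline) topology in place of Ignite's real affinity
function, and step 2 hands each partition owned by an offline node to
alive nodes round-robin. The class and method names are hypothetical and
are not Ignite API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical two-step partition assignment; not Ignite's actual affinity code. */
final class TwoStepAssignment {
    /** Step 1: map every partition onto the static topology (SAT/BLAT) with a toy modulo function. */
    static Map<Integer, String> assignOverSat(List<String> sat, int partitions) {
        Map<Integer, String> owners = new LinkedHashMap<>();
        for (int p = 0; p < partitions; p++)
            owners.put(p, sat.get(p % sat.size()));
        return owners;
    }

    /** Step 2: move partitions whose SAT owner is offline to alive nodes, round-robin. */
    static Map<Integer, String> spreadOffline(Map<Integer, String> owners, Set<String> alive) {
        if (alive.isEmpty())
            throw new IllegalStateException("no alive nodes to take over partitions");
        List<String> aliveList = new ArrayList<>(alive);
        Collections.sort(aliveList); // deterministic order for the round-robin
        Map<Integer, String> result = new LinkedHashMap<>(owners);
        int next = 0;
        for (Map.Entry<Integer, String> e : owners.entrySet()) {
            if (!alive.contains(e.getValue())) {
                // Offline owner: give this partition to the next alive node.
                result.put(e.getKey(), aliveList.get(next % aliveList.size()));
                next++;
            }
        }
        return result;
    }
}
```

Because step 1 is computed over the same SAT for every cache, a partition
whose SAT owner is alive stays on the same node for in-memory and
persistent caches, so collocation holds; only the partitions of offline
owners move in step 2. The sketch ignores backups and moving partitions
back when an offline node returns.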



On Tue, Apr 24, 2018 at 7:55 PM, Alexey Goncharuk <alexey.goncha...@gmail.com> wrote:

Vladimir,

Automatic cluster membership changes may be implemented to grow the
topology, but auto-shrinking the topology is usually not possible because
a process cannot distinguish between a node shutdown and network
partitioning. If we want to deal with split-brain scenarios as a grown-up
system, we should change the replication strategy within partitions to a
consensus algorithm (I really hope we will). None of the consensus
algorithms (at least those known to me: Paxos, Raft, ZAB) make automatic
cluster adjustments based on an internally detected process failure. I
consider baseline topology a step towards this model.

Addressing your second concern: if a node was down for a short period of
time, we should (and we do) rebalance only the deltas, which is faster
than erasing the whole node and moving all data from scratch.

2018-04-24 19:42 GMT+03:00 Vladimir Ozerov <voze...@gridgain.com>:

Ivan,

This reasoning sounds questionable to me. First, separate logic for
in-memory and persistent regions means that we lose collocation between
persistent and non-persistent caches. Second, the “data is still on disk”
assumption might not be valid if the node has left due to a disk crash,
or when data is updated on the remaining nodes.

On Tue, 24 Apr 2018 at 19:21, Ivan Rakov <ivan.glu...@gmail.com> wrote:

Stan,

I believe it was discussed in the design proposal thread:

http://apache-ignite-developers.2346864.n4.nabble.com/Cluster-auto-activation-design-proposal-td20295.html

The short answer: the backup factor decreases if a node leaves. In
non-persistent mode we have to rebalance data ASAP; otherwise the last
node that owns a partition may fail and the data will be lost forever.
This is not necessary if data is persisted to disk storage; that's the
reason for the Baseline Topology concept.
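The backup-factor argument can be made concrete with a small Java sketch,
assuming a hypothetical model in which each partition maps to the list of
nodes holding a copy (primary plus backups). With one backup, every
partition has two copies; once a node leaves, the partitions it held are
down to one live copy, and in non-persistent mode losing that last copy
loses the data. The class name is illustrative only.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical check, not Ignite API: which partitions have at most one live copy left? */
final class BackupFactorCheck {
    static Set<Integer> atRisk(Map<Integer, List<String>> owners, Set<String> alive) {
        Set<Integer> risky = new TreeSet<>();
        for (Map.Entry<Integer, List<String>> e : owners.entrySet()) {
            // Count the copies of this partition that are still on alive nodes.
            long liveCopies = e.getValue().stream().filter(alive::contains).count();
            if (liveCopies <= 1)
                risky.add(e.getKey());
        }
        return risky;
    }
}
```

For in-memory caches, every partition reported here would need to be
rebalanced right away; with persistence, the departed node's copy may
still be recoverable from disk, which is the trade-off the thread is
debating.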

Best Regards,
Ivan Rakov

On 24.04.2018 18:48, Stanislav Lukyanov wrote:

+1 for Vladimir's point: adding more complexity may (and likely will) be
even more misleading.

Can we take a step back and discuss why we need different behavior for
persistent and in-memory caches? Can we make in-memory caches honor
baseline instead of special-casing them?

Thanks,
Stan


On Tue, 24 Apr 2018, 18:28, Vladimir Ozerov <voze...@gridgain.com> wrote:

Guys,

As a user, I definitely do not want to think about BLATs, SATs, DATs, or
whatever. I want to query data, iterate over data, and send compute tasks
to data. If a certain node is outside of BLAT and does not have data, then
it is not an affinity node. Can we just fix the affinity logic to take
BLAT into account appropriately?

On Tue, Apr 24, 2018 at 6:12 PM, Ivan Rakov <ivan.glu...@gmail.com> wrote:

Eduard,

Can you please summarize the code changes that you are proposing?
I agree that BLT is a somewhat misleading term and DAT/SAT make more
sense. However, reaching consensus on the v2.4 Baseline Topology
terminology took a long time, and it seems like you are going to cause
more perturbations.
I still don't understand what should be changed and how. Please provide
a summary of the upcoming class renamings and changes to existing system
parts.

Best Regards,
Ivan Rakov


On 24.04.2018 17:46, Eduard Shangareev wrote:

Hi, Igniters,

I want to raise a topic about our affinity node definition.

After adding baseline (affinity) topology (BL(A)T), things started getting
complicated.

Plenty of bugs appear:

IGNITE-8173
ignite.getOrCreateCache(cacheConfig).iterator() method works incorrect for
replicated cache in case if some data node isn't in baseline

IGNITE-7628
SqlQuery hangs indefinitely with additional not registered in baseline
node.

It's because everything relies on the concept of an "affinity node".
Until now it was simple: a server node which passes the node filter. In
other words, any server node which is not filtered out by the node filter.

But a node which is not in BL(A)T and which passes the node filter would
be treated as an affinity node, and that's definitely wrong. At the very
least, it is a source of many bugs (I believe there are many more than
the two I have already mentioned).

It's clear that this definition should be changed.
Let's start with a new definition of "Affinity topology". Affinity
topology is the set of nodes which could potentially keep data.

If we use knowledge about the current implementation, we can say that
1. for in-memory cache groups it would be all server nodes;
2. for persistent cache groups it would be BL(A)T.

From here on, I will use Dynamic Affinity Topology (DAT) for point 1
(in-memory cache groups) and Static Affinity Topology (SAT) instead of
BL(A)T for point 2.

Denote the node filter as f(X), where X is an affinity topology.

Then node A is an affinity node if
A ∈ AT', where AT' = f(AT) and AT is either DAT or SAT.

It's worth mentioning that AT' is what should be passed to the affinity
function of cache groups.
Also, AT and AT' could change over time (BL(A)T changes or node
joins/disconnections).
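The definitions above can be sketched in Java, assuming a hypothetical
model in which nodes are identified by strings and the node filter f is a
java.util.function.Predicate. The sketch picks AT (DAT for in-memory
groups, SAT for persistent ones), computes AT' = f(AT), and treats node A
as an affinity node iff A ∈ AT'; AT' is the set that would be handed to
the cache group's affinity function. None of these names are Ignite API.

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Predicate;

/** Hypothetical model of the proposed affinity node definition; not Ignite API. */
final class AffinityNodes {
    /** AT: all server nodes (DAT) for in-memory groups, the baseline (SAT) for persistent ones. */
    static Set<String> affinityTopology(boolean persistent, Set<String> serverNodes, Set<String> baseline) {
        return persistent ? baseline : serverNodes;
    }

    /** AT' = f(AT): the nodes of AT that pass the cache group's node filter. */
    static Set<String> filtered(Set<String> at, Predicate<String> nodeFilter) {
        Set<String> result = new TreeSet<>();
        for (String node : at)
            if (nodeFilter.test(node))
                result.add(node);
        return result;
    }

    /** Node A is an affinity node iff A is in AT'. */
    static boolean isAffinityNode(String node, boolean persistent, Set<String> serverNodes,
                                  Set<String> baseline, Predicate<String> nodeFilter) {
        Set<String> at = affinityTopology(persistent, serverNodes, baseline);
        return filtered(at, nodeFilter).contains(node);
    }
}
```

For example, with server nodes A, B, C, D, baseline A, B, C and a filter
that rejects B, node D passes the filter but is outside the baseline, so
under this definition it is not an affinity node for a persistent group,
which is the situation described in IGNITE-8173 and IGNITE-7628.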

And I don't like the fact that the choice between DAT and SAT relies on
persistence settings (should we make it configurable per cache group?).

OK, I have created a ticket to implement these changes and will start
working on it.
https://issues.apache.org/jira/browse/IGNITE-8380 (Affinity node
calculation doesn't take into account BLT).

Also, I want to use these definitions (Affinity Topology, Affinity Node,
DAT, SAT) in documentation and Javadocs.

Maybe we should also consider replacing BL(A)T with SAT.

Thank you for your attention.
