- for in-memory caches, affinity would be calculated with SAT/BLAT on the
first step, and because of that collocation would work between in-memory
and persistent caches;
- on the next step, if there are offline nodes, we would spread their
partitions among the alive nodes. This would save us from data loss.
+1 to this approach.
I can't estimate how hard it is to implement, but it seems to solve
both the collocation and data loss issues.
Best Regards,
Ivan Rakov
On 24.04.2018 20:29, Eduard Shangareev wrote:
Igniters,
I have introduced DAT in opposition to BLAT (SAT) because they reflect how
Ignite works.
But I actually have concerns about the necessity of such separation.
DAT exists only because we don't want to lose any data in in-memory caches.
But there are alternatives. Besides BLAT auto-change policies, I would pay
attention to the following approach:
- for in-memory caches, affinity would be calculated with SAT/BLAT on the
first step, and because of that collocation would work between in-memory
and persistent caches;
- on the next step, if there are offline nodes, we would spread their
partitions among the alive nodes. This would save us from data loss.
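A minimal sketch of the two steps above, under loud assumptions: node IDs are plain strings, a simple modulo placement stands in for the real affinity function, and the class and method names (TwoStepAffinitySketch, assignOverBaseline, reassignOffline) are hypothetical and not part of Ignite. It only illustrates the idea that partitions owned by alive baseline nodes stay where the static assignment put them, while partitions of offline nodes get a temporary owner.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; not Ignite code.
final class TwoStepAffinitySketch {
    /**
     * Step 1: assign every partition an owner from the static topology
     * (SAT/BLAT), ignoring whether the owner is currently online.
     * Modulo placement is a stand-in for a real affinity function.
     */
    static List<String> assignOverBaseline(List<String> baseline, int partitions) {
        if (baseline.isEmpty())
            throw new IllegalArgumentException("baseline must not be empty");
        List<String> owners = new ArrayList<>();
        for (int p = 0; p < partitions; p++)
            owners.add(baseline.get(p % baseline.size()));
        return owners;
    }

    /**
     * Step 2: partitions whose owner is offline are spread round-robin
     * among the alive nodes; all other partitions keep their owner.
     */
    static List<String> reassignOffline(List<String> owners, List<String> alive) {
        if (alive.isEmpty())
            throw new IllegalArgumentException("no alive nodes to take over partitions");
        Set<String> aliveSet = new HashSet<>(alive);
        List<String> result = new ArrayList<>(owners);
        int next = 0;
        for (int p = 0; p < result.size(); p++) {
            if (!aliveSet.contains(result.get(p))) {
                result.set(p, alive.get(next % alive.size()));
                next++;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> owners = assignOverBaseline(Arrays.asList("A", "B", "C"), 6);
        System.out.println(owners); // [A, B, C, A, B, C]
        // Node B goes offline: only its partitions (1 and 4) move.
        System.out.println(reassignOffline(owners, Arrays.asList("A", "C"))); // [A, A, C, A, C, C]
    }
}
```

Because step 1 is the same for in-memory and persistent caches, partitions whose baseline owner is alive stay collocated; only the partitions of offline nodes diverge until those nodes return.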
I don't want to propose any changes until we have consensus.
On Tue, Apr 24, 2018 at 7:55 PM, Alexey Goncharuk <alexey.goncha...@gmail.com> wrote:
Vladimir,
Automatic cluster membership changes may be implemented to grow the
topology, but auto-shrinking topology is usually not possible because a
process cannot distinguish between a node shutdown and network
partitioning. If we want to deal with split-brain scenarios as a grown-up
system, we should change the replication strategy within partitions to a
consensus algorithm (I really hope we will). None of the consensus
algorithms (at least those known to me: Paxos, Raft, ZAB) do auto cluster
adjustments based on an internally detected process failure. I consider
baseline topology as a step towards this model.
Addressing your second concern: if a node was down for a short period of
time, we should (and we do) rebalance only deltas, which is faster than
erasing the whole node and moving all data from scratch.
2018-04-24 19:42 GMT+03:00 Vladimir Ozerov <voze...@gridgain.com>:
Ivan,
This reasoning sounds questionable to me. First, separate logic for
in-memory and persistent regions means that we lose collocation between
persistent and non-persistent caches. Second, the “data is still on disk”
assumption might not be valid if a node has left due to a disk crash, or
when data is updated on the remaining nodes.
Tue, Apr 24, 2018 at 19:21, Ivan Rakov <ivan.glu...@gmail.com>:
Stan,
I believe it was discussed at the design proposal thread:
http://apache-ignite-developers.2346864.n4.nabble.com/Cluster-auto-activation-design-proposal-td20295.html
The short answer: the backup factor decreases if a node leaves. In
non-persistent mode we have to rebalance data ASAP, otherwise the last
node that owns a partition may fail and the data will be lost forever.
This is not necessary if data is persisted to disk storage; that's the
reason for the Baseline Topology concept.
Best Regards,
Ivan Rakov
On 24.04.2018 18:48, Stanislav Lukyanov wrote:
+ for Vladimir's point - adding more complexity may (and likely will) be
even more misleading.
Can we take a step back and discuss why we need to have different
behavior for persistent and in-memory caches? Can we make in-memory
caches honor baseline instead of special-casing them?
Thanks,
Stan
Tue, Apr 24, 2018, 18:28 Vladimir Ozerov <voze...@gridgain.com>:
Guys,
As a user, I definitely do not want to think about BLATs, SATs, DATs, or
whatever else. I want to query data, iterate over data, and send compute
tasks to data. If a certain node is outside of BLAT and does not have
data, then it is not an affinity node. Can we just fix the affinity logic
to take BLAT into account appropriately?
On Tue, Apr 24, 2018 at 6:12 PM, Ivan Rakov <ivan.glu...@gmail.com> wrote:
Eduard,
Can you please summarize the code changes that you are proposing?
I agree that BLT is a bit of a misleading term and DAT/SAT make more
sense. However, establishing a consensus on the v2.4 Baseline Topology
terminology took a long time, and it seems like you are going to cause a
bit more perturbation.
I still don't understand what should be changed and how. Please provide
a summary of the upcoming class renamings and changes to existing system
parts.
Best Regards,
Ivan Rakov
On 24.04.2018 17:46, Eduard Shangareev wrote:
Hi, Igniters,
I want to raise a topic about our affinity node definition.
After adding baseline (affinity) topology (BL(A)T), things started
getting complicated.
Plenty of bugs have appeared:
IGNITE-8173
ignite.getOrCreateCache(cacheConfig).iterator() method works incorrectly
for a replicated cache if some data node isn't in the baseline
IGNITE-7628
SqlQuery hangs indefinitely with an additional node that is not
registered in the baseline.
It's because everything relies on the concept of an "affinity node".
Until now it was as simple as a server node which passes the node filter.
In other words, any server node which is not filtered out by the node
filter.
But a node which is not in BL(A)T and which passes the node filter would
still be treated as an affinity node. And that's definitely wrong. At
least, it is a source of many bugs (I believe there are many more than
the 2 which I have already mentioned).
It's clear that this definition should be changed.
Let's start with a new definition of "Affinity topology". Affinity
topology is a set of nodes which potentially could keep data.
If we use knowledge about the current implementation, we can say that
1. for in-memory cache groups it would be all server nodes;
2. for persistent cache groups it would be BL(A)T.
I will further use Dynamic Affinity Topology or DAT for 1 (in-memory
cache groups) and Static Affinity Topology or SAT instead of BL(A)T,
i.e. for the 2nd point.
Denote the node filter as f(X), where X is an affinity topology.
Then we can say that node A is an affinity node if
A ∈ AT', where AT' = f(AT), where AT is DAT or SAT.
It is worth mentioning that AT' is what should be passed to the affinity
function of cache groups.
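The definition A ∈ AT', AT' = f(AT) can be sketched as follows. This is an illustration under stated assumptions, not the proposed patch: node IDs are plain strings, java.util.function.Predicate stands in for the cache node filter, and the class and method names (AffinityTopologySketch, affinityTopology, filtered, isAffinityNode) are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical illustration only; not Ignite code.
final class AffinityTopologySketch {
    /** AT: SAT (the baseline) for persistent cache groups, DAT (all server nodes) otherwise. */
    static Set<String> affinityTopology(boolean persistent, Set<String> serverNodes, Set<String> baseline) {
        return persistent ? baseline : serverNodes;
    }

    /** AT' = f(AT): the affinity topology with the node filter applied. */
    static Set<String> filtered(Set<String> affinityTopology, Predicate<String> nodeFilter) {
        Set<String> result = new LinkedHashSet<>();
        for (String node : affinityTopology) {
            if (nodeFilter.test(node))
                result.add(node);
        }
        return result;
    }

    /** Node A is an affinity node if and only if A is a member of AT'. */
    static boolean isAffinityNode(String node, boolean persistent, Set<String> serverNodes,
                                  Set<String> baseline, Predicate<String> nodeFilter) {
        return filtered(affinityTopology(persistent, serverNodes, baseline), nodeFilter).contains(node);
    }

    public static void main(String[] args) {
        Set<String> servers = new LinkedHashSet<>(Arrays.asList("A", "B", "C"));
        Set<String> baseline = new LinkedHashSet<>(Arrays.asList("A", "B"));
        Predicate<String> filter = n -> !n.equals("B"); // filter rejects node B

        // C passes the filter but is outside the baseline.
        System.out.println(isAffinityNode("C", true, servers, baseline, filter));  // false
        System.out.println(isAffinityNode("C", false, servers, baseline, filter)); // true
    }
}
```

With the old definition (any server node that passes the node filter), node C would count as an affinity node for the persistent cache group as well, which is the situation the bugs listed earlier describe.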
Also, AT and AT' could change over time (BL(A)T changes or node
joins/disconnections).
And I don't like the fact that the usage of DAT or SAT relies on
persistence settings (should we make it configurable per cache group?).
OK, I have created a ticket to implement these changes and will start
working on it.
https://issues.apache.org/jira/browse/IGNITE-8380 (Affinity node
calculation doesn't take into account BLT).
Also, I want to use these definitions (Affinity Topology, Affinity Node,
DAT, SAT) in documentation and Javadocs.
Maybe we should also consider replacing BL(A)T with SAT.
Thank you for your attention.