[
https://issues.apache.org/jira/browse/IGNITE-13353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sergey Uttsel updated IGNITE-13353:
-----------------------------------
Description:
GridDhtPartitionsExchangeFuture#resetOwnersByCounter is a part of PME
(partition map exchange) that is executed on the coordinator. It compares
update counters and changes the state of outdated partitions from OWNING to
MOVING.
This does not happen on every PME. If a node joins or a cache starts, we have
to execute this logic in order to detect whether partitions should be
rebalanced. By contrast, if a node leaves, there is no need to perform the
reset.
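To illustrate the counter comparison described above, here is a minimal
sketch. It is not the Ignite implementation: the class name, the PartitionCopy
type and the node ids are hypothetical, and it assumes the rule is simply "an
OWNING copy whose update counter is below the highest counter among the owners
becomes MOVING".

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Simplified model of the owner reset; all names here are hypothetical. */
final class ResetOwnersByCounterSketch {
    enum State { OWNING, MOVING }

    /** One node's copy of a single partition: its state and update counter. */
    static final class PartitionCopy {
        State state;
        final long updateCounter;

        PartitionCopy(State state, long updateCounter) {
            this.state = state;
            this.updateCounter = updateCounter;
        }
    }

    /**
     * Switches every OWNING copy whose counter is below the maximum counter
     * among OWNING copies to MOVING.
     *
     * @return ids of the nodes whose copy was switched to MOVING.
     */
    static List<String> resetOwnersByCounter(Map<String, PartitionCopy> copiesByNode) {
        long maxCntr = Long.MIN_VALUE;

        // First pass: find the highest update counter among current owners.
        for (PartitionCopy copy : copiesByNode.values()) {
            if (copy.state == State.OWNING)
                maxCntr = Math.max(maxCntr, copy.updateCounter);
        }

        List<String> demoted = new ArrayList<>();

        // Second pass: owners that lag behind are outdated and must be rebalanced.
        for (Map.Entry<String, PartitionCopy> e : copiesByNode.entrySet()) {
            PartitionCopy copy = e.getValue();

            if (copy.state == State.OWNING && copy.updateCounter < maxCntr) {
                copy.state = State.MOVING;
                demoted.add(e.getKey());
            }
        }

        return demoted;
    }

    public static void main(String[] args) {
        Map<String, PartitionCopy> copies = new LinkedHashMap<>();
        copies.put("nodeA", new PartitionCopy(State.OWNING, 100));
        copies.put("nodeB", new PartitionCopy(State.OWNING, 97));

        System.out.println("Demoted to MOVING: " + resetOwnersByCounter(copies));
    }
}
```

In this model a copy that is temporarily behind (for example, because updates
are still in flight) is treated the same as a genuinely outdated one, which is
why running the reset when it is not needed can cause an unnecessary rebalance.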
Whether or not to reset counters is controlled by the flag passed to the
GridDhtPartitionsExchangeFuture#assignPartitionsStates method:
{code:java}
if (firstDiscoEvt.type() == EVT_DISCOVERY_CUSTOM_EVT) {
    assert firstDiscoEvt instanceof DiscoveryCustomEvent;

    if (activateCluster() || changedBaseline())
        assignPartitionsStates(true);

    DiscoveryCustomMessage discoveryCustomMessage =
        ((DiscoveryCustomEvent) firstDiscoEvt).customMessage();

    if (discoveryCustomMessage instanceof DynamicCacheChangeBatch) {
        if (exchActions != null) {
            Set<String> caches = exchActions.cachesToResetLostPartitions();

            if (!F.isEmpty(caches))
                resetLostPartitions(caches);

            assignPartitionsStates(true);
        }
    }
    else if (discoveryCustomMessage instanceof SnapshotDiscoveryMessage
        && ((SnapshotDiscoveryMessage)discoveryCustomMessage).needAssignPartitions()) {
        markAffinityReassign();

        assignPartitionsStates(true);
    }
}
else if (exchCtx.events().hasServerJoin())
    assignPartitionsStates(true);
else if (exchCtx.events().hasServerLeft())
    assignPartitionsStates(false);
{code}
Right now, DynamicCacheChangeBatch triggers partition validation whenever any
cache start / stop event happens. As a result, the resetOwners logic is
executed for all caches. We can work around this issue by executing
resetOwners only for the caches being started.
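The difference between the current and the proposed scope can be sketched as
follows. This is only a model under stated assumptions, not a patch: the class
and method names are hypothetical, and caches are represented by their names
alone.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical model of which caches get the owner reset on a cache change. */
final class ResetScopeSketch {
    /**
     * Models the behaviour described above: any cache start or stop makes the
     * reset run for every cache in the cluster.
     */
    static Set<String> cachesToResetCurrent(Set<String> allCaches,
                                            Set<String> startedCaches,
                                            Set<String> stoppedCaches) {
        if (startedCaches.isEmpty() && stoppedCaches.isEmpty())
            return Collections.emptySet();

        return new LinkedHashSet<>(allCaches);
    }

    /** Models the proposed workaround: the reset runs only for started caches. */
    static Set<String> cachesToResetProposed(Set<String> allCaches,
                                             Set<String> startedCaches) {
        Set<String> res = new LinkedHashSet<>(startedCaches);

        // Ignore names that do not correspond to a known cache.
        res.retainAll(allCaches);

        return res;
    }

    public static void main(String[] args) {
        Set<String> all = new LinkedHashSet<>(Arrays.asList("orders", "users", "audit"));
        Set<String> started = Collections.singleton("audit");
        Set<String> stopped = Collections.emptySet();

        System.out.println("Current scope:  " + cachesToResetCurrent(all, started, stopped));
        System.out.println("Proposed scope: " + cachesToResetProposed(all, started));
    }
}
```

With the narrower scope, caches that were already running are left untouched
when an unrelated cache starts.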
was:
Right now, DynamicCacheChangeBatch triggers partition validation for all
caches. It can (and most likely will) show inconsistency between partitions
when DataStreamer is used. This leads to data rebalancing that is not needed:
updates would be streamed to backup nodes after some time anyway, due to the
asynchronous behavior of DataStreamer.
In the attached log file, the cache was created at 1:33; this led to partition
evictions right after that and to a rebalance at ~2:36 (they have
rebalanceDelay = 1 hour). This behaviour led to these 2 drops almost to zero
in write operations:
> DynamicCacheChangeBatch invokes partition validation for all caches
> -------------------------------------------------------------------
>
> Key: IGNITE-13353
> URL: https://issues.apache.org/jira/browse/IGNITE-13353
> Project: Ignite
> Issue Type: Bug
> Components: cache
> Reporter: Sergey Uttsel
> Assignee: Sergey Uttsel
> Priority: Major
> Fix For: 2.10
>
> Time Spent: 10m
> Remaining Estimate: 0h
--
This message was sent by Atlassian Jira
(v8.3.4#803005)