Ivan created IGNITE-12950:
-----------------------------
Summary: Partitions validator must check sizes even if update counters are different
Key: IGNITE-12950
URL: https://issues.apache.org/jira/browse/IGNITE-12950
Project: Ignite
Issue Type: Improvement
Components: cache
Reporter: Ivan
Fix For: 2.9
We have the following method in GridDhtPartitionsStateValidator:
{code:java}
public void validatePartitionCountersAndSizes(
    GridDhtPartitionsExchangeFuture fut,
    GridDhtPartitionTopology top,
    Map<UUID, GridDhtPartitionsSingleMessage> messages
) throws IgniteCheckedException {
    final Set<UUID> ignoringNodes = new HashSet<>();

    // Ignore just joined nodes.
    for (DiscoveryEvent evt : fut.events().events()) {
        if (evt.type() == EVT_NODE_JOINED)
            ignoringNodes.add(evt.eventNode().id());
    }

    AffinityTopologyVersion topVer = fut.context().events().topologyVersion();

    // Validate update counters.
    Map<Integer, Map<UUID, Long>> result = validatePartitionsUpdateCounters(top, messages, ignoringNodes);

    // Throws here, so cache sizes are never validated when update counters differ.
    if (!result.isEmpty())
        throw new IgniteCheckedException("Partitions update counters are inconsistent for " + fold(topVer, result));

    // For sizes validation ignore also nodes which are not able to send cache sizes.
    for (UUID id : messages.keySet()) {
        ClusterNode node = cctx.discovery().node(id);

        if (node != null && node.version().compareTo(SIZES_VALIDATION_AVAILABLE_SINCE) < 0)
            ignoringNodes.add(id);
    }

    if (!cctx.cache().cacheGroup(top.groupId()).mvccEnabled()) { // TODO: Remove "if" clause in IGNITE-9451.
        // Validate cache sizes.
        result = validatePartitionsSizes(top, messages, ignoringNodes);

        if (!result.isEmpty())
            throw new IgniteCheckedException("Partitions cache sizes are inconsistent for " + fold(topVer, result));
    }
}
{code}
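We should check partition sizes even if update counters are different. Currently the method throws as soon as update counters are inconsistent, so sizes are never validated in that case. Checking both would help debug problems in production.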
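A minimal sketch of the proposed ordering, assuming a simplified, hypothetical data model rather than Ignite's classes: partition ID → (node identifier → value), with node identifiers as plain strings instead of `UUID`s. The class `PartitionValidationSketch` and its methods are illustrative names only. Both checks run before anything is reported, so a size mismatch is still visible when update counters already differ:

{code:java}
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical model, not Ignite API: partition ID -> (node identifier -> value).
public class PartitionValidationSketch {
    // Returns IDs of partitions whose copies report more than one distinct value.
    public static SortedSet<Integer> inconsistentParts(Map<Integer, Map<String, Long>> valuesByPart) {
        SortedSet<Integer> bad = new TreeSet<>();
        for (Map.Entry<Integer, Map<String, Long>> e : valuesByPart.entrySet()) {
            if (new HashSet<>(e.getValue().values()).size() > 1)
                bad.add(e.getKey());
        }
        return bad;
    }

    // Runs both checks and returns every problem found; an empty list means the state is valid.
    public static List<String> validate(Map<Integer, Map<String, Long>> counters,
                                        Map<Integer, Map<String, Long>> sizes) {
        List<String> errors = new ArrayList<>();

        SortedSet<Integer> badCounters = inconsistentParts(counters);
        if (!badCounters.isEmpty())
            errors.add("Partitions update counters are inconsistent for parts " + badCounters);

        // Unlike the current method, do not stop here: sizes are validated regardless.
        SortedSet<Integer> badSizes = inconsistentParts(sizes);
        if (!badSizes.isEmpty())
            errors.add("Partitions cache sizes are inconsistent for parts " + badSizes);

        return errors;
    }
}
{code}

In the real validator, a non-empty list could then be combined into a single IgniteCheckedException message; collecting first and throwing once is one possible design, not something this issue prescribes.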
If a partition is in an inconsistent state, we must print information about all of its copies. Currently, for a cache group with 3 backups, we can get a message like this:
{code:java}
Partition states validation has failed for group: CACHEGROUP_PARTICLE_union-module_com.sbt.processing.data.partition.dpl.PartitionKey. Partitions update counters are inconsistent for Part 3415: [10.104.6.10:47500=2577263 10.104.6.12:47500=2577263 10.104.6.23:47500=2577262 10.104.6.9:47500=2577263 ] Part 4960: [10.104.6.11:47500=2560994 10.104.6.23:47500=2560993 ]
{code}
(Part 4960 lists only 2 copies, while a group with 3 backups has 4 copies of each partition, as shown for part 3415.)
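One possible way to report every copy, sketched under the same hypothetical model (strings for node identifiers instead of the `UUID` keys that the real code passes to `fold(topVer, result)`). Once a partition is found inconsistent, the sketch prints the value from every owner rather than only a subset of them. `InconsistentPartitionReport` is a made-up name, and the output only imitates the shape of the log line above:

{code:java}
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model, not Ignite API: partition ID -> (node identifier -> value).
public class InconsistentPartitionReport {
    // Formats each inconsistent partition with the value reported by every copy,
    // in a "Part N: [node=value ... ]" shape similar to the log message above.
    public static String report(Map<Integer, Map<String, Long>> valuesByPart) {
        StringBuilder sb = new StringBuilder();
        // TreeMap gives a stable order of partitions and nodes in the output.
        for (Map.Entry<Integer, Map<String, Long>> part : new TreeMap<>(valuesByPart).entrySet()) {
            Map<String, Long> copies = new TreeMap<>(part.getValue());
            // Skip partitions whose copies all agree.
            if (copies.values().stream().distinct().count() < 2)
                continue;
            // Once a partition is inconsistent, print all of its copies, not just the mismatching ones.
            sb.append("Part ").append(part.getKey()).append(": [");
            for (Map.Entry<String, Long> copy : copies.entrySet())
                sb.append(copy.getKey()).append('=').append(copy.getValue()).append(' ');
            sb.append("] ");
        }
        return sb.toString().trim();
    }
}
{code}

For example, for a partition 7 with copies A=5, B=5, C=4, D=5 and a consistent partition 8, the sketch returns "Part 7: [A=5 B=5 C=4 D=5 ]", so the lagging copy can be compared against all of its peers.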
--
This message was sent by Atlassian Jira
(v8.3.4#803005)