[ https://issues.apache.org/jira/browse/IGNITE-12950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vyacheslav Koptilin updated IGNITE-12950:
-----------------------------------------
Description:
We have method in GridDhtPartitionsStateValidator:
{code:java}
public void validatePartitionCountersAndSizes(
    GridDhtPartitionsExchangeFuture fut,
    GridDhtPartitionTopology top,
    Map<UUID, GridDhtPartitionsSingleMessage> messages
) throws IgniteCheckedException {
    final Set<UUID> ignoringNodes = new HashSet<>();

    // Ignore just joined nodes.
    for (DiscoveryEvent evt : fut.events().events()) {
        if (evt.type() == EVT_NODE_JOINED)
            ignoringNodes.add(evt.eventNode().id());
    }

    AffinityTopologyVersion topVer = fut.context().events().topologyVersion();

    // Validate update counters.
    Map<Integer, Map<UUID, Long>> result = validatePartitionsUpdateCounters(top, messages, ignoringNodes);

    if (!result.isEmpty())
        throw new IgniteCheckedException("Partitions update counters are inconsistent for " + fold(topVer, result));

    // For sizes validation ignore also nodes which are not able to send cache sizes.
    for (UUID id : messages.keySet()) {
        ClusterNode node = cctx.discovery().node(id);

        if (node != null && node.version().compareTo(SIZES_VALIDATION_AVAILABLE_SINCE) < 0)
            ignoringNodes.add(id);
    }

    // TODO: Remove "if" clause in IGNITE-9451.
    if (!cctx.cache().cacheGroup(top.groupId()).mvccEnabled()) {
        // Validate cache sizes.
        result = validatePartitionsSizes(top, messages, ignoringNodes);

        if (!result.isEmpty())
            throw new IgniteCheckedException("Partitions cache sizes are inconsistent for " + fold(topVer, result));
    }
}
{code}
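We should check partition sizes even if the update counters differ. Currently
the method throws as soon as the update counters mismatch, so the sizes are
never validated in that case. Having both results would help when debugging
problems in production.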
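One way to picture the requested behavior is the minimal sketch below. It is not Ignite code: the class `PartitionValidationSketch`, its methods, and the plain `node -> (partition -> value)` maps are hypothetical stand-ins for the validator and the single messages. It runs both checks and collects every failure instead of throwing on the first one.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.UUID;

/** Hypothetical stand-in for the validator; names and data shapes are assumptions. */
class PartitionValidationSketch {
    /**
     * @param valuesByNode node id -> (partition id -> reported value).
     * @return partition id -> (node id -> value) for partitions whose copies disagree.
     */
    static Map<Integer, Map<UUID, Long>> findInconsistent(Map<UUID, Map<Integer, Long>> valuesByNode) {
        // Regroup the reported values per partition.
        Map<Integer, Map<UUID, Long>> byPart = new TreeMap<>();

        for (Map.Entry<UUID, Map<Integer, Long>> nodeEntry : valuesByNode.entrySet()) {
            for (Map.Entry<Integer, Long> partEntry : nodeEntry.getValue().entrySet()) {
                byPart.computeIfAbsent(partEntry.getKey(), p -> new HashMap<>())
                    .put(nodeEntry.getKey(), partEntry.getValue());
            }
        }

        // Drop partitions on which all copies report the same value.
        byPart.values().removeIf(copies -> new HashSet<>(copies.values()).size() <= 1);

        return byPart;
    }

    /** Runs both checks and returns all failure messages; an empty list means no mismatch. */
    static List<String> validate(Map<UUID, Map<Integer, Long>> counters, Map<UUID, Map<Integer, Long>> sizes) {
        List<String> errors = new ArrayList<>();

        Map<Integer, Map<UUID, Long>> badCounters = findInconsistent(counters);

        if (!badCounters.isEmpty())
            errors.add("Partitions update counters are inconsistent for " + badCounters.keySet());

        // Sizes are checked regardless of the counters result.
        Map<Integer, Map<UUID, Long>> badSizes = findInconsistent(sizes);

        if (!badSizes.isEmpty())
            errors.add("Partitions cache sizes are inconsistent for " + badSizes.keySet());

        return errors;
    }
}
```

In the real method the collected messages could be joined into a single IgniteCheckedException after both checks have run; that wiring is left out here.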
If a partition is in an inconsistent state, we must print information about
all of its copies. Currently, on a cache group with 3 backups, we can get a
message like this:
{code:java}
Partition states validation has failed for group: CACHEGROUP. Partitions update counters are inconsistent for
Part 3415: [10.104.6.10:47500=2577263 10.104.6.12:47500=2577263 10.104.6.23:47500=2577262 10.104.6.9:47500=2577263 ]
Part 4960: [10.104.6.11:47500=2560994 10.104.6.23:47500=2560993 ]
{code}
(The entry for Part 4960 lists only 2 copies, whereas Part 3415 lists all 4.)
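The reporting side could look roughly like the sketch below. It is an illustration only: `AllCopiesReportSketch`, `report`, and the `partition -> (node address -> counter)` input are hypothetical and are not the existing `fold` method. For each partition whose copies disagree, it prints every known copy, including the ones that agree with each other.

```java
import java.util.Map;
import java.util.StringJoiner;
import java.util.TreeMap;

/** Hypothetical formatter; not the validator's existing fold() method. */
class AllCopiesReportSketch {
    /**
     * @param copiesByPart partition id -> (node address -> update counter), holding every known copy.
     * @return one "Part N: [node=counter ...]" entry per partition whose copies disagree.
     */
    static String report(Map<Integer, Map<String, Long>> copiesByPart) {
        StringJoiner out = new StringJoiner(" ");

        // Sorted copies of the maps keep the output order stable.
        for (Map.Entry<Integer, Map<String, Long>> part : new TreeMap<>(copiesByPart).entrySet()) {
            Map<String, Long> copies = new TreeMap<>(part.getValue());

            // Skip partitions on which all copies agree.
            if (copies.values().stream().distinct().count() <= 1)
                continue;

            StringJoiner nodes = new StringJoiner(" ", "[", "]");

            // Print every copy, not only the ones that differ.
            copies.forEach((node, cntr) -> nodes.add(node + "=" + cntr));

            out.add("Part " + part.getKey() + ": " + nodes);
        }

        return out.toString();
    }
}
```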
was:
We have method in GridDhtPartitionsStateValidator:
{code:java}
// public void validatePartitionCountersAndSizes(
GridDhtPartitionsExchangeFuture fut,
GridDhtPartitionTopology top,
Map<UUID, GridDhtPartitionsSingleMessage> messages
) throws IgniteCheckedException {
final Set<UUID> ignoringNodes = new HashSet<>();
// Ignore just joined nodes.
for (DiscoveryEvent evt : fut.events().events()) {
if (evt.type() == EVT_NODE_JOINED)
ignoringNodes.add(evt.eventNode().id());
}
AffinityTopologyVersion topVer =
fut.context().events().topologyVersion();
// Validate update counters.
Map<Integer, Map<UUID, Long>> result =
validatePartitionsUpdateCounters(top, messages, ignoringNodes);
if (!result.isEmpty())
throw new IgniteCheckedException("Partitions update counters are
inconsistent for " + fold(topVer, result));
// For sizes validation ignore also nodes which are not able to send
cache sizes.
for (UUID id : messages.keySet()) {
ClusterNode node = cctx.discovery().node(id);
if (node != null &&
node.version().compareTo(SIZES_VALIDATION_AVAILABLE_SINCE) < 0)
ignoringNodes.add(id);
}
if (!cctx.cache().cacheGroup(top.groupId()).mvccEnabled()) { // TODO:
Remove "if" clause in IGNITE-9451.
// Validate cache sizes.
result = validatePartitionsSizes(top, messages, ignoringNodes);
if (!result.isEmpty())
throw new IgniteCheckedException("Partitions cache sizes are
inconsistent for " + fold(topVer, result));
}
}
{code}
{{}}
We should check paritions sizes even if update counters are different. It could
be helpful for debug problems on production.
We must print information about all copies, if partition is in inconsistent
state. Now we could get message on cache group with 3 backups:
{code:java}
// Partition states validation has failed for group:
CACHEGROUP_PARTICLE_union-module_com.sbt.processing.data.partition.dpl.PartitionKey.
Partitions update counters are inconsistent for Part 3415:
[10.104.6.10:47500=2577263 10.104.6.12:47500=2577263 10.104.6.23:47500=2577262
10.104.6.9:47500=2577263 ] Part 4960: [10.104.6.11:47500=2560994
10.104.6.23:47500=2560993 ]
{code}
(part 4960 contains information about 2 copies only)
> Partitions validator must check sizes even if update counters are different
> ---------------------------------------------------------------------------
>
> Key: IGNITE-12950
> URL: https://issues.apache.org/jira/browse/IGNITE-12950
> Project: Ignite
> Issue Type: Improvement
> Components: cache
> Reporter: Ivan Mironovich
> Priority: Major
> Fix For: 2.9
>
> Original Estimate: 336h
> Remaining Estimate: 336h
--
This message was sent by Atlassian Jira
(v8.3.4#803005)