[ https://issues.apache.org/jira/browse/IGNITE-12950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vyacheslav Koptilin updated IGNITE-12950:
-----------------------------------------
Description:
GridDhtPartitionsStateValidator has the following method:
{code:java}
public void validatePartitionCountersAndSizes(
    GridDhtPartitionsExchangeFuture fut,
    GridDhtPartitionTopology top,
    Map<UUID, GridDhtPartitionsSingleMessage> messages
) throws IgniteCheckedException {
    final Set<UUID> ignoringNodes = new HashSet<>();

    // Ignore just joined nodes.
    for (DiscoveryEvent evt : fut.events().events()) {
        if (evt.type() == EVT_NODE_JOINED)
            ignoringNodes.add(evt.eventNode().id());
    }

    AffinityTopologyVersion topVer = fut.context().events().topologyVersion();

    // Validate update counters.
    Map<Integer, Map<UUID, Long>> result = validatePartitionsUpdateCounters(top, messages, ignoringNodes);

    // Throwing here means the size validation below is skipped.
    if (!result.isEmpty())
        throw new IgniteCheckedException("Partitions update counters are inconsistent for " + fold(topVer, result));

    // For sizes validation ignore also nodes which are not able to send cache sizes.
    for (UUID id : messages.keySet()) {
        ClusterNode node = cctx.discovery().node(id);

        if (node != null && node.version().compareTo(SIZES_VALIDATION_AVAILABLE_SINCE) < 0)
            ignoringNodes.add(id);
    }

    if (!cctx.cache().cacheGroup(top.groupId()).mvccEnabled()) { // TODO: Remove "if" clause in IGNITE-9451.
        // Validate cache sizes.
        result = validatePartitionsSizes(top, messages, ignoringNodes);

        if (!result.isEmpty())
            throw new IgniteCheckedException("Partitions cache sizes are inconsistent for " + fold(topVer, result));
    }
}
{code}
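We should check partition sizes even if the update counters differ. Currently the method throws as soon as the update counter check fails, so in that case the sizes are never validated. Size information would be helpful when debugging problems in production.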
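A minimal sketch of the proposed order, under simplifying assumptions: per-node update counters and sizes are modelled as plain maps keyed by hypothetical node names, and the class `ValidateBothChecks` with its methods is illustrative only, not Ignite code. Both checks always run, and every failure is collected before anything is reported:

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical model, not Ignite code: each check receives node -> (partition -> value),
 * where the value is an update counter or a partition size.
 */
public class ValidateBothChecks {
    /** Returns partition -> (node -> value) for every partition whose copies disagree. */
    public static Map<Integer, Map<String, Long>> inconsistent(Map<String, Map<Integer, Long>> byNode) {
        Map<Integer, Map<String, Long>> byPart = new TreeMap<>();

        // Regroup the per-node reports by partition.
        byNode.forEach((node, parts) ->
            parts.forEach((part, val) ->
                byPart.computeIfAbsent(part, p -> new TreeMap<>()).put(node, val)));

        // Keep only partitions whose copies report more than one distinct value.
        byPart.values().removeIf(copies -> copies.values().stream().distinct().count() < 2);

        return byPart;
    }

    /** Runs both checks unconditionally and collects every failure instead of stopping at the first one. */
    public static List<String> validate(Map<String, Map<Integer, Long>> counters,
                                        Map<String, Map<Integer, Long>> sizes) {
        List<String> failures = new ArrayList<>();

        Map<Integer, Map<String, Long>> badCounters = inconsistent(counters);
        if (!badCounters.isEmpty())
            failures.add("Partitions update counters are inconsistent for " + badCounters);

        // The size check still runs when the counter check has already failed.
        Map<Integer, Map<String, Long>> badSizes = inconsistent(sizes);
        if (!badSizes.isEmpty())
            failures.add("Partitions cache sizes are inconsistent for " + badSizes);

        return failures;
    }

    public static void main(String[] args) {
        Map<String, Map<Integer, Long>> counters = Map.of("nodeA", Map.of(1, 100L), "nodeB", Map.of(1, 99L));
        Map<String, Map<Integer, Long>> sizes = Map.of("nodeA", Map.of(1, 10L), "nodeB", Map.of(1, 9L));

        // Prints both failures:
        // Partitions update counters are inconsistent for {1={nodeA=100, nodeB=99}}
        // Partitions cache sizes are inconsistent for {1={nodeA=10, nodeB=9}}
        validate(counters, sizes).forEach(System.out::println);
    }
}
{code}

One option for the real method is to join the collected failures into a single `IgniteCheckedException` after both checks have run, so that neither result is lost.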
If a partition is in an inconsistent state, the message must include information about all of its copies. Currently, for a cache group with 3 backups, we can get a message like this:
{code:java}
Partition states validation has failed for group: CACHEGROUP. Partitions update counters are inconsistent for Part 3415: [10.104.6.10:47500=2577263 10.104.6.12:47500=2577263 10.104.6.23:47500=2577262 10.104.6.9:47500=2577263 ] Part 4960: [10.104.6.11:47500=2560994 10.104.6.23:47500=2560993 ]
{code}
(Part 4960 lists only 2 copies, although with 3 backups each partition should have 4 copies, as the entry for part 3415 shows.)
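The sketch below, again with hypothetical names and data rather than Ignite code, contrasts two ways of building the report for one partition. Comparing each copy only with the first copy seen drops the copies that agree with it, which is one way (assumed here, not confirmed against the Ignite source) to end up with a report like the one for part 4960. Reporting every copy once any two of them disagree gives the full picture:

{code:java}
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical model, not Ignite code: the copies of one partition as node -> update counter. */
public class ReportAllCopies {
    /** Compares each copy only with the first copy seen; copies that agree with it are left out. */
    public static Map<String, Long> mismatchesOnly(Map<String, Long> copies) {
        Map<String, Long> report = new LinkedHashMap<>();
        Map.Entry<String, Long> reference = null;

        for (Map.Entry<String, Long> copy : copies.entrySet()) {
            if (reference == null)
                reference = copy;
            else if (!copy.getValue().equals(reference.getValue())) {
                report.putIfAbsent(reference.getKey(), reference.getValue());
                report.put(copy.getKey(), copy.getValue());
            }
        }

        return report;
    }

    /** Reports every copy as soon as any two copies disagree. */
    public static Map<String, Long> allCopies(Map<String, Long> copies) {
        if (copies.values().stream().distinct().count() > 1)
            return new LinkedHashMap<>(copies);

        return Map.of();
    }

    public static void main(String[] args) {
        // Hypothetical partition in a group with 3 backups (4 copies); only node4 lags behind.
        Map<String, Long> copies = new LinkedHashMap<>();
        copies.put("node1", 100L);
        copies.put("node2", 100L);
        copies.put("node3", 100L);
        copies.put("node4", 99L);

        System.out.println(mismatchesOnly(copies)); // {node1=100, node4=99}
        System.out.println(allCopies(copies));      // {node1=100, node2=100, node3=100, node4=99}
    }
}
{code}

The all-copies variant costs nothing extra when the partition is consistent, because it returns an empty report and nothing is printed.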
> Partitions validator must check sizes even if update counters are different
> ---------------------------------------------------------------------------
>
> Key: IGNITE-12950
> URL: https://issues.apache.org/jira/browse/IGNITE-12950
> Project: Ignite
> Issue Type: Improvement
> Components: cache
> Reporter: Ivan Mironovich
> Priority: Major
> Fix For: 2.9
>
> Original Estimate: 336h
> Remaining Estimate: 336h
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)