[
https://issues.apache.org/jira/browse/IGNITE-22928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mikhail Efremov updated IGNITE-22928:
-------------------------------------
Description:
*Description*
The test fails because {{TestPlacementDriver}} always returns a single node as
the primary. That node may not belong to the replication group, at least at the
start of the test, and so may have no replica or Raft entities. This leads to
an {{NPE}} in the following code from {{PartitionReplicaLifecycleManager}}:
{code:title=|language=java|collapse=false}
return localServicesStartFuture
        .thenComposeAsync(v -> inBusyLock(busyLock, () -> isLocalNodeIsPrimary(replicaGrpId)), ioExecutor)
        .thenAcceptAsync(isLeaseholder -> inBusyLock(busyLock, () -> {
            boolean isLocalNodeInStableOrPending = isNodeInReducedStableOrPendingAssignments(
                    replicaGrpId,
                    stableAssignments,
                    pendingAssignments,
                    revision
            );

            if (!isLocalNodeInStableOrPending && !isLeaseholder) {
                return;
            }

            assert isLocalNodeInStableOrPending || isLeaseholder
                    : "The local node is outside of the replication group [inStableOrPending="
                    + isLocalNodeInStableOrPending + ", isLeaseholder=" + isLeaseholder + "].";

            // For forced assignments, we exclude dead stable nodes, and all alive stable nodes
            // are already in pending assignments. Union is not required in such a case.
            Set<Assignment> newAssignments = pendingAssignmentsAreForced || stableAssignments == null
                    ? pendingAssignmentsNodes
                    : union(pendingAssignmentsNodes, stableAssignments.nodes());

            replicaMgr.replica(replicaGrpId)
                    .thenApply(Replica::raftClient)
                    .thenAccept(raftClient -> raftClient.updateConfiguration(fromAssignments(newAssignments)));
        }), ioExecutor);
{code}
The node returned by {{TestPlacementDriver}} always passes the
{{isLocalNodeIsPrimary}} check and all subsequent checks. However, if that node
does not host the replication group, no replica future exists for it, so
{{replicaMgr#replica}} returns {{null}} and the chained call on that {{null}}
value throws an {{NPE}}.
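The failure mode can be reproduced in isolation. The following is a minimal sketch, not the Ignite code: {{ReplicaRegistrySketch}}, its string-based group IDs, and its string "raft client" are hypothetical stand-ins for {{replicaMgr}} and {{Replica}}. It only assumes, as described above, that the lookup returns {{null}} instead of a future when the node hosts no replica.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for a replica registry; not the real Ignite ReplicaManager. */
class ReplicaRegistrySketch {
    private final Map<String, CompletableFuture<String>> replicas = new ConcurrentHashMap<>();

    /** Registers a started replica for the given group on this node. */
    void register(String groupId, String raftClientName) {
        replicas.put(groupId, CompletableFuture.completedFuture(raftClientName));
    }

    /** Returns null (not a failed future) when this node hosts no replica for the group. */
    CompletableFuture<String> replica(String groupId) {
        return replicas.get(groupId);
    }

    /** Chains on the lookup result without a null check, like the snippet above. */
    CompletableFuture<String> updateConfiguration(String groupId) {
        // Throws NullPointerException here if replica(groupId) returned null.
        return replica(groupId).thenApply(client -> "updated via " + client);
    }
}

class ReplicaRegistryDemo {
    public static void main(String[] args) {
        ReplicaRegistrySketch registry = new ReplicaRegistrySketch();
        registry.register("zone0_part0", "raftClient0");

        // The node hosts this group, so the chain completes normally.
        System.out.println(registry.updateConfiguration("zone0_part0").join());

        // The node was reported as primary but hosts no replica for this group.
        try {
            registry.updateConfiguration("zone0_part1");
        } catch (NullPointerException e) {
            System.out.println("NPE: no replica future for zone0_part1");
        }
    }
}
```

Note that the exception is thrown synchronously at the point of chaining, not delivered through the future, which is why it surfaces inside the {{thenAcceptAsync}} callback.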
The solution is to add to {{TestPlacementDriver}} a mapping from
{{ZonePartitionId}} to the {{ClusterNode}} that hosts the "primary" replica.
There is another problem, though: in the debugger we can see 25 partitions for
zone 0. Writing 25 mappings by hand is inconvenient, but zone 0 is the common
public zone and is a subject of the test. So the solution is either to reduce
the default zone's partition count or to add mappings for all of its partitions.
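One way to avoid writing the 25 entries by hand is to fill the map in a loop. The sketch below is an illustration under assumptions, not the actual {{TestPlacementDriver}} API: {{ZonePartitionKey}}, {{MappedPlacementDriverSketch}}, and all method names are hypothetical, and nodes are represented by plain name strings rather than {{ClusterNode}}.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical simplified key; the real ZonePartitionId has its own API. */
final class ZonePartitionKey {
    final int zoneId;
    final int partitionId;

    ZonePartitionKey(int zoneId, int partitionId) {
        this.zoneId = zoneId;
        this.partitionId = partitionId;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof ZonePartitionKey)) {
            return false;
        }
        ZonePartitionKey other = (ZonePartitionKey) o;
        return zoneId == other.zoneId && partitionId == other.partitionId;
    }

    @Override
    public int hashCode() {
        return Objects.hash(zoneId, partitionId);
    }
}

/** Hypothetical test driver that resolves the primary node per zone partition. */
class MappedPlacementDriverSketch {
    private final Map<ZonePartitionKey, String> primaries = new ConcurrentHashMap<>();

    /** Maps every partition of the zone to the same node, so no entry is written by hand. */
    void setPrimaryForZone(int zoneId, int partitionCount, String nodeName) {
        for (int partitionId = 0; partitionId < partitionCount; partitionId++) {
            primaries.put(new ZonePartitionKey(zoneId, partitionId), nodeName);
        }
    }

    /** Overrides the primary node for a single partition. */
    void setPrimary(ZonePartitionKey key, String nodeName) {
        primaries.put(key, nodeName);
    }

    /** Completes with the mapped node name, or with null if the partition has no mapping. */
    CompletableFuture<String> getPrimaryReplica(ZonePartitionKey key) {
        return CompletableFuture.completedFuture(primaries.get(key));
    }
}
```

A test could then call {{setPrimaryForZone(0, 25, nodeName)}} once for the default zone and override individual partitions with {{setPrimary}} where a different host is needed. Which node is the right primary for each partition still has to come from the actual assignments; the sketch does not address that.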
*Motivation*
The crucial test should be fixed.
*Definition of done*
The test passes.
> Fix testZoneReplicaListener
> ---------------------------
>
> Key: IGNITE-22928
> URL: https://issues.apache.org/jira/browse/IGNITE-22928
> Project: Ignite
> Issue Type: Improvement
> Reporter: Mikhail Efremov
> Assignee: Mikhail Efremov
> Priority: Major
>