[ https://issues.apache.org/jira/browse/IGNITE-26918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18036763#comment-18036763 ]
Mirza Aliev commented on IGNITE-26918:
--------------------------------------
What was done:
* We fixed a bug in `TableManager#onZoneReplicaDestroyed`, in the following
code:
```
CompletableFuture<?>[] futures = zoneTablesRawSet(zonePartitionId.zoneId()).stream()
        .map(table -> supplyAsync(
                () -> inBusyLockAsync(
                        busyLock,
                        () -> stopAndDestroyTablePartition(
                                new TablePartitionId(table.tableId(), zonePartitionId.partitionId()),
                                parameters.causalityToken()
                        )
                ),
                ioExecutor).thenCompose(identity()))
        .toArray(CompletableFuture[]::new);
return allOf(futures);
```
where `.thenCompose(identity())` was missing. Without it, each future returned
by `supplyAsync` completed as soon as the destroy operation had been started,
not when it finished, so zone resources were treated as stopped while the table
storages were still being destroyed.
* In `PartitionReplicaLifecycleManager#stopPartitionInternal`, the code
```
return replicaMgr.stopReplica(zonePartitionId)
        .thenCompose(replicaWasStopped -> {
            afterReplicaStopAction.accept(replicaWasStopped);
            if (!replicaWasStopped) {
                return nullCompletedFuture();
            }
            replicationGroupIds.remove(zonePartitionId);
            return fireEvent(afterReplicaStoppedEvent, eventParameters);
        });
```
used `thenCompose` where `thenComposeAsync` was needed. This bug led to the
error
`[async-destroy-group-22-partition-2-task-5 is not allowed to do STORAGE_WRITE]`
* The code
```
CompletableFuture<Void> operationFuture = new CompletableFuture<Void>()
        .whenComplete((v, throwable) -> ongoingOperationsById.remove(operationId))
        .orTimeout(TIMEOUT_SECONDS, TimeUnit.SECONDS);
```
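had a bug: the `.whenComplete((v, throwable) ->
ongoingOperationsById.remove(operationId))` callback was never invoked. It is
attached to the initial future, which nothing completes, while
`operationFuture` refers to the dependent future returned by `whenComplete`.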
* The wrong node was chosen for `restartPartitionsWithCleanup` in the
`testRestartPartitionsWithCleanUp` test. The future returned from
`restartPartitionsWithCleanup` can be treated as truly completed only when
`restartPartitionsWithCleanup` is invoked on the node that is passed to the
method.
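
For illustration, here is a minimal standalone sketch of the two
`CompletableFuture` pitfalls from the first and third items. It is not the
Ignite code: the class `FuturePitfalls`, its method names, the 60-second
timeout and the `fixed` variant are hypothetical and only show one possible
way to avoid each problem. `Runnable::run` is used as the executor so that
the demo is deterministic.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

class FuturePitfalls {
    /** Returns whether the outer future is done while the inner work is still pending. */
    static boolean outerDoneWhileInnerPending(boolean flatten) {
        // Hypothetical stand-in for an asynchronous storage destruction that has not finished.
        CompletableFuture<Void> inner = new CompletableFuture<>();

        // supplyAsync around an async call yields a future of a future.
        // Runnable::run executes the supplier inline on the calling thread.
        CompletableFuture<CompletableFuture<Void>> nested =
                CompletableFuture.supplyAsync(() -> inner, Runnable::run);

        if (!flatten) {
            // Done as soon as the supplier returns, regardless of the inner future.
            return nested.isDone();
        }

        // Flattening makes the result wait for the inner future as well.
        CompletableFuture<Void> flattened = nested.thenCompose(Function.identity());
        return flattened.isDone();
    }

    /** Returns whether the cleanup callback ran after the operation future was completed. */
    static boolean cleanupRan(boolean fixed) {
        Map<Integer, CompletableFuture<Void>> ongoing = new ConcurrentHashMap<>();
        int operationId = 1;

        CompletableFuture<Void> operationFuture;
        if (fixed) {
            // Assumed fix: keep a reference to the source future and attach the callback to it.
            operationFuture = new CompletableFuture<Void>().orTimeout(60, TimeUnit.SECONDS);
            operationFuture.whenComplete((v, t) -> ongoing.remove(operationId));
        } else {
            // Same shape as the snippet in the comment: the variable holds the dependent
            // future, and the source future that owns the callback is never completed.
            operationFuture = new CompletableFuture<Void>()
                    .whenComplete((v, t) -> ongoing.remove(operationId))
                    .orTimeout(60, TimeUnit.SECONDS);
        }

        ongoing.put(operationId, operationFuture);
        operationFuture.complete(null);

        return ongoing.isEmpty();
    }

    public static void main(String[] args) {
        System.out.println("nested done early:    " + outerDoneWhileInnerPending(false)); // true
        System.out.println("flattened done early: " + outerDoneWhileInnerPending(true));  // false
        System.out.println("cleanup ran (chained): " + cleanupRan(false)); // false
        System.out.println("cleanup ran (fixed):   " + cleanupRan(true));  // true
    }
}
```

The sketch needs Java 9 or later because of `orTimeout`. In the chained
variant the map entry stays in place after `complete(null)`, which mirrors the
leak of entries in `ongoingOperationsById` described above.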
> ItDisasterRecoveryControllerRestartPartitionsWithCleanupTest.testRestartTablePartitionsWithCleanupAllPartitions is flaky
> ------------------------------------------------------------------------------------------------------------------------
>
> Key: IGNITE-26918
> URL: https://issues.apache.org/jira/browse/IGNITE-26918
> Project: Ignite
> Issue Type: Bug
> Reporter: Alexander Lapin
> Assignee: Mirza Aliev
> Priority: Major
> Labels: MakeTeamcityGreenAgain, ignite-3
> Attachments: _Integration_Tests_Module_REST_23787.log.zip
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> {code:java}
> io.micronaut.http.client.exceptions.HttpClientResponseException: {"title":"Internal Server Error","status":500,"detail":"tableId=22, partitionId=23"}
>     at app//io.micronaut.http.client.netty.DefaultHttpClient$FullHttpResponseHandler.makeErrorFromRequestBody(DefaultHttpClient.java:2232)
>     ...
> Caused by: java.lang.AssertionError: tableId=22, partitionId=23
>     at org.apache.ignite.internal.table.distributed.TableManager.getPartitionStorages(TableManager.java:2762)
> {code}
> [TC Link|https://ci.ignite.apache.org/buildConfiguration/ApacheIgnite3xGradle_Test_IntegrationTests_ModuleRest/9602231]
>
> Please note that, besides the aforementioned assertion error, there is also
> a Critical System Error:
> {code:java}
> 18:32:45 Caused by: java.util.concurrent.CompletionException: org.apache.ignite.internal.storage.StorageDestroyedException: IGN-CMN-65535 Storage is in the process of being destroyed or already destroyed: [tableId=22, partitionId=23] TraceId:78319bfc
> 18:32:45     at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:315)
> 18:32:45     at java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:320)
> 18:32:45     at java.base/java.util.concurrent.CompletableFuture$UniRun.tryFire(CompletableFuture.java:791)
> 18:32:45     ... 4 more
> 18:32:45 Caused by: org.apache.ignite.internal.storage.StorageDestroyedException: Storage is in the process of being destroyed or already destroyed: [tableId=22, partitionId=23]
> 18:32:45     at org.apache.ignite.internal.storage.util.StorageUtils.throwExceptionDependingOnStorageState(StorageUtils.java:147)
> 18:32:45     at org.apache.ignite.internal.storage.pagememory.mv.AbstractPageMemoryMvPartitionStorage.busy(AbstractPageMemoryMvPartitionStorage.java:712)
> 18:32:45     at org.apache.ignite.internal.storage.pagememory.mv.PersistentPageMemoryMvPartitionStorage.committedGroupConfiguration(PersistentPageMemoryMvPartitionStorage.java:287)
> 18:32:45     at org.apache.ignite.internal.storage.ThreadAssertingMvPartitionStorage.committedGroupConfiguration(ThreadAssertingMvPartitionStorage.java:79)
> 18:32:45     at org.apache.ignite.internal.table.distributed.raft.snapshot.SnapshotAwarePartitionDataStorage.committedGroupConfiguration(SnapshotAwarePartitionDataStorage.java:135)
> 18:32:45     at org.apache.ignite.internal.table.distributed.raft.PartitionListener.<init>(PartitionListener.java:229)
> 18:32:45     at org.apache.ignite.internal.table.distributed.TableManager.preparePartitionResourcesAndLoadToZoneReplicaBusy(TableManager.java:1076)
> 18:32:45     at org.apache.ignite.internal.table.distributed.TableManager.lambda$createPartitionsAndLoadResourcesToZoneReplica$14(TableManager.java:779)
> 18:32:45     at org.apache.ignite.internal.util.IgniteUtils.inBusyLock(IgniteUtils.java:920)
> 18:32:45     at org.apache.ignite.internal.table.distributed.TableManager.lambda$createPartitionsAndLoadResourcesToZoneReplica$15(TableManager.java:763)
> 18:32:45     at java.base/java.util.concurrent.CompletableFuture$UniRun.tryFire(CompletableFuture.java:787)
> {code}
> [TC link|https://ci.ignite.apache.org/buildConfiguration/ApacheIgnite3xGradle_Test_RunAllTests/9601666?showLog=9601643_33483_103.1233&logFilter=debug&logView=flowAware]
>