[
https://issues.apache.org/jira/browse/RATIS-1577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Attila Doroszlai updated RATIS-1577:
------------------------------------
Description:
Found this while testing Ozone HA with Ratis built from current {{master}}:
{code}
scm2_1 | 2022-05-08 09:36:23,915 [grpc-default-executor-0] INFO
impl.SnapshotInstallationHandler:
dd69b21b-bf75-4092-bd68-19f526eb2628@group-9D638F7E0D95: receive
installSnapshot:
517d4b01-482b-4891-b0de-80441d4eb637->dd69b21b-bf75-4092-bd68-19f526eb2628#0-t2,notify:(t:1,
i:0)
scm2_1 | 2022-05-08 09:36:23,923 [grpc-default-executor-0] INFO
server.RaftServer$Division:
dd69b21b-bf75-4092-bd68-19f526eb2628@group-9D638F7E0D95: change Leader from
null to 517d4b01-482b-4891-b0de-80441d4eb637 at term 2 for installSnapshot,
leader elected after 2809ms
scm2_1 | 2022-05-08 09:36:23,927 [grpc-default-executor-0] INFO
impl.SnapshotInstallationHandler:
dd69b21b-bf75-4092-bd68-19f526eb2628@group-9D638F7E0D95: Received notification
to install snapshot at index 0
scm2_1 | 2022-05-08 09:36:23,958 [grpc-default-executor-0] INFO
impl.SnapshotInstallationHandler:
dd69b21b-bf75-4092-bd68-19f526eb2628@group-9D638F7E0D95: notifyInstallSnapshot:
nextIndex is 0 but the leader's first available index is 0.
scm2_1 | 2022-05-08 09:36:23,960 [grpc-default-executor-0] INFO
ha.SCMStateMachine: Received install snapshot notification from SCM leader:
scm1:9894 with term index: (t:1, i:0)
scm2_1 | 2022-05-08 09:36:23,961 [pool-16-thread-1] INFO
ha.SCMHAManagerImpl: Downloading checkpoint from leader SCM scm1 and reloading
state from the checkpoint.
scm2_1 | 2022-05-08 09:36:23,965 [grpc-default-executor-0] ERROR
impl.SnapshotInstallationHandler:
dd69b21b-bf75-4092-bd68-19f526eb2628@group-9D638F7E0D95: installSnapshot failed
scm2_1 | java.lang.IllegalStateException:
inProgressInstallSnapshotRequest: 0 is not eligible, firstAvailableLogIndex: 0
scm2_1 | at
org.apache.ratis.util.Preconditions.assertTrue(Preconditions.java:60)
scm2_1 | at
org.apache.ratis.server.impl.SnapshotInstallationHandler.notifyStateMachineToInstallSnapshot(SnapshotInstallationHandler.java:287)
scm2_1 | at
org.apache.ratis.server.impl.SnapshotInstallationHandler.installSnapshotImpl(SnapshotInstallationHandler.java:115)
scm2_1 | at
org.apache.ratis.server.impl.SnapshotInstallationHandler.installSnapshot(SnapshotInstallationHandler.java:84)
scm2_1 | at
org.apache.ratis.server.impl.RaftServerImpl.installSnapshot(RaftServerImpl.java:1427)
{code}
The log above is from docker-compose cluster, but it is easily reproducible
using MiniCluster-based integration test, too: {{mvn clean test
-Dtest=TestStorageContainerManagerHA}}
was:
Found this while testing Ozone HA with Ratis built from current {{master}}:
{code}
scm2_1 | 2022-05-08 09:36:23,915 [grpc-default-executor-0] INFO
impl.SnapshotInstallationHandler:
dd69b21b-bf75-4092-bd68-19f526eb2628@group-9D638F7E0D95: receive
installSnapshot:
517d4b01-482b-4891-b0de-80441d4eb637->dd69b21b-bf75-4092-bd68-19f526eb2628#0-t2,notify:(t:1,
i:0)
scm2_1 | 2022-05-08 09:36:23,923 [grpc-default-executor-0] INFO
server.RaftServer$Division:
dd69b21b-bf75-4092-bd68-19f526eb2628@group-9D638F7E0D95: change Leader from
null to 517d4b01-482b-4891-b0de-80441d4eb637 at term 2 for installSnapshot,
leader elected after 2809ms
scm2_1 | 2022-05-08 09:36:23,927 [grpc-default-executor-0] INFO
impl.SnapshotInstallationHandler:
dd69b21b-bf75-4092-bd68-19f526eb2628@group-9D638F7E0D95: Received notification
to install snapshot at index 0
scm2_1 | 2022-05-08 09:36:23,958 [grpc-default-executor-0] INFO
impl.SnapshotInstallationHandler:
dd69b21b-bf75-4092-bd68-19f526eb2628@group-9D638F7E0D95: notifyInstallSnapshot:
nextIndex is 0 but the leader's first available index is 0.
scm2_1 | 2022-05-08 09:36:23,960 [grpc-default-executor-0] INFO
ha.SCMStateMachine: Received install snapshot notification from SCM leader:
scm1:9894 with term index: (t:1, i:0)
scm2_1 | 2022-05-08 09:36:23,961 [pool-16-thread-1] INFO
ha.SCMHAManagerImpl: Downloading checkpoint from leader SCM scm1 and reloading
state from the checkpoint.
scm2_1 | 2022-05-08 09:36:23,965 [grpc-default-executor-0] ERROR
impl.SnapshotInstallationHandler:
dd69b21b-bf75-4092-bd68-19f526eb2628@group-9D638F7E0D95: installSnapshot failed
scm2_1 | java.lang.IllegalStateException:
inProgressInstallSnapshotRequest: 0 is not eligible, firstAvailableLogIndex: 0
scm2_1 | at
org.apache.ratis.util.Preconditions.assertTrue(Preconditions.java:60)
scm2_1 | at
org.apache.ratis.server.impl.SnapshotInstallationHandler.notifyStateMachineToInstallSnapshot(SnapshotInstallationHandler.java:287)
scm2_1 | at
org.apache.ratis.server.impl.SnapshotInstallationHandler.installSnapshotImpl(SnapshotInstallationHandler.java:115)
scm2_1 | at
org.apache.ratis.server.impl.SnapshotInstallationHandler.installSnapshot(SnapshotInstallationHandler.java:84)
scm2_1 | at
org.apache.ratis.server.impl.RaftServerImpl.installSnapshot(RaftServerImpl.java:1427)
{code}
> Install snapshot failure
> -------------------------
>
> Key: RATIS-1577
> URL: https://issues.apache.org/jira/browse/RATIS-1577
> Project: Ratis
> Issue Type: Bug
> Components: server
> Affects Versions: 2.3.0
> Reporter: Attila Doroszlai
> Priority: Blocker
>
> Found this while testing Ozone HA with Ratis built from current {{master}}:
> {code}
> scm2_1 | 2022-05-08 09:36:23,915 [grpc-default-executor-0] INFO
> impl.SnapshotInstallationHandler:
> dd69b21b-bf75-4092-bd68-19f526eb2628@group-9D638F7E0D95: receive
> installSnapshot:
> 517d4b01-482b-4891-b0de-80441d4eb637->dd69b21b-bf75-4092-bd68-19f526eb2628#0-t2,notify:(t:1,
> i:0)
> scm2_1 | 2022-05-08 09:36:23,923 [grpc-default-executor-0] INFO
> server.RaftServer$Division:
> dd69b21b-bf75-4092-bd68-19f526eb2628@group-9D638F7E0D95: change Leader from
> null to 517d4b01-482b-4891-b0de-80441d4eb637 at term 2 for installSnapshot,
> leader elected after 2809ms
> scm2_1 | 2022-05-08 09:36:23,927 [grpc-default-executor-0] INFO
> impl.SnapshotInstallationHandler:
> dd69b21b-bf75-4092-bd68-19f526eb2628@group-9D638F7E0D95: Received
> notification to install snapshot at index 0
> scm2_1 | 2022-05-08 09:36:23,958 [grpc-default-executor-0] INFO
> impl.SnapshotInstallationHandler:
> dd69b21b-bf75-4092-bd68-19f526eb2628@group-9D638F7E0D95:
> notifyInstallSnapshot: nextIndex is 0 but the leader's first available index
> is 0.
> scm2_1 | 2022-05-08 09:36:23,960 [grpc-default-executor-0] INFO
> ha.SCMStateMachine: Received install snapshot notification from SCM leader:
> scm1:9894 with term index: (t:1, i:0)
> scm2_1 | 2022-05-08 09:36:23,961 [pool-16-thread-1] INFO
> ha.SCMHAManagerImpl: Downloading checkpoint from leader SCM scm1 and
> reloading state from the checkpoint.
> scm2_1 | 2022-05-08 09:36:23,965 [grpc-default-executor-0] ERROR
> impl.SnapshotInstallationHandler:
> dd69b21b-bf75-4092-bd68-19f526eb2628@group-9D638F7E0D95: installSnapshot
> failed
> scm2_1 | java.lang.IllegalStateException:
> inProgressInstallSnapshotRequest: 0 is not eligible, firstAvailableLogIndex: 0
> scm2_1 | at
> org.apache.ratis.util.Preconditions.assertTrue(Preconditions.java:60)
> scm2_1 | at
> org.apache.ratis.server.impl.SnapshotInstallationHandler.notifyStateMachineToInstallSnapshot(SnapshotInstallationHandler.java:287)
> scm2_1 | at
> org.apache.ratis.server.impl.SnapshotInstallationHandler.installSnapshotImpl(SnapshotInstallationHandler.java:115)
> scm2_1 | at
> org.apache.ratis.server.impl.SnapshotInstallationHandler.installSnapshot(SnapshotInstallationHandler.java:84)
> scm2_1 | at
> org.apache.ratis.server.impl.RaftServerImpl.installSnapshot(RaftServerImpl.java:1427)
> {code}
> The log above is from docker-compose cluster, but it is easily reproducible
> using MiniCluster-based integration test, too: {{mvn clean test
> -Dtest=TestStorageContainerManagerHA}}
--
This message was sent by Atlassian Jira
(v8.20.7#820007)