[ 
https://issues.apache.org/jira/browse/RATIS-1577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Doroszlai updated RATIS-1577:
------------------------------------
    Description: 
Found this while testing Ozone HA with Ratis built from current {{master}}:

{code}
scm2_1      | 2022-05-08 09:36:23,915 [grpc-default-executor-0] INFO 
impl.SnapshotInstallationHandler: 
dd69b21b-bf75-4092-bd68-19f526eb2628@group-9D638F7E0D95: receive 
installSnapshot: 
517d4b01-482b-4891-b0de-80441d4eb637->dd69b21b-bf75-4092-bd68-19f526eb2628#0-t2,notify:(t:1,
 i:0)
scm2_1      | 2022-05-08 09:36:23,923 [grpc-default-executor-0] INFO 
server.RaftServer$Division: 
dd69b21b-bf75-4092-bd68-19f526eb2628@group-9D638F7E0D95: change Leader from 
null to 517d4b01-482b-4891-b0de-80441d4eb637 at term 2 for installSnapshot, 
leader elected after 2809ms
scm2_1      | 2022-05-08 09:36:23,927 [grpc-default-executor-0] INFO 
impl.SnapshotInstallationHandler: 
dd69b21b-bf75-4092-bd68-19f526eb2628@group-9D638F7E0D95: Received notification 
to install snapshot at index 0
scm2_1      | 2022-05-08 09:36:23,958 [grpc-default-executor-0] INFO 
impl.SnapshotInstallationHandler: 
dd69b21b-bf75-4092-bd68-19f526eb2628@group-9D638F7E0D95: notifyInstallSnapshot: 
nextIndex is 0 but the leader's first available index is 0.
scm2_1      | 2022-05-08 09:36:23,960 [grpc-default-executor-0] INFO 
ha.SCMStateMachine: Received install snapshot notification from SCM leader: 
scm1:9894 with term index: (t:1, i:0)
scm2_1      | 2022-05-08 09:36:23,961 [pool-16-thread-1] INFO 
ha.SCMHAManagerImpl: Downloading checkpoint from leader SCM scm1 and reloading 
state from the checkpoint.
scm2_1      | 2022-05-08 09:36:23,965 [grpc-default-executor-0] ERROR 
impl.SnapshotInstallationHandler: 
dd69b21b-bf75-4092-bd68-19f526eb2628@group-9D638F7E0D95: installSnapshot failed
scm2_1      | java.lang.IllegalStateException: 
inProgressInstallSnapshotRequest: 0 is not eligible, firstAvailableLogIndex: 0
scm2_1      |   at 
org.apache.ratis.util.Preconditions.assertTrue(Preconditions.java:60)
scm2_1      |   at 
org.apache.ratis.server.impl.SnapshotInstallationHandler.notifyStateMachineToInstallSnapshot(SnapshotInstallationHandler.java:287)
scm2_1      |   at 
org.apache.ratis.server.impl.SnapshotInstallationHandler.installSnapshotImpl(SnapshotInstallationHandler.java:115)
scm2_1      |   at 
org.apache.ratis.server.impl.SnapshotInstallationHandler.installSnapshot(SnapshotInstallationHandler.java:84)
scm2_1      |   at 
org.apache.ratis.server.impl.RaftServerImpl.installSnapshot(RaftServerImpl.java:1427)
{code}

The log above is from docker-compose cluster, but it is easily reproducible 
using MiniCluster-based integration test, too: {{mvn clean test 
-Dtest=TestStorageContainerManagerHA}}

  was:
Found this while testing Ozone HA with Ratis built from current {{master}}:

{code}
scm2_1      | 2022-05-08 09:36:23,915 [grpc-default-executor-0] INFO 
impl.SnapshotInstallationHandler: 
dd69b21b-bf75-4092-bd68-19f526eb2628@group-9D638F7E0D95: receive 
installSnapshot: 
517d4b01-482b-4891-b0de-80441d4eb637->dd69b21b-bf75-4092-bd68-19f526eb2628#0-t2,notify:(t:1,
 i:0)
scm2_1      | 2022-05-08 09:36:23,923 [grpc-default-executor-0] INFO 
server.RaftServer$Division: 
dd69b21b-bf75-4092-bd68-19f526eb2628@group-9D638F7E0D95: change Leader from 
null to 517d4b01-482b-4891-b0de-80441d4eb637 at term 2 for installSnapshot, 
leader elected after 2809ms
scm2_1      | 2022-05-08 09:36:23,927 [grpc-default-executor-0] INFO 
impl.SnapshotInstallationHandler: 
dd69b21b-bf75-4092-bd68-19f526eb2628@group-9D638F7E0D95: Received notification 
to install snapshot at index 0
scm2_1      | 2022-05-08 09:36:23,958 [grpc-default-executor-0] INFO 
impl.SnapshotInstallationHandler: 
dd69b21b-bf75-4092-bd68-19f526eb2628@group-9D638F7E0D95: notifyInstallSnapshot: 
nextIndex is 0 but the leader's first available index is 0.
scm2_1      | 2022-05-08 09:36:23,960 [grpc-default-executor-0] INFO 
ha.SCMStateMachine: Received install snapshot notification from SCM leader: 
scm1:9894 with term index: (t:1, i:0)
scm2_1      | 2022-05-08 09:36:23,961 [pool-16-thread-1] INFO 
ha.SCMHAManagerImpl: Downloading checkpoint from leader SCM scm1 and reloading 
state from the checkpoint.
scm2_1      | 2022-05-08 09:36:23,965 [grpc-default-executor-0] ERROR 
impl.SnapshotInstallationHandler: 
dd69b21b-bf75-4092-bd68-19f526eb2628@group-9D638F7E0D95: installSnapshot failed
scm2_1      | java.lang.IllegalStateException: 
inProgressInstallSnapshotRequest: 0 is not eligible, firstAvailableLogIndex: 0
scm2_1      |   at 
org.apache.ratis.util.Preconditions.assertTrue(Preconditions.java:60)
scm2_1      |   at 
org.apache.ratis.server.impl.SnapshotInstallationHandler.notifyStateMachineToInstallSnapshot(SnapshotInstallationHandler.java:287)
scm2_1      |   at 
org.apache.ratis.server.impl.SnapshotInstallationHandler.installSnapshotImpl(SnapshotInstallationHandler.java:115)
scm2_1      |   at 
org.apache.ratis.server.impl.SnapshotInstallationHandler.installSnapshot(SnapshotInstallationHandler.java:84)
scm2_1      |   at 
org.apache.ratis.server.impl.RaftServerImpl.installSnapshot(RaftServerImpl.java:1427)
{code}


> Install snapshot failure 
> -------------------------
>
>                 Key: RATIS-1577
>                 URL: https://issues.apache.org/jira/browse/RATIS-1577
>             Project: Ratis
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 2.3.0
>            Reporter: Attila Doroszlai
>            Priority: Blocker
>
> Found this while testing Ozone HA with Ratis built from current {{master}}:
> {code}
> scm2_1      | 2022-05-08 09:36:23,915 [grpc-default-executor-0] INFO 
> impl.SnapshotInstallationHandler: 
> dd69b21b-bf75-4092-bd68-19f526eb2628@group-9D638F7E0D95: receive 
> installSnapshot: 
> 517d4b01-482b-4891-b0de-80441d4eb637->dd69b21b-bf75-4092-bd68-19f526eb2628#0-t2,notify:(t:1,
>  i:0)
> scm2_1      | 2022-05-08 09:36:23,923 [grpc-default-executor-0] INFO 
> server.RaftServer$Division: 
> dd69b21b-bf75-4092-bd68-19f526eb2628@group-9D638F7E0D95: change Leader from 
> null to 517d4b01-482b-4891-b0de-80441d4eb637 at term 2 for installSnapshot, 
> leader elected after 2809ms
> scm2_1      | 2022-05-08 09:36:23,927 [grpc-default-executor-0] INFO 
> impl.SnapshotInstallationHandler: 
> dd69b21b-bf75-4092-bd68-19f526eb2628@group-9D638F7E0D95: Received 
> notification to install snapshot at index 0
> scm2_1      | 2022-05-08 09:36:23,958 [grpc-default-executor-0] INFO 
> impl.SnapshotInstallationHandler: 
> dd69b21b-bf75-4092-bd68-19f526eb2628@group-9D638F7E0D95: 
> notifyInstallSnapshot: nextIndex is 0 but the leader's first available index 
> is 0.
> scm2_1      | 2022-05-08 09:36:23,960 [grpc-default-executor-0] INFO 
> ha.SCMStateMachine: Received install snapshot notification from SCM leader: 
> scm1:9894 with term index: (t:1, i:0)
> scm2_1      | 2022-05-08 09:36:23,961 [pool-16-thread-1] INFO 
> ha.SCMHAManagerImpl: Downloading checkpoint from leader SCM scm1 and 
> reloading state from the checkpoint.
> scm2_1      | 2022-05-08 09:36:23,965 [grpc-default-executor-0] ERROR 
> impl.SnapshotInstallationHandler: 
> dd69b21b-bf75-4092-bd68-19f526eb2628@group-9D638F7E0D95: installSnapshot 
> failed
> scm2_1      | java.lang.IllegalStateException: 
> inProgressInstallSnapshotRequest: 0 is not eligible, firstAvailableLogIndex: 0
> scm2_1      |         at 
> org.apache.ratis.util.Preconditions.assertTrue(Preconditions.java:60)
> scm2_1      |         at 
> org.apache.ratis.server.impl.SnapshotInstallationHandler.notifyStateMachineToInstallSnapshot(SnapshotInstallationHandler.java:287)
> scm2_1      |         at 
> org.apache.ratis.server.impl.SnapshotInstallationHandler.installSnapshotImpl(SnapshotInstallationHandler.java:115)
> scm2_1      |         at 
> org.apache.ratis.server.impl.SnapshotInstallationHandler.installSnapshot(SnapshotInstallationHandler.java:84)
> scm2_1      |         at 
> org.apache.ratis.server.impl.RaftServerImpl.installSnapshot(RaftServerImpl.java:1427)
> {code}
> The log above is from docker-compose cluster, but it is easily reproducible 
> using MiniCluster-based integration test, too: {{mvn clean test 
> -Dtest=TestStorageContainerManagerHA}}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

Reply via email to