SWJTU-ZhangLei opened a new issue, #17475: URL: https://github.com/apache/doris/issues/17475
### Search before asking

- [X] I had searched in the [issues](https://github.com/apache/doris/issues?q=is%3Aissue) and found no similar issues.

### Version

ff99b7d048c87512e7a2b8da661d641343b049b3

### What's Wrong?

2023-03-05 19:26:00,059 INFO (main|1) [Env.loadDeleteHandler():1873] finished replay deleteHandler from image
2023-03-05 19:26:00,060 INFO (main|1) [Env.loadSqlBlockRule():1948] finished replay sqlBlockRule from image
2023-03-05 19:26:00,063 INFO (main|1) [Env.loadPolicy():1959] finished replay policy from image
2023-03-05 19:26:00,063 INFO (main|1) [MetaReader.read():104] finished to load image in 1116 ms
2023-03-05 19:26:01,776 INFO (UNKNOWN 172.21.0.68_9310_1677056472917(-1)|1) [BDBEnvironment.setup():162] add helper[172.21.0.68:9310] as ReplicationGroupAdmin
2023-03-05 19:26:01,780 WARN (UNKNOWN 172.21.0.68_9310_1677056472917(-1)|1) [Env.notifyNewFETypeTransfer():2372] notify new FE type transfer: UNKNOWN
2023-03-05 19:26:01,795 WARN (RepNode 172.21.0.68_9310_1677056472917(-1)|62) [Env.notifyNewFETypeTransfer():2372] notify new FE type transfer: FOLLOWER
2023-03-05 19:26:01,801 INFO (stateListener|74) [Env$4.runOneCycle():2395] begin to transfer FE type from INIT to UNKNOWN
2023-03-05 19:26:01,801 INFO (stateListener|74) [Env$4.runOneCycle():2482] finished to transfer FE type to UNKNOWN
2023-03-05 19:26:01,802 INFO (stateListener|74) [Env$4.runOneCycle():2395] begin to transfer FE type from UNKNOWN to FOLLOWER
2023-03-05 19:26:01,802 INFO (stateListener|74) [BDBHA.addHelperSocket():241] add 172.21.0.140:9310 to helper sockets
2023-03-05 19:26:01,803 INFO (stateListener|74) [BDBHA.addHelperSocket():241] add 172.21.0.133:9310 to helper sockets
2023-03-05 19:26:01,806 INFO (replayer|77) [Env.replayJournal():2499] replayed journal id is 4711661, replay to journal id is 4715842
2023-03-05 19:26:01,807 WARN (REPLICA 172.21.0.68_9310_1677056472917(1)|62) [Env.notifyNewFETypeTransfer():2372] notify new FE type transfer: UNKNOWN
2023-03-05 19:26:01,814 INFO (replayer|77) [DatabaseTransactionMgr.replayUpsertTransactionState():1743] replay a visible transaction TransactionState. transaction id: 1742061, label: fde4db8d-c751-4a31-b556-c776f8ed735b, db id: 744452, table id list: 3292640, callback id: -1, coordinator: BE: 172.21.0.27, transaction status: VISIBLE, error replicas num: 0, replica ids: , prepare time: 1678010954365, commit time: 1678010960666, finish time: 1678010960936, reason:
2023-03-05 19:26:01,843 WARN (UNKNOWN 172.21.0.68_9310_1677056472917(1)|62) [Env.notifyNewFETypeTransfer():2372] notify new FE type transfer: FOLLOWER
2023-03-05 19:26:01,856 WARN (REPLICA 172.21.0.68_9310_1677056472917(1)|62) [BDBStateChangeListener.stateChange():57] this node is DETACHED
2023-03-05 19:26:01,850 WARN (replayer|77) [BDBJournalCursor.next():147] Catch an exception when get next JournalEntity. key:4711663
com.sleepycat.je.rep.RollbackException: (JE 18.3.12) Environment must be closed, caused by: com.sleepycat.je.rep.RollbackException: Environment invalid because of previous exception: (JE 18.3.12) 172.21.0.68_9310_1677056472917(1):/mnt/hdd01/STRESS_ENV/fe/doris-meta/bdb Node 172.21.0.68_9310_1677056472917(1):/mnt/hdd01/STRESS_ENV/fe/doris-meta/bdb must rollback 1 total commits(1 of which were durable) to the earliest point indicated by transaction id=-4812378 time=2023-03-05 18:24:01.739 vlsn=9,528,445 lsn=0xfb/0x426da3 durable=false in order to rejoin the replication group. All existing ReplicatedEnvironment handles must be closed and reinstantiated. Log files were truncated to file 0x251, offset 0x4353230, vlsn 9,528,443 HARD_RECOVERY: Rolled back past transaction commit or abort. Must run recovery by re-opening Environment handles Environment is invalid and must be closed.
Environment invalid because of previous exception: (JE 18.3.12) 172.21.0.68_9310_1677056472917(1):/mnt/hdd01/STRESS_ENV/fe/doris-meta/bdb Node 172.21.0.68_9310_1677056472917(1):/mnt/hdd01/STRESS_ENV/fe/doris-meta/bdb must rollback 1 total commits(1 of which were durable) to the earliest point indicated by transaction id=-4812378 time=2023-03-05 18:24:01.739 vlsn=9,528,445 lsn=0xfb/0x426da3 durable=false in order to rejoin the replication group. All existing ReplicatedEnvironment handles must be closed and reinstantiated. Log files were truncated to file 0x251, offset 0x4353230, vlsn 9,528,443 HARD_RECOVERY: Rolled back past transaction commit or abort. Must run recovery by re-opening Environment handles Environment is invalid and must be closed.
    at com.sleepycat.je.rep.RollbackException.wrapSelf(RollbackException.java:146) ~[je-18.3.13-doris-SNAPSHOT.jar:18.3.13-doris-SNAPSHOT]
    at com.sleepycat.je.rep.RollbackException.wrapSelf(RollbackException.java:62) ~[je-18.3.13-doris-SNAPSHOT.jar:18.3.13-doris-SNAPSHOT]
    at com.sleepycat.je.dbi.EnvironmentImpl.checkIfInvalid(EnvironmentImpl.java:1835) ~[je-18.3.13-doris-SNAPSHOT.jar:18.3.13-doris-SNAPSHOT]
    at com.sleepycat.je.dbi.EnvironmentImpl.checkOpen(EnvironmentImpl.java:1844) ~[je-18.3.13-doris-SNAPSHOT.jar:18.3.13-doris-SNAPSHOT]
    at com.sleepycat.je.Environment.checkOpen(Environment.java:2697) ~[je-18.3.13-doris-SNAPSHOT.jar:18.3.13-doris-SNAPSHOT]
    at com.sleepycat.je.Database.checkEnv(Database.java:2413) ~[je-18.3.13-doris-SNAPSHOT.jar:18.3.13-doris-SNAPSHOT]
    at com.sleepycat.je.Database.get(Database.java:1370) ~[je-18.3.13-doris-SNAPSHOT.jar:18.3.13-doris-SNAPSHOT]
    at com.sleepycat.je.Database.get(Database.java:1462) ~[je-18.3.13-doris-SNAPSHOT.jar:18.3.13-doris-SNAPSHOT]
    at org.apache.doris.journal.bdbje.BDBJournalCursor.next(BDBJournalCursor.java:107) ~[doris-fe.jar:1.2-SNAPSHOT]
    at org.apache.doris.catalog.Env.replayJournal(Env.java:2509) ~[doris-fe.jar:1.2-SNAPSHOT]
    at org.apache.doris.catalog.Env$3.runOneCycle(Env.java:2297) ~[doris-fe.jar:1.2-SNAPSHOT]
    at org.apache.doris.common.util.Daemon.run(Daemon.java:116) ~[doris-fe.jar:1.2-SNAPSHOT]
Caused by: com.sleepycat.je.rep.RollbackException: Environment invalid because of previous exception: (JE 18.3.12) 172.21.0.68_9310_1677056472917(1):/mnt/hdd01/STRESS_ENV/fe/doris-meta/bdb Node 172.21.0.68_9310_1677056472917(1):/mnt/hdd01/STRESS_ENV/fe/doris-meta/bdb must rollback 1 total commits(1 of which were durable) to the earliest point indicated by transaction id=-4812378 time=2023-03-05 18:24:01.739 vlsn=9,528,445 lsn=0xfb/0x426da3 durable=false in order to rejoin the replication group. All existing ReplicatedEnvironment handles must be closed and reinstantiated. Log files were truncated to file 0x251, offset 0x4353230, vlsn 9,528,443 HARD_RECOVERY: Rolled back past transaction commit or abort. Must run recovery by re-opening Environment handles Environment is invalid and must be closed. Originally thrown by HA thread: REPLICA 172.21.0.68_9310_1677056472917(1) Originally thrown by HA thread: REPLICA 172.21.0.68_9310_1677056472917(1)
    at com.sleepycat.je.rep.stream.ReplicaFeederSyncup.setupHardRecovery(ReplicaFeederSyncup.java:721) ~[je-18.3.13-doris-SNAPSHOT.jar:18.3.13-doris-SNAPSHOT]
    at com.sleepycat.je.rep.stream.ReplicaFeederSyncup.verifyRollback(ReplicaFeederSyncup.java:417) ~[je-18.3.13-doris-SNAPSHOT.jar:18.3.13-doris-SNAPSHOT]
    at com.sleepycat.je.rep.stream.ReplicaFeederSyncup.execute(ReplicaFeederSyncup.java:164) ~[je-18.3.13-doris-SNAPSHOT.jar:18.3.13-doris-SNAPSHOT]
    at com.sleepycat.je.rep.impl.node.Replica.initReplicaLoop(Replica.java:732) ~[je-18.3.13-doris-SNAPSHOT.jar:18.3.13-doris-SNAPSHOT]
    at com.sleepycat.je.rep.impl.node.Replica.runReplicaLoopInternal(Replica.java:485) ~[je-18.3.13-doris-SNAPSHOT.jar:18.3.13-doris-SNAPSHOT]
    at com.sleepycat.je.rep.impl.node.Replica.runReplicaLoop(Replica.java:412) ~[je-18.3.13-doris-SNAPSHOT.jar:18.3.13-doris-SNAPSHOT]
    at com.sleepycat.je.rep.impl.node.RepNode.run(RepNode.java:1869) ~[je-18.3.13-doris-SNAPSHOT.jar:18.3.13-doris-SNAPSHOT]
2023-03-05 19:26:01,863 WARN (replayer|77) [Env.setCanRead():2343] meta out of date. current time: 1678015561863, synchronized time: 0, has log: true, fe type: UNKNOWN
2023-03-05 19:26:01,902 INFO (UNKNOWN 172.21.0.68_9310_1677056472917(-1)|1) [Env.waitForReady():895] wait catalog to be ready. FE type: UNKNOWN. is ready: false, counter: 1
2023-03-05 19:26:03,904 INFO (UNKNOWN 172.21.0.68_9310_1677056472917(-1)|1) [Env.waitForReady():895] wait catalog to be ready. FE type: UNKNOWN. is ready: false, counter: 21
2023-03-05 19:26:05,906 INFO (UNKNOWN 172.21.0.68_9310_1677056472917(-1)|1) [Env.waitForReady():895] wait catalog to be ready. FE type: UNKNOWN. is ready: false, counter: 41
2023-03-05 19:26:06,865 WARN (replayer|77) [BDBJEJournal.getDatabaseNames():446] catch rollback log exception. will reopen the ReplicatedEnvironment.
com.sleepycat.je.rep.RollbackException: (JE 18.3.12) Environment must be closed, caused by: com.sleepycat.je.rep.RollbackException: Environment invalid because of previous exception: (JE 18.3.12) 172.21.0.68_9310_1677056472917(1):/mnt/hdd01/STRESS_ENV/fe/doris-meta/bdb Node 172.21.0.68_9310_1677056472917(1):/mnt/hdd01/STRESS_ENV/fe/doris-meta/bdb must rollback 1 total commits(1 of which were durable) to the earliest point indicated by transaction id=-4812378 time=2023-03-05 18:24:01.739 vlsn=9,528,445 lsn=0xfb/0x426da3 durable=false in order to rejoin the replication group. All existing ReplicatedEnvironment handles must be closed and reinstantiated. Log files were truncated to file 0x251, offset 0x4353230, vlsn 9,528,443 HARD_RECOVERY: Rolled back past transaction commit or abort. Must run recovery by re-opening Environment handles Environment is invalid and must be closed.
Originally thrown by HA thread: REPLICA 172.21.0.68_9310_1677056472917(1) Originally thrown by HA thread: REPLICA 172.21.0.68_9310_1677056472917(1)
Environment invalid because of previous exception: (JE 18.3.12) 172.21.0.68_9310_1677056472917(1):/mnt/hdd01/STRESS_ENV/fe/doris-meta/bdb Node 172.21.0.68_9310_1677056472917(1):/mnt/hdd01/STRESS_ENV/fe/doris-meta/bdb must rollback 1 total commits(1 of which were durable) to the earliest point indicated by transaction id=-4812378 time=2023-03-05 18:24:01.739 vlsn=9,528,445 lsn=0xfb/0x426da3 durable=false in order to rejoin the replication group. All existing ReplicatedEnvironment handles must be closed and reinstantiated. Log files were truncated to file 0x251, offset 0x4353230, vlsn 9,528,443 HARD_RECOVERY: Rolled back past transaction commit or abort. Must run recovery by re-opening Environment handles Environment is invalid and must be closed. Originally thrown by HA thread: REPLICA 172.21.0.68_9310_1677056472917(1) Originally thrown by HA thread: REPLICA 172.21.0.68_9310_1677056472917(1)
    at com.sleepycat.je.rep.RollbackException.wrapSelf(RollbackException.java:146) ~[je-18.3.13-doris-SNAPSHOT.jar:18.3.13-doris-SNAPSHOT]
    at com.sleepycat.je.rep.RollbackException.wrapSelf(RollbackException.java:62) ~[je-18.3.13-doris-SNAPSHOT.jar:18.3.13-doris-SNAPSHOT]
    at com.sleepycat.je.dbi.EnvironmentImpl.checkIfInvalid(EnvironmentImpl.java:1835) ~[je-18.3.13-doris-SNAPSHOT.jar:18.3.13-doris-SNAPSHOT]
    at com.sleepycat.je.dbi.EnvironmentImpl.checkOpen(EnvironmentImpl.java:1844) ~[je-18.3.13-doris-SNAPSHOT.jar:18.3.13-doris-SNAPSHOT]
    at com.sleepycat.je.Environment.checkOpen(Environment.java:2697) ~[je-18.3.13-doris-SNAPSHOT.jar:18.3.13-doris-SNAPSHOT]
    at com.sleepycat.je.Environment.getDatabaseNames(Environment.java:2455) ~[je-18.3.13-doris-SNAPSHOT.jar:18.3.13-doris-SNAPSHOT]
    at org.apache.doris.journal.bdbje.BDBEnvironment.getDatabaseNames(BDBEnvironment.java:323) ~[doris-fe.jar:1.2-SNAPSHOT]
    at org.apache.doris.journal.bdbje.BDBJEJournal.getDatabaseNames(BDBJEJournal.java:425) ~[doris-fe.jar:1.2-SNAPSHOT]
    at org.apache.doris.journal.bdbje.BDBJEJournal.getMaxJournalId(BDBJEJournal.java:245) ~[doris-fe.jar:1.2-SNAPSHOT]
    at org.apache.doris.persist.EditLog.getMaxJournalId(EditLog.java:122) ~[doris-fe.jar:1.2-SNAPSHOT]
    at org.apache.doris.catalog.Env.getMaxJournalId(Env.java:3552) ~[doris-fe.jar:1.2-SNAPSHOT]
    at org.apache.doris.catalog.Env.replayJournal(Env.java:2493) ~[doris-fe.jar:1.2-SNAPSHOT]
    at org.apache.doris.catalog.Env$3.runOneCycle(Env.java:2297) ~[doris-fe.jar:1.2-SNAPSHOT]
    at org.apache.doris.common.util.Daemon.run(Daemon.java:116) ~[doris-fe.jar:1.2-SNAPSHOT]
Caused by: com.sleepycat.je.rep.RollbackException: Environment invalid because of previous exception: (JE 18.3.12) 172.21.0.68_9310_1677056472917(1):/mnt/hdd01/STRESS_ENV/fe/doris-meta/bdb Node 172.21.0.68_9310_1677056472917(1):/mnt/hdd01/STRESS_ENV/fe/doris-meta/bdb must rollback 1 total commits(1 of which were durable) to the earliest point indicated by transaction id=-4812378 time=2023-03-05 18:24:01.739 vlsn=9,528,445 lsn=0xfb/0x426da3 durable=false in order to rejoin the replication group. All existing ReplicatedEnvironment handles must be closed and reinstantiated. Log files were truncated to file 0x251, offset 0x4353230, vlsn 9,528,443 HARD_RECOVERY: Rolled back past transaction commit or abort. Must run recovery by re-opening Environment handles Environment is invalid and must be closed.
Originally thrown by HA thread: REPLICA 172.21.0.68_9310_1677056472917(1) Originally thrown by HA thread: REPLICA 172.21.0.68_9310_1677056472917(1)
    at com.sleepycat.je.rep.stream.ReplicaFeederSyncup.setupHardRecovery(ReplicaFeederSyncup.java:721) ~[je-18.3.13-doris-SNAPSHOT.jar:18.3.13-doris-SNAPSHOT]
    at com.sleepycat.je.rep.stream.ReplicaFeederSyncup.verifyRollback(ReplicaFeederSyncup.java:417) ~[je-18.3.13-doris-SNAPSHOT.jar:18.3.13-doris-SNAPSHOT]
    at com.sleepycat.je.rep.stream.ReplicaFeederSyncup.execute(ReplicaFeederSyncup.java:164) ~[je-18.3.13-doris-SNAPSHOT.jar:18.3.13-doris-SNAPSHOT]
    at com.sleepycat.je.rep.impl.node.Replica.initReplicaLoop(Replica.java:732) ~[je-18.3.13-doris-SNAPSHOT.jar:18.3.13-doris-SNAPSHOT]
    at com.sleepycat.je.rep.impl.node.Replica.runReplicaLoopInternal(Replica.java:485) ~[je-18.3.13-doris-SNAPSHOT.jar:18.3.13-doris-SNAPSHOT]
    at com.sleepycat.je.rep.impl.node.Replica.runReplicaLoop(Replica.java:412) ~[je-18.3.13-doris-SNAPSHOT.jar:18.3.13-doris-SNAPSHOT]
    at com.sleepycat.je.rep.impl.node.RepNode.run(RepNode.java:1869) ~[je-18.3.13-doris-SNAPSHOT.jar:18.3.13-doris-SNAPSHOT]
2023-03-05 19:26:07,262 ERROR (UNKNOWN 172.21.0.68_9310_1677056472917(-1)|77) [Env$3.runOneCycle():2309] replayer thread catch an exception when replay journal.
java.lang.IllegalStateException: Environment is closed.
    at com.sleepycat.je.Environment.getNonNullEnvImpl(Environment.java:2720) ~[je-18.3.13-doris-SNAPSHOT.jar:18.3.13-doris-SNAPSHOT]
    at com.sleepycat.je.Environment.checkOpen(Environment.java:2696) ~[je-18.3.13-doris-SNAPSHOT.jar:18.3.13-doris-SNAPSHOT]
    at com.sleepycat.je.Database.checkEnv(Database.java:2413) ~[je-18.3.13-doris-SNAPSHOT.jar:18.3.13-doris-SNAPSHOT]
    at com.sleepycat.je.Database.count(Database.java:2039) ~[je-18.3.13-doris-SNAPSHOT.jar:18.3.13-doris-SNAPSHOT]
    at org.apache.doris.journal.bdbje.BDBJEJournal.getMaxJournalId(BDBJEJournal.java:257) ~[doris-fe.jar:1.2-SNAPSHOT]
    at org.apache.doris.persist.EditLog.getMaxJournalId(EditLog.java:122) ~[doris-fe.jar:1.2-SNAPSHOT]
    at org.apache.doris.catalog.Env.getMaxJournalId(Env.java:3552) ~[doris-fe.jar:1.2-SNAPSHOT]
    at org.apache.doris.catalog.Env.replayJournal(Env.java:2493) ~[doris-fe.jar:1.2-SNAPSHOT]
    at org.apache.doris.catalog.Env$3.runOneCycle(Env.java:2297) ~[doris-fe.jar:1.2-SNAPSHOT]
    at org.apache.doris.common.util.Daemon.run(Daemon.java:116) ~[doris-fe.jar:1.2-SNAPSHOT]
2023-03-05 19:26:07,908 INFO (UNKNOWN 172.21.0.68_9310_1677056472917(-1)|1) [Env.waitForReady():895] wait catalog to be ready. FE type: UNKNOWN. is ready: false, counter: 61
2023-03-05 19:26:09,910 INFO (UNKNOWN 172.21.0.68_9310_1677056472917(-1)|1) [Env.waitForReady():895] wait catalog to be ready. FE type: UNKNOWN. is ready: false, counter: 81
2023-03-05 19:26:11,912 INFO (UNKNOWN 172.21.0.68_9310_1677056472917(-1)|1) [Env.waitForReady():895] wait catalog to be ready. FE type: UNKNOWN. is ready: false, counter: 101
2023-03-05 19:26:12,265 INFO (UNKNOWN 172.21.0.68_9310_1677056472917(-1)|77) [Env.replayJournal():2499] replayed journal id is 4711662, replay to journal id is 4717033
2023-03-05 19:26:12,265 WARN (UNKNOWN 172.21.0.68_9310_1677056472917(-1)|77) [BDBJournalCursor.next():147] Catch an exception when get next JournalEntity. key:4711663
java.lang.IllegalStateException: Environment is closed.
    at com.sleepycat.je.Environment.getNonNullEnvImpl(Environment.java:2720) ~[je-18.3.13-doris-SNAPSHOT.jar:18.3.13-doris-SNAPSHOT]
    at com.sleepycat.je.Environment.checkOpen(Environment.java:2696) ~[je-18.3.13-doris-SNAPSHOT.jar:18.3.13-doris-SNAPSHOT]
    at com.sleepycat.je.Database.checkEnv(Database.java:2413) ~[je-18.3.13-doris-SNAPSHOT.jar:18.3.13-doris-SNAPSHOT]
    at com.sleepycat.je.Database.get(Database.java:1370) ~[je-18.3.13-doris-SNAPSHOT.jar:18.3.13-doris-SNAPSHOT]
    at com.sleepycat.je.Database.get(Database.java:1462) ~[je-18.3.13-doris-SNAPSHOT.jar:18.3.13-doris-SNAPSHOT]
    at org.apache.doris.journal.bdbje.BDBJournalCursor.next(BDBJournalCursor.java:107) ~[doris-fe.jar:1.2-SNAPSHOT]
    at org.apache.doris.catalog.Env.replayJournal(Env.java:2509) ~[doris-fe.jar:1.2-SNAPSHOT]

### What You Expected?

https://github.com/apache/doris/pull/6582: this PR does not work correctly in this scenario.

### How to Reproduce?

Related PR: https://github.com/apache/doris/pull/6582

Deploy 3 FEs and execute many `INSERT INTO` statements. While they are executing, stop the 2 follower FEs, and stop the master FE last. Then start the 2 original follower FEs and wait for them to elect a new master FE. Finally, start the original master FE, which now rejoins as a follower.

At this point the log of the new follower FE (the original master) is ahead of the log of the new master FE, so the follower needs a rollback and hard recovery. Every call to `bdbEnvironment.getDatabaseNames` then throws an error.

### Anything Else?

_No response_

### Are you willing to submit PR?

- [X] Yes I am willing to submit a PR!

### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
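The behavior the report seems to expect from `getDatabaseNames` (close the invalidated handle, reopen it, then retry) can be illustrated with a small, self-contained sketch. This is not the Doris code and not the BDB JE API: `RollbackRequiredException`, `EnvHandle`, `EnvHolder`, and `newDemoHolder` are hypothetical stand-ins invented for illustration, and the database name `"1"` is a made-up sample value. The one point it tries to show is that after a reopen, later calls must go through the new handle, since the log above ends in `IllegalStateException: Environment is closed.` when a stale handle is still used.

```java
import java.util.Collections;
import java.util.List;
import java.util.function.Supplier;

class ReopenOnRollbackSketch {

    /** Hypothetical stand-in for com.sleepycat.je.rep.RollbackException. */
    static class RollbackRequiredException extends RuntimeException {
        RollbackRequiredException(String message) {
            super(message);
        }
    }

    /** Hypothetical stand-in for an open replicated environment handle. */
    interface EnvHandle {
        List<String> getDatabaseNames();

        void close();
    }

    /** Owns the current handle and swaps it out when a rollback invalidates it. */
    static class EnvHolder {
        private final Supplier<EnvHandle> opener;
        private EnvHandle handle;
        private int reopenCount;

        EnvHolder(Supplier<EnvHandle> opener) {
            this.opener = opener;
            this.handle = opener.get();
        }

        /** Retries after a reopen, at most maxReopens times, then rethrows. */
        synchronized List<String> getDatabaseNames(int maxReopens) {
            for (int attempt = 0; ; attempt++) {
                try {
                    return handle.getDatabaseNames();
                } catch (RollbackRequiredException e) {
                    if (attempt >= maxReopens) {
                        throw e;
                    }
                    // Close the invalid handle and replace it, so that every
                    // later call sees the fresh handle, not the closed one.
                    handle.close();
                    handle = opener.get();
                    reopenCount++;
                }
            }
        }

        synchronized int reopenCount() {
            return reopenCount;
        }
    }

    /** Demo wiring: the first opened handle is invalid, later ones work. */
    static EnvHolder newDemoHolder() {
        int[] opened = {0};
        return new EnvHolder(() -> {
            boolean invalid = opened[0]++ == 0;
            return new EnvHandle() {
                @Override
                public List<String> getDatabaseNames() {
                    if (invalid) {
                        throw new RollbackRequiredException("must rollback to rejoin the group");
                    }
                    return Collections.singletonList("1");
                }

                @Override
                public void close() {
                    // Nothing to release in this in-memory stand-in.
                }
            };
        });
    }

    public static void main(String[] args) {
        EnvHolder holder = newDemoHolder();
        System.out.println(holder.getDatabaseNames(1) + " after " + holder.reopenCount() + " reopen(s)");
    }
}
```

Keeping the handle behind a single holder is one possible design choice: callers never cache the handle themselves, so a reopen cannot leave them with a closed environment.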
