SWJTU-ZhangLei opened a new issue, #18766: URL: https://github.com/apache/doris/issues/18766
### Search before asking

- [X] I had searched in the [issues](https://github.com/apache/doris/issues?q=is%3Aissue) and found no similar issues.

### Version

root@VM-0-46-ubuntu:/mnt/hdd01/STRESS_ENV/be# ./lib/doris_be --version
doris-0.0.0-branch-1.2(AVX2) RELEASE (build git://VM-0-22-ubuntu@62b20b126ff94284b3b9b84a6c2e0e931c157565)
Built on Fri, 07 Apr 2023 21:45:48 CST by VM-0-22-ubuntu

### What's Wrong?

1. The FE cannot start.
2. fe.out
`[2023-04-10 10:52:59] notify new FE type transfer: UNKNOWN
[2023-04-10 10:52:59] notify new FE type transfer: FOLLOWER
[2023-04-10 10:52:59] notify new FE type transfer: UNKNOWN
[2023-04-10 10:52:59] notify new FE type transfer: FOLLOWER
[2023-04-10 10:52:59] this node is DETACHED
java.lang.NullPointerException
    at com.sleepycat.je.rep.InsufficientLogException.initRepImpl(InsufficientLogException.java:268)
    at com.sleepycat.je.rep.InsufficientLogException.getRepImpl(InsufficientLogException.java:361)
    at com.sleepycat.je.rep.NetworkRestore.init(NetworkRestore.java:171)
    at com.sleepycat.je.rep.NetworkRestore.execute(NetworkRestore.java:281)
    at org.apache.doris.journal.bdbje.BDBJEJournal.reSetupBdbEnvironment(BDBJEJournal.java:358)
    at org.apache.doris.journal.bdbje.BDBJEJournal.open(BDBJEJournal.java:343)
    at org.apache.doris.persist.EditLog.open(EditLog.java:1038)
    at org.apache.doris.catalog.Env.initialize(Env.java:863)
    at org.apache.doris.PaloFe.start(PaloFe.java:138)
    at org.apache.doris.PaloFe.main(PaloFe.java:73)`
3. fe.log
`2023-04-10 10:52:59,166 INFO (UNKNOWN 172.21.0.68_9310_1680101228114(-1)|1) [BDBEnvironment.setup():162] add helper[172.21.0.68:9310] as ReplicationGroupAdmin
2023-04-10 10:52:59,170 WARN (UNKNOWN 172.21.0.68_9310_1680101228114(-1)|1) [Env.notifyNewFETypeTransfer():2373] notify new FE type transfer: UNKNOWN
2023-04-10 10:52:59,189 WARN (RepNode 172.21.0.68_9310_1680101228114(-1)|62) [Env.notifyNewFETypeTransfer():2373] notify new FE type transfer: FOLLOWER
2023-04-10 10:52:59,198 WARN (REPLICA
172.21.0.68_9310_1680101228114(1)|62) [Env.notifyNewFETypeTransfer():2373] notify new FE type transfer: UNKNOWN
2023-04-10 10:52:59,214 WARN (UNKNOWN 172.21.0.68_9310_1680101228114(1)|62) [Env.notifyNewFETypeTransfer():2373] notify new FE type transfer: FOLLOWER
2023-04-10 10:52:59,228 WARN (REPLICA 172.21.0.68_9310_1680101228114(1)|62) [BDBStateChangeListener.stateChange():57] this node is DETACHED
2023-04-10 10:52:59,219 WARN (UNKNOWN 172.21.0.68_9310_1680101228114(-1)|1) [BDBJEJournal.reSetupBdbEnvironment():349] catch insufficient log exception. will recover and try again.
com.sleepycat.je.rep.InsufficientLogException: (JE 18.3.12) Environment must be closed, caused by: com.sleepycat.je.rep.InsufficientLogException: Environment invalid because of previous exception: (JE 18.3.12) 172.21.0.68_9310_1680101228114(1):/mnt/hdd01/STRESS_ENV/fe/doris-meta/bdb INSUFFICIENT_LOG: Log files at this node are obsolete. Environment is invalid and must be closed.refreshVLSN=null logProviders=null repImpl=null props=null
    at com.sleepycat.je.rep.InsufficientLogException.wrapSelf(InsufficientLogException.java:340) ~[je-18.3.14-doris-SNAPSHOT.jar:18.3.14-doris-SNAPSHOT]
    at com.sleepycat.je.dbi.EnvironmentImpl.checkIfInvalid(EnvironmentImpl.java:1835) ~[je-18.3.14-doris-SNAPSHOT.jar:18.3.14-doris-SNAPSHOT]
    at com.sleepycat.je.log.LogManager.getLogEntry(LogManager.java:848) ~[je-18.3.14-doris-SNAPSHOT.jar:18.3.14-doris-SNAPSHOT]
    at com.sleepycat.je.log.LogManager.getLogEntry(LogManager.java:802) ~[je-18.3.14-doris-SNAPSHOT.jar:18.3.14-doris-SNAPSHOT]
    at com.sleepycat.je.log.LogManager.getLogEntryHandleNotFound(LogManager.java:956) ~[je-18.3.14-doris-SNAPSHOT.jar:18.3.14-doris-SNAPSHOT]
    at com.sleepycat.je.dbi.DiskOrderedScanner.fetchEntry(DiskOrderedScanner.java:2068) ~[je-18.3.14-doris-SNAPSHOT.jar:18.3.14-doris-SNAPSHOT]
    at com.sleepycat.je.dbi.DiskOrderedScanner.fetchAndProcessBINs(DiskOrderedScanner.java:1640) ~[je-18.3.14-doris-SNAPSHOT.jar:18.3.14-doris-SNAPSHOT]
    at com.sleepycat.je.dbi.DiskOrderedScanner.scanSerial(DiskOrderedScanner.java:789) ~[je-18.3.14-doris-SNAPSHOT.jar:18.3.14-doris-SNAPSHOT]
    at com.sleepycat.je.dbi.DiskOrderedScanner.scan(DiskOrderedScanner.java:708) ~[je-18.3.14-doris-SNAPSHOT.jar:18.3.14-doris-SNAPSHOT]
    at com.sleepycat.je.dbi.DatabaseImpl.count(DatabaseImpl.java:1510) ~[je-18.3.14-doris-SNAPSHOT.jar:18.3.14-doris-SNAPSHOT]
    at com.sleepycat.je.Database.count(Database.java:2042) ~[je-18.3.14-doris-SNAPSHOT.jar:18.3.14-doris-SNAPSHOT]
    at org.apache.doris.journal.bdbje.BDBJEJournal.getMaxJournalId(BDBJEJournal.java:257) ~[doris-fe.jar:1.2-SNAPSHOT]
    at org.apache.doris.journal.bdbje.BDBJEJournal.open(BDBJEJournal.java:339) ~[doris-fe.jar:1.2-SNAPSHOT]
    at org.apache.doris.persist.EditLog.open(EditLog.java:1038) ~[doris-fe.jar:1.2-SNAPSHOT]
    at org.apache.doris.catalog.Env.initialize(Env.java:863) ~[doris-fe.jar:1.2-SNAPSHOT]
    at org.apache.doris.PaloFe.start(PaloFe.java:138) ~[doris-fe.jar:1.2-SNAPSHOT]
    at org.apache.doris.PaloFe.main(PaloFe.java:73) ~[doris-fe.jar:1.2-SNAPSHOT]
Caused by: com.sleepycat.je.rep.InsufficientLogException: Environment invalid because of previous exception: (JE 18.3.12) 172.21.0.68_9310_1680101228114(1):/mnt/hdd01/STRESS_ENV/fe/doris-meta/bdb INSUFFICIENT_LOG: Log files at this node are obsolete. Environment is invalid and must be closed.
Originally thrown by HA thread: REPLICA 172.21.0.68_9310_1680101228114(1)
Originally thrown by HA thread: REPLICA 172.21.0.68_9310_1680101228114(1)
    at com.sleepycat.je.rep.stream.ReplicaFeederSyncup.setupLogRefresh(ReplicaFeederSyncup.java:706) ~[je-18.3.14-doris-SNAPSHOT.jar:18.3.14-doris-SNAPSHOT]
    at com.sleepycat.je.rep.stream.ReplicaFeederSyncup.verifyRollback(ReplicaFeederSyncup.java:355) ~[je-18.3.14-doris-SNAPSHOT.jar:18.3.14-doris-SNAPSHOT]
    at com.sleepycat.je.rep.stream.ReplicaFeederSyncup.execute(ReplicaFeederSyncup.java:164) ~[je-18.3.14-doris-SNAPSHOT.jar:18.3.14-doris-SNAPSHOT]
    at com.sleepycat.je.rep.impl.node.Replica.initReplicaLoop(Replica.java:732) ~[je-18.3.14-doris-SNAPSHOT.jar:18.3.14-doris-SNAPSHOT]
    at com.sleepycat.je.rep.impl.node.Replica.runReplicaLoopInternal(Replica.java:485) ~[je-18.3.14-doris-SNAPSHOT.jar:18.3.14-doris-SNAPSHOT]
    at com.sleepycat.je.rep.impl.node.Replica.runReplicaLoop(Replica.java:412) ~[je-18.3.14-doris-SNAPSHOT.jar:18.3.14-doris-SNAPSHOT]
    at com.sleepycat.je.rep.impl.node.RepNode.run(RepNode.java:1869) ~[je-18.3.14-doris-SNAPSHOT.jar:18.3.14-doris-SNAPSHOT]`

### What You Expected?

The FE starts normally.

### How to Reproduce?

It is hard to reproduce. The following steps are how we encountered the problem in our environment.

1. Build a cluster with 3 FEs and 3 BEs.
2. Continuously import and query data through a follower FE's IP.
3. At some point, we found that the master FE had run out of memory (OOM).
4. About ten hours later, we tried to start the master FE and found that it could not start.

### Anything Else?

_No response_

### Are you willing to submit PR?

- [ ] Yes I am willing to submit a PR!

### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
