AlexLWei opened a new issue, #27709:
URL: https://github.com/apache/doris/issues/27709

   ### Search before asking
   
   - [X] I had searched in the 
[issues](https://github.com/apache/doris/issues?q=is%3Aissue) and found no 
similar issues.
   
   
   ### Version
   
   Upgrading from 1.2.7 to 2.0.2.
   The BEs have already been upgraded; the upgrade is currently stuck at the FE stage.
   
   ### What's Wrong?
   
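   The FE upgrade procedure was:
   Shut down all FE/BE nodes.
   Copy the doris-meta metadata to the new-version FE node and start the new FE. Then use ALTER SYSTEM to drop all the other FEs and add them back.
   Dropping the FE nodes works without problems, but on ADD FOLLOWER the backend retries three times and then emits the following log. fe.log: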
   ```
   2023-11-28 09:49:56,110 ERROR (mysql-nio-pool-0|328) [BDBJEJournal.write():180] catch an exception when writing to database. sleep and retry. journal id 155010628
   com.sleepycat.je.rep.InsufficientReplicasException: (JE 18.3.12) Commit policy: SIMPLE_MAJORITY required 1 replica. But none were active with this master.
       at com.sleepycat.je.rep.impl.node.DurabilityQuorum.ensureReplicasForCommit(DurabilityQuorum.java:116) ~[je-18.3.14-doris-SNAPSHOT.jar:18.3.14-doris-SNAPSHOT]
       at com.sleepycat.je.rep.impl.RepImpl.txnBeginHook(RepImpl.java:1171) ~[je-18.3.14-doris-SNAPSHOT.jar:18.3.14-doris-SNAPSHOT]
       at com.sleepycat.je.rep.txn.MasterTxn.txnBeginHook(MasterTxn.java:195) ~[je-18.3.14-doris-SNAPSHOT.jar:18.3.14-doris-SNAPSHOT]
       at com.sleepycat.je.txn.Txn.initTxn(Txn.java:384) ~[je-18.3.14-doris-SNAPSHOT.jar:18.3.14-doris-SNAPSHOT]
       at com.sleepycat.je.txn.Txn.<init>(Txn.java:288) ~[je-18.3.14-doris-SNAPSHOT.jar:18.3.14-doris-SNAPSHOT]
       at com.sleepycat.je.txn.Txn.<init>(Txn.java:267) ~[je-18.3.14-doris-SNAPSHOT.jar:18.3.14-doris-SNAPSHOT]
       at com.sleepycat.je.rep.txn.MasterTxn.<init>(MasterTxn.java:146) ~[je-18.3.14-doris-SNAPSHOT.jar:18.3.14-doris-SNAPSHOT]
       at com.sleepycat.je.rep.txn.MasterTxn$1.create(MasterTxn.java:117) ~[je-18.3.14-doris-SNAPSHOT.jar:18.3.14-doris-SNAPSHOT]
       at com.sleepycat.je.rep.txn.MasterTxn.create(MasterTxn.java:435) ~[je-18.3.14-doris-SNAPSHOT.jar:18.3.14-doris-SNAPSHOT]
       at com.sleepycat.je.rep.impl.RepImpl.createRepUserTxn(RepImpl.java:1145) ~[je-18.3.14-doris-SNAPSHOT.jar:18.3.14-doris-SNAPSHOT]
       at com.sleepycat.je.txn.Txn.createAutoTxn(Txn.java:334) ~[je-18.3.14-doris-SNAPSHOT.jar:18.3.14-doris-SNAPSHOT]
       at com.sleepycat.je.txn.LockerFactory.getWritableLocker(LockerFactory.java:79) ~[je-18.3.14-doris-SNAPSHOT.jar:18.3.14-doris-SNAPSHOT]
       at com.sleepycat.je.txn.LockerFactory.getWritableLocker(LockerFactory.java:40) ~[je-18.3.14-doris-SNAPSHOT.jar:18.3.14-doris-SNAPSHOT]
       at com.sleepycat.je.Database.put(Database.java:1625) ~[je-18.3.14-doris-SNAPSHOT.jar:18.3.14-doris-SNAPSHOT]
       at com.sleepycat.je.Database.put(Database.java:1688) ~[je-18.3.14-doris-SNAPSHOT.jar:18.3.14-doris-SNAPSHOT]
       at org.apache.doris.journal.bdbje.BDBJEJournal.write(BDBJEJournal.java:151) ~[doris-fe.jar:1.2-SNAPSHOT]
       at org.apache.doris.persist.EditLog.logEdit(EditLog.java:1143) ~[doris-fe.jar:1.2-SNAPSHOT]
       at org.apache.doris.persist.EditLog.logAddFrontend(EditLog.java:1335) ~[doris-fe.jar:1.2-SNAPSHOT]
       at org.apache.doris.catalog.Env.addFrontend(Env.java:2590) ~[doris-fe.jar:1.2-SNAPSHOT]
       at org.apache.doris.alter.SystemHandler.process(SystemHandler.java:153) ~[doris-fe.jar:1.2-SNAPSHOT]
       at org.apache.doris.alter.AlterHandler.process(AlterHandler.java:185) ~[doris-fe.jar:1.2-SNAPSHOT]
       at org.apache.doris.alter.Alter.processAlterCluster(Alter.java:736) ~[doris-fe.jar:1.2-SNAPSHOT]
       at org.apache.doris.catalog.Env.alterCluster(Env.java:4681) ~[doris-fe.jar:1.2-SNAPSHOT]
       at org.apache.doris.qe.DdlExecutor.execute(DdlExecutor.java:207) ~[doris-fe.jar:1.2-SNAPSHOT]
       at org.apache.doris.qe.StmtExecutor.handleDdlStmt(StmtExecutor.java:2184) ~[doris-fe.jar:1.2-SNAPSHOT]
       at org.apache.doris.qe.StmtExecutor.executeByLegacy(StmtExecutor.java:749) ~[doris-fe.jar:1.2-SNAPSHOT]
       at org.apache.doris.qe.StmtExecutor.execute(StmtExecutor.java:451) ~[doris-fe.jar:1.2-SNAPSHOT]
       at org.apache.doris.qe.StmtExecutor.execute(StmtExecutor.java:422) ~[doris-fe.jar:1.2-SNAPSHOT]
       at org.apache.doris.qe.ConnectProcessor.handleQuery(ConnectProcessor.java:435) ~[doris-fe.jar:1.2-SNAPSHOT]
       at org.apache.doris.qe.ConnectProcessor.dispatch(ConnectProcessor.java:583) ~[doris-fe.jar:1.2-SNAPSHOT]
       at org.apache.doris.qe.ConnectProcessor.processOnce(ConnectProcessor.java:834) ~[doris-fe.jar:1.2-SNAPSHOT]
       at org.apache.doris.mysql.ReadListener.lambda$handleEvent$0(ReadListener.java:52) ~[doris-fe.jar:1.2-SNAPSHOT]
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_292]
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_292]
       at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_292]
   
    .....
   
   2023-11-28 09:50:01,111 ERROR (mysql-nio-pool-0|328) [BDBJEJournal.write():203] write bdb failed. will exit. journalId: 155010628, bdb database Name: 155010580
   ```
   fe.out:
   ```
   [2023-11-28 09:50:01] write bdb failed. will exit. journalId: 155010628, bdb database Name: 155010580
   ```
   
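   The FE then crashes.
   
   In testing, however, ADD OBSERVER is not affected.
   I have also found that running ADD FOLLOWER back in the original environment triggers the same problem, except that the log message becomes:
   com.sleepycat.je.rep.InsufficientReplicasException: (JE 18.3.12) Commit policy: SIMPLE_MAJORITY required 3 replica. But none were 2 active with this master (ip1 ip2).
   That cluster has 3 FE FOLLOWERs, and the two IPs above are the non-MASTER IPs. My guess is that the master crashed outright when the command was executed.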
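
To make the "required N replica" figures in the two messages easier to compare, here is a minimal sketch of the arithmetic. It assumes that the SIMPLE_MAJORITY policy needs acknowledgements from floor(n / 2) + 1 of the n electable nodes (the master plus the followers) and that the master itself counts toward that majority. The function names are hypothetical and this is not BDB JE's actual code.

```python
def required_replica_acks(electable_group_size: int) -> int:
    """Replica acks the master needs under a simple-majority commit policy.

    Assumption: a majority is floor(n / 2) + 1 electable nodes and the
    master counts as one of them, so replicas must supply the rest.
    """
    if electable_group_size < 1:
        raise ValueError("the group must contain at least the master")
    majority = electable_group_size // 2 + 1
    return majority - 1


def can_commit(electable_group_size: int, active_replicas: int) -> bool:
    """True if enough replicas are connected for the master to commit."""
    return active_replicas >= required_replica_acks(electable_group_size)


if __name__ == "__main__":
    # Print the required replica acks for small electable group sizes.
    for size in range(1, 8):
        print(f"group size {size}: {required_replica_acks(size)} replica ack(s)")
```

Under that assumption, "required 1 replica" with none active would fit an electable group of 2 or 3 members in which only the master is running. "Required 3" would correspond to 6 or 7 registered electable members, which is more than the 3 followers mentioned above, so it may be worth checking how many electable nodes the BDBJE metadata actually lists.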
   
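   I suspect this may have been caused by improper use of metadata_failure_recovery when restoring metadata during the previous upgrade (1.1.5 -> 1.2.7), although normal use such as data processing has not been affected.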
   
   
   ### What You Expected?
   
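   How should this problem be handled, and where should I start looking?
   The upgrade requires migrating the FEs to a new cluster, so this needs to be resolved urgently.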
   
   ### How to Reproduce?
   
   _No response_
   
   ### Anything Else?
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

