This is an automated email from the ASF dual-hosted git repository.
gavinchou pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris.git
The following commit(s) were added to refs/heads/master by this push:
new d9eb14a96bb [fix](cloud) Fix cloud decomission lead to fe cant start
(#46783)
d9eb14a96bb is described below
commit d9eb14a96bbe0cda1a6335c67bf3e9c4ef30513b
Author: deardeng <[email protected]>
AuthorDate: Mon Jan 13 11:10:04 2025 +0800
[fix](cloud) Fix cloud decomission lead to fe cant start (#46783)
Fix issue with SQL node decommissioning process
The SQL node decommissioning process sets the backend's isDecommissioned
status to true without first waiting for the transactions at the
watermark level to complete.
As a result, show backends reports isDecommissioned as true immediately,
even while transactions initiated via SQL are still running.
If the user then calls drop be on the only backend in the cluster, the
drop backend action is written to the edit log and the cluster
information is removed from memory.
After dropping the backend, the previous transaction watermark process
completes its tasks and attempts to modify the backend status, which
requires accessing the cluster information. However, since the cluster
information has already been deleted, this results in a null pointer
exception (NPE) during the lookup in the FE memory map, causing the FE
to crash.
Additionally, the edit log has already persisted the operations in this
order:
1. Edit log logs drop backend
2. Edit log modifies backend
Replaying the journal in that order hits the same NPE, so the FE fails
to start up:
```
2025-01-10 05:46:15,070 ERROR (replayer|15) [EditLog.loadJournal():1251] replay Operation Type 91, log id: 10578
java.lang.NullPointerException: Cannot invoke "org.apache.doris.system.Backend.getCloudClusterName()" because "memBe" is null
    at org.apache.doris.cloud.system.CloudSystemInfoService.replayModifyBackend(CloudSystemInfoService.java:461) ~[doris-fe.jar:1.2-SNAPSHOT]
    at org.apache.doris.persist.EditLog.loadJournal(EditLog.java:432) ~[doris-fe.jar:1.2-SNAPSHOT]
    at org.apache.doris.catalog.Env.replayJournal(Env.java:2999) ~[doris-fe.jar:1.2-SNAPSHOT]
    at org.apache.doris.catalog.Env$4.runOneCycle(Env.java:2761) ~[doris-fe.jar:1.2-SNAPSHOT]
    at org.apache.doris.common.util.Daemon.run(Daemon.java:119) ~[doris-fe.jar:1.2-SNAPSHOT]
```
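
The replay ordering described above can be reproduced in isolation. The
following is only a minimal sketch, not the Doris implementation: the
class ReplayGuardSketch, its nested Backend type, and the boolean return
value are hypothetical stand-ins chosen for illustration. It assumes the
FE keeps backends in an in-memory map and shows how a null check lets a
"modify backend" entry that follows a "drop backend" entry be skipped
instead of throwing.

```java
import java.util.HashMap;
import java.util.Map;

public class ReplayGuardSketch {
    /** Hypothetical stand-in for a backend record; not the Doris Backend class. */
    static final class Backend {
        final long id;
        boolean decommissioned;

        Backend(long id, boolean decommissioned) {
            this.id = id;
            this.decommissioned = decommissioned;
        }
    }

    // Assumed in-memory state: backends keyed by id.
    private final Map<Long, Backend> backends = new HashMap<>();

    void replayAddBackend(Backend be) {
        backends.put(be.id, be);
    }

    void replayDropBackend(long id) {
        backends.remove(id);
    }

    /**
     * Applies a logged modification to the in-memory backend.
     * Returns false when the backend no longer exists, which is the case
     * when "drop backend" was logged before "modify backend".
     */
    boolean replayModifyBackend(Backend logged) {
        Backend memBe = backends.get(logged.id);
        if (memBe == null) {
            // Without this guard, the next line would throw a NullPointerException.
            return false;
        }
        memBe.decommissioned = logged.decommissioned;
        return true;
    }

    public static void main(String[] args) {
        ReplayGuardSketch svc = new ReplayGuardSketch();
        svc.replayAddBackend(new Backend(10001L, false));
        // Same order as the edit log: drop first, then modify.
        svc.replayDropBackend(10001L);
        boolean applied = svc.replayModifyBackend(new Backend(10001L, true));
        System.out.println("modify applied: " + applied); // prints "modify applied: false"
    }
}
```

In this sketch the stale modify entry is silently ignored, which mirrors
the early return added by the commit; whether such an entry should also
be logged is a separate design choice not covered here.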
---
.../java/org/apache/doris/cloud/system/CloudSystemInfoService.java | 3 +++
1 file changed, 3 insertions(+)
diff --git a/fe/fe-core/src/main/java/org/apache/doris/cloud/system/CloudSystemInfoService.java b/fe/fe-core/src/main/java/org/apache/doris/cloud/system/CloudSystemInfoService.java
index 36ca260dc17..71260c51f23 100644
--- a/fe/fe-core/src/main/java/org/apache/doris/cloud/system/CloudSystemInfoService.java
+++ b/fe/fe-core/src/main/java/org/apache/doris/cloud/system/CloudSystemInfoService.java
@@ -457,6 +457,9 @@ public class CloudSystemInfoService extends SystemInfoService {
     @Override
     public void replayModifyBackend(Backend backend) {
         Backend memBe = getBackend(backend.getId());
+        if (memBe == null) {
+            return;
+        }
         // for rename cluster
         String originalClusterName = memBe.getCloudClusterName();
         String originalClusterId = memBe.getCloudClusterId();