This is an automated email from the ASF dual-hosted git repository.
chengpan pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/incubator-celeborn.git
The following commit(s) were added to refs/heads/main by this push:
new 16bf2aeea [CELEBORN-1013] Shutdown master if initialized failed
16bf2aeea is described below
commit 16bf2aeeaa1767db5f6b676818ce4c9062ed2608
Author: sychen <[email protected]>
AuthorDate: Thu Sep 28 19:02:59 2023 +0800
[CELEBORN-1013] Shutdown master if initialized failed
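### What changes were proposed in this pull request?
Wrap the construction and initialization of `Master` in `Master.main` in a `try`/`catch`. If initialization throws, the error is logged and the process exits with `System.exit(-1)`. The failure is then logged as follows: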
```
23/09/28 14:48:12,512 ERROR [main] Master: Initialize master failed.
java.net.BindException: Address already in use
	at sun.nio.ch.Net.bind0(Native Method)
	at sun.nio.ch.Net.bind(Net.java:461)
	at sun.nio.ch.Net.bind(Net.java:453)
	at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:222)
	at io.netty.channel.socket.nio.NioServerSocketChannel.doBind(NioServerSocketChannel.java:141)
	at io.netty.channel.AbstractChannel$AbstractUnsafe.bind(AbstractChannel.java:562)
	at io.netty.channel.DefaultChannelPipeline$HeadContext.bind(DefaultChannelPipeline.java:1334)
```
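The `java.net.BindException: Address already in use` in the log is the standard JVM error raised when a socket binds a port that another socket already holds. A minimal sketch reproducing it with plain `java.net.ServerSocket` (the master itself binds through Netty, as the trace shows); `PortConflict` and `reproduce` are hypothetical names used only for illustration:

```scala
import java.net.{BindException, InetAddress, ServerSocket}

// Hypothetical helper, not Celeborn code: reproduces "Address already in use".
object PortConflict {
  /** Returns the BindException message raised when a second socket binds a
    * port that is already held, or None if the second bind unexpectedly works. */
  def reproduce(): Option[String] = {
    val loopback = InetAddress.getLoopbackAddress
    // Port 0 asks the OS for any free port; `holder` keeps it occupied.
    val holder = new ServerSocket(0, 50, loopback)
    try {
      val port = holder.getLocalPort
      try {
        // Same address and port as `holder`, so this bind should fail.
        new ServerSocket(port, 50, loopback).close()
        None
      } catch {
        case e: BindException => Some(e.getMessage)
      }
    } finally holder.close()
  }
}
```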
### Why are the changes needed?
For example, if the port of the HTTP service (`celeborn.metrics.master.prometheus.port`) is already in use, the bind fails and master startup fails. However, because the thread started by the Raft server (Apache Ratis) is not a daemon thread, the master process keeps running instead of exiting. See:
https://github.com/apache/ratis/blob/d461a01a53e7e130f0ec4143e75b316012137b62/ratis-server/src/main/java/org/apache/ratis/server/impl/RaftServerImpl.java#L283-L290
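A minimal, self-contained sketch (not Celeborn or Ratis code) of why a failed startup does not end the JVM on its own: a non-daemon thread started before the failure is still alive after the exception has been handled. `NonDaemonLeak` and `simulate` are hypothetical names, and the worker thread only stands in for the Ratis thread:

```scala
import java.net.BindException
import java.util.concurrent.CountDownLatch

// Hypothetical illustration: a background thread outlives a startup failure.
object NonDaemonLeak {
  /** Returns (isDaemon, aliveAfterFailure) for the simulated background thread. */
  def simulate(): (Boolean, Boolean) = {
    val release = new CountDownLatch(1)
    // Stands in for a server thread that blocks until it is told to stop.
    val worker = new Thread(() => release.await())
    // Explicitly non-daemon, like the Ratis thread described above.
    worker.setDaemon(false)
    worker.start()
    val aliveAfterFailure =
      try {
        // Stands in for the port bind that fails during master initialization.
        throw new BindException("Address already in use")
      } catch {
        case _: BindException => worker.isAlive
      }
    val isDaemon = worker.isDaemon
    // In the real master nothing releases the thread, so the JVM never exits;
    // the sketch releases it only so that it terminates cleanly.
    release.countDown()
    worker.join()
    (isDaemon, aliveAfterFailure)
  }
}
```

The JVM exits only when all non-daemon threads have finished or when `System.exit` is called, which is why the patch calls `System.exit(-1)` explicitly.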
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
Closes #1945 from cxzl25/CELEBORN-1013.
Authored-by: sychen <[email protected]>
Signed-off-by: Cheng Pan <[email protected]>
---
.../org/apache/celeborn/service/deploy/master/Master.scala | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/master/src/main/scala/org/apache/celeborn/service/deploy/master/Master.scala b/master/src/main/scala/org/apache/celeborn/service/deploy/master/Master.scala
index f19213556..1bf92b049 100644
--- a/master/src/main/scala/org/apache/celeborn/service/deploy/master/Master.scala
+++ b/master/src/main/scala/org/apache/celeborn/service/deploy/master/Master.scala
@@ -961,7 +961,13 @@ private[deploy] object Master extends Logging {
   def main(args: Array[String]): Unit = {
     val conf = new CelebornConf()
     val masterArgs = new MasterArguments(args, conf)
-    val master = new Master(conf, masterArgs)
-    master.initialize()
+    try {
+      val master = new Master(conf, masterArgs)
+      master.initialize()
+    } catch {
+      case e: Throwable =>
+        logError("Initialize master failed.", e)
+        System.exit(-1)
+    }
   }
 }
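The pattern in this patch can be sketched as a reusable guard. This is a minimal sketch assuming a hypothetical `StartupGuard.runOrExit` helper (not part of Celeborn) whose logger and exit function are injectable, so the failure path can be tested without terminating the JVM. By default it calls `sys.exit`, which stops the JVM even while non-daemon threads are still running:

```scala
// Hypothetical helper, not Celeborn code: generalizes the try/catch in Master.main.
object StartupGuard {
  /** Runs `init`; if it throws anything, logs the failure and calls `exit(-1)`.
    * `log` and `exit` default to stderr and sys.exit; tests can pass stubs. */
  def runOrExit(
      init: => Unit,
      log: (String, Throwable) => Unit = (msg, e) => System.err.println(s"$msg $e"),
      exit: Int => Unit = code => sys.exit(code)): Unit = {
    try init
    catch {
      // Catch Throwable, as the patch does, so any startup failure ends the process.
      case e: Throwable =>
        log("Initialize master failed.", e)
        exit(-1)
    }
  }
}
```

Like the patch, the guard catches `Throwable`, so errors such as `OutOfMemoryError` during startup also terminate the process. On POSIX systems an exit status of `-1` is reported to the shell as `255`.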