harinirajendran commented on issue #14375:
URL: https://github.com/apache/druid/issues/14375#issuecomment-1581583521
In our case, one of the Middle Managers was in a bad state, and the Overlord
could not become leader because it failed to sync with that worker:
```
java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
    at org.apache.druid.indexing.overlord.TaskMaster$1.becomeLeader(TaskMaster.java:161)
    at org.apache.druid.curator.discovery.CuratorDruidLeaderSelector$1.isLeader(CuratorDruidLeaderSelector.java:98)
    at org.apache.curator.framework.recipes.leader.LeaderLatch$9.apply(LeaderLatch.java:702)
    at org.apache.curator.framework.recipes.leader.LeaderLatch$9.apply(LeaderLatch.java:698)
    at org.apache.curator.framework.listen.ListenerContainer$1.run(ListenerContainer.java:100)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: java.lang.reflect.InvocationTargetException
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:568)
    at org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler.start(Lifecycle.java:446)
    at org.apache.druid.java.util.common.lifecycle.Lifecycle.start(Lifecycle.java:341)
    at org.apache.druid.indexing.overlord.TaskMaster$1.becomeLeader(TaskMaster.java:158)
    ... 7 more
Caused by: java.lang.RuntimeException: org.apache.druid.java.util.common.RE: Failed to sync with worker[druid-op-batch-middlemanagers-0.druid-op-batch-middlemanagers.druid-events-prod.svc.cluster.local:8091].
    at org.apache.druid.indexing.overlord.hrtr.HttpRemoteTaskRunner.start(HttpRemoteTaskRunner.java:285)
    ... 14 more
Caused by: org.apache.druid.java.util.common.RE: Failed to sync with worker[druid-op-batch-middlemanagers-0.druid-op-batch-middlemanagers.druid-events-prod.svc.cluster.local:8091].
    at org.apache.druid.indexing.overlord.hrtr.WorkerHolder.waitForInitialization(WorkerHolder.java:344)
    at org.apache.druid.indexing.overlord.hrtr.HttpRemoteTaskRunner.startWorkersHandling(HttpRemoteTaskRunner.java:560)
    at org.apache.druid.indexing.overlord.hrtr.HttpRemoteTaskRunner.start(HttpRemoteTaskRunner.java:265)
    ... 14 more
```
Restarting that middle manager node helped resolve the issue.
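
For anyone triaging the same failure, here is a rough sketch (not part of Druid) for finding which worker to restart: it scans Overlord log text for the "Failed to sync with worker[...]" message and lists the distinct workers named. The function name `workers_failing_sync` is hypothetical, and the pattern assumes the message wording matches the trace in this comment.

```python
import re

# Pattern taken from the message in the trace above. \s+ also matches a line
# break, in case the log line was wrapped between "with" and "worker[".
SYNC_FAILURE = re.compile(r"Failed to sync with\s+worker\[([^\]]+)\]")


def workers_failing_sync(log_text):
    """Return distinct worker host:port strings named in sync failures,
    in the order they first appear in the log text."""
    seen = []
    for match in SYNC_FAILURE.finditer(log_text):
        worker = match.group(1)
        if worker not in seen:  # the same worker repeats once per "Caused by"
            seen.append(worker)
    return seen
```

Feeding it the contents of the Overlord log (wherever your deployment writes it) should return the Middle Manager(s) that the Overlord could not sync with on startup.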