harinirajendran commented on issue #14375:
URL: https://github.com/apache/druid/issues/14375#issuecomment-1581583521

   In our case, one of the Middle Managers was in an unhealthy state, and because the Overlord could not sync with it, the Overlord was unable to become the leader:
   ```
   java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
        at org.apache.druid.indexing.overlord.TaskMaster$1.becomeLeader(TaskMaster.java:161)
        at org.apache.druid.curator.discovery.CuratorDruidLeaderSelector$1.isLeader(CuratorDruidLeaderSelector.java:98)
        at org.apache.curator.framework.recipes.leader.LeaderLatch$9.apply(LeaderLatch.java:702)
        at org.apache.curator.framework.recipes.leader.LeaderLatch$9.apply(LeaderLatch.java:698)
        at org.apache.curator.framework.listen.ListenerContainer$1.run(ListenerContainer.java:100)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
        at java.base/java.lang.Thread.run(Thread.java:833)
   Caused by: java.lang.reflect.InvocationTargetException
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:568)
        at org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler.start(Lifecycle.java:446)
        at org.apache.druid.java.util.common.lifecycle.Lifecycle.start(Lifecycle.java:341)
        at org.apache.druid.indexing.overlord.TaskMaster$1.becomeLeader(TaskMaster.java:158)
        ... 7 more
   Caused by: java.lang.RuntimeException: org.apache.druid.java.util.common.RE: Failed to sync with worker[druid-op-batch-middlemanagers-0.druid-op-batch-middlemanagers.druid-events-prod.svc.cluster.local:8091].
        at org.apache.druid.indexing.overlord.hrtr.HttpRemoteTaskRunner.start(HttpRemoteTaskRunner.java:285)
        ... 14 more
   Caused by: org.apache.druid.java.util.common.RE: Failed to sync with worker[druid-op-batch-middlemanagers-0.druid-op-batch-middlemanagers.druid-events-prod.svc.cluster.local:8091].
        at org.apache.druid.indexing.overlord.hrtr.WorkerHolder.waitForInitialization(WorkerHolder.java:344)
        at org.apache.druid.indexing.overlord.hrtr.HttpRemoteTaskRunner.startWorkersHandling(HttpRemoteTaskRunner.java:560)
        at org.apache.druid.indexing.overlord.hrtr.HttpRemoteTaskRunner.start(HttpRemoteTaskRunner.java:265)
        ... 14 more
   ```
   Restarting that Middle Manager node resolved the issue.
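   To make the failure chain in the trace concrete, here is a simplified, self-contained Java model. It is a sketch under stated assumptions, not Druid's code: the trace suggests that `HttpRemoteTaskRunner.start` calls `WorkerHolder.waitForInitialization` for workers during startup, so one worker whose initial sync fails makes the exception escape through `Lifecycle.start` and abort `TaskMaster$1.becomeLeader`. The class `LeaderStartupSketch`, its nested `Worker`, the `tryBecomeLeader` method, the host names, and the timeout values are all hypothetical; Druid's real timeouts, retries, and leader re-election behavior are not modeled.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class LeaderStartupSketch {

    // Hypothetical stand-in for a per-worker sync handle (not a Druid class).
    static final class Worker {
        final String host;
        final CompletableFuture<Void> initialSync;

        Worker(String host, CompletableFuture<Void> initialSync) {
            this.host = host;
            this.initialSync = initialSync;
        }

        // Blocks until the initial sync finishes; a failed or timed-out sync becomes
        // a RuntimeException, loosely mirroring "Failed to sync with worker[...]".
        void waitForInitialization(long timeoutMs) {
            try {
                initialSync.get(timeoutMs, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new RuntimeException("Interrupted while syncing with worker[" + host + "].", e);
            } catch (ExecutionException | TimeoutException e) {
                throw new RuntimeException("Failed to sync with worker[" + host + "].", e);
            }
        }
    }

    // Hypothetical leader-start step: every worker must finish its initial sync.
    // This sketch catches the failure and returns false; in the trace above the
    // exception instead propagates out of TaskMaster$1.becomeLeader.
    static boolean tryBecomeLeader(List<Worker> workers, long timeoutMs) {
        try {
            for (Worker w : workers) {
                w.waitForInitialization(timeoutMs);
            }
            return true;
        } catch (RuntimeException e) {
            System.err.println("Leader start aborted: " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        Worker healthy = new Worker("mm-1:8091", CompletableFuture.completedFuture(null));
        Worker stuck = new Worker("mm-0:8091", new CompletableFuture<>()); // never completes
        System.out.println(tryBecomeLeader(List.of(healthy), 100));        // true
        System.out.println(tryBecomeLeader(List.of(healthy, stuck), 100)); // false (sync times out)
    }
}
```

   In this model, a single unresponsive worker is enough to block leadership, and replacing it with a worker whose sync completes (roughly what restarting the Middle Manager achieved here) lets the start succeed.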


-- 
This is an automated message from the Apache Git Service.