anilkumaryadavalli opened a new issue, #15395:
URL: https://github.com/apache/druid/issues/15395

   ### Affected Version
   We are running Druid 26 with ZooKeeper-less discovery (no ZooKeeper) in a 
GKE Kubernetes cluster.
   
   ### Description
   
   Each index_kafka task started by the Kafka supervisor causes one middle 
manager to be removed, and the logs below appear when that happens.
   Tasks that are killed manually also remove one middle manager.
   
   - Configurations in use
   
   - Steps to reproduce the problem
   1. In GKE (Kubernetes) we have 300 pods in total, but the middle manager 
count shows 292 (we had already run 8 index_kafka tasks, which removed 5 middle managers).
   
![image](https://github.com/apache/druid/assets/92722889/d634fd9e-a099-4f64-bb45-9298303b8f65)
   2. Run an index_kafka task; a peon is created to run the task.
   
![image](https://github.com/apache/druid/assets/92722889/45f096a8-631a-41b3-98ca-407b3b19f922)
   3. Once the task is killed manually or completes, the middle manager 
disappears.
   
![image](https://github.com/apache/druid/assets/92722889/2efae7b7-bc9a-476c-8c69-089ceb50a998)
    Note: the GKE Kubernetes middle manager pod count doesn't decrease.
    
   - The error message or stack traces encountered. Providing more context, 
such as nearby log messages or even entire logs, can be helpful.
   2023-11-07T02:14:22,936 DEBUG [HttpClient-Netty-Worker-18] org.apache.druid.java.util.http.client.NettyHttpClient - [POST http://x.x.x.x:8100/druid/worker/v1/chat/index_kafka_test_8d68cb20ddbdbc0_bilipega/offsets/end?finish=true] Got chunk: 0B, last=true
   2023-11-07T02:14:22,936 DEBUG [ServiceClientFactory-2] org.apache.druid.rpc.ServiceClientImpl - Service [index_kafka_test_8d68cb20ddbdbc0_bilipega] request [POST http://x.x.x.x:8100/druid/worker/v1/chat/index_kafka_test_8d68cb20ddbdbc0_bilipega/offsets/end?finish=true] completed.
   2023-11-07T02:14:23,015 INFO [org.apache.druid.k8s.discovery.K8sDruidNodeDiscoveryProvider$NodeRoleWatchermiddleManager] org.apache.druid.discovery.BaseNodeRoleWatcher - Node [http://x.x.x.x:8088/] of role [middleManager] went offline.
   2023-11-07T02:14:23,015 INFO [K8sDruidNodeDiscoveryProvider-ListenerExecutor] org.apache.druid.indexing.overlord.hrtr.HttpRemoteTaskRunner - Kaboom! Worker[x.x.x.x:8088] removed!
   2023-11-07T02:14:23,015 INFO [K8sDruidNodeDiscoveryProvider-ListenerExecutor] org.apache.druid.server.coordination.ChangeRequestHttpSyncer - Stopping ChangeRequestHttpSyncer[http://x.x.x.x:8088/_1698949782254].
   2023-11-07T02:14:23,015 INFO [K8sDruidNodeDiscoveryProvider-ListenerExecutor] org.apache.druid.server.coordination.ChangeRequestHttpSyncer - Stopped ChangeRequestHttpSyncer[http://x.x.x.x:8088/_1698949782254].
   2023-11-07T02:14:32,593 DEBUG [ServiceClientFactory-2]
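
To check whether every task completion is followed by a worker removal, the log lines above can be correlated with a small script. This is only a rough sketch: it assumes the log has been saved as plain text, the regular expressions are derived solely from the messages quoted in this report, and the function name `worker_events` is made up for illustration.

```python
import re

# A log record starts at a timestamp such as 2023-11-07T02:14:23,015.
# Records may be wrapped over several lines, so split on timestamps
# instead of on newlines.
TIMESTAMP = r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2},\d{3}"
RECORD_START = re.compile(r"(?=" + TIMESTAMP + r")")
RECORD_TS = re.compile(TIMESTAMP)

# Patterns derived from the messages quoted in this report (an assumption:
# other Druid versions may word these messages differently).
PATTERNS = [
    ("task_finish", re.compile(r"/chat/([^/]+)/offsets/end\?finish=true\] completed")),
    ("offline", re.compile(r"Node \[([^\]]+)\] of role \[middleManager\] went offline")),
    ("removed", re.compile(r"Worker\[([^\]]+)\] removed!")),
]


def worker_events(log_text):
    """Return (timestamp, kind, subject) tuples in log order."""
    events = []
    for record in RECORD_START.split(log_text):
        # Undo line wrapping by collapsing all whitespace runs.
        line = " ".join(record.split())
        ts = RECORD_TS.match(line)
        if ts is None:
            continue  # text before the first timestamp
        for kind, pattern in PATTERNS:
            found = pattern.search(line)
            if found:
                events.append((ts.group(0), kind, found.group(1)))
    return events
```

Calling `worker_events(open("overlord.log").read())` (the file name is hypothetical) gives a timeline in which a `task_finish` entry immediately followed by `offline` and `removed` entries for a worker matches the behavior described here.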
   
   - Any debugging that you have already done
   1. We tried upgrading from Druid 26 to Druid 27; it didn't fix the issue.
   2. We also tried Druid 28; it didn't fix the issue.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

