ksharma-qc opened a new issue, #16518: URL: https://github.com/apache/druid/issues/16518
I have a Druid cluster running on Kubernetes. I have allowed port 8100 through the data-service pod and service. When running an ingestion task I see the error below; however, the task still runs and ingests the data.

### Affected Version

29.0.1

### Description

In the coordinator logs I see this error, which implies that it cannot reach the data service on port 8100.

```
[0] coordinator-overlord.log: [[1717080225.505805896, {}], {"log"=>"2024-05-30T14:43:45,505 WARN [qtp1421004802-150] org.apache.druid.indexing.overlord.http.OverlordResource - Failed to stream task reports for task query-9e607dc2-55e8-4b61-9a6a-1953fb8f914e"}]
...
[87] coordinator-overlord.log: [[1717080225.505858796, {}], {"log"=>" at java.lang.Thread.run(Thread.java:842) ~[?:?]"}]
[88] coordinator-overlord.log: [[1717080225.505859347, {}], {"log"=>"Caused by: java.net.ConnectException: Connection refused: druid-data-0.data-pods/10.244.3.10:8100"}]
```

The port is indeed open and the Peon service is listening on it during task execution, which I've verified by running this:

```shell
while true; do
  nc -vz druid-data-0.data-pods 8100
  sleep 1
done
```

Which prints:

````
# Before running ingestion task
nc: connect to druid-data-0.data-pods (10.244.3.14) port 8100 (tcp) failed: Connection refused
nc: connect to druid-data-0.data-pods (10.244.3.14) port 8100 (tcp) failed: Connection refused
nc: connect to druid-data-0.data-pods (10.244.3.14) port 8100 (tcp) failed: Connection refused
nc: connect to druid-data-0.data-pods (10.244.3.14) port 8100 (tcp) failed: Connection refused
# During the task
Connection to druid-data-0.data-pods (10.244.3.14) 8100 port [tcp/*] succeeded!
Connection to druid-data-0.data-pods (10.244.3.14) 8100 port [tcp/*] succeeded!
Connection to druid-data-0.data-pods (10.244.3.14) 8100 port [tcp/*] succeeded!
# After the task
nc: connect to druid-data-0.data-pods (10.244.3.14) port 8100 (tcp) failed: Connection refused
nc: connect to druid-data-0.data-pods (10.244.3.14) port 8100 (tcp) failed: Connection refused
nc: connect to druid-data-0.data-pods (10.244.3.14) port 8100 (tcp) failed: Connection refused
````

To be honest, I'm confused as to why the Coordinator is even trying to reach the Peon for task logs. Given that Peons are ephemeral and exit once the task is done, it would make much more sense to get those logs from the Middle Manager, whose job is to manage the Peons. What am I missing?
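
To line up the probe results with the timestamp of the `Connection refused` warning in the log, the `nc` loop above could be extended to print a timestamp only when the port state changes. This is a minimal sketch, assuming `bash` and coreutils `timeout` are available on the probing host and that the log timestamps are in UTC. `port_state` and `watch_port` are hypothetical helper names, not Druid tooling.

```shell
# Hypothetical helpers (not part of Druid): report TCP port state changes with UTC timestamps.

# Print "open" if a TCP connection to host:port succeeds within 1 second, else "closed".
port_state() {
  local host="$1" port="$2"
  # Opening bash's /dev/tcp pseudo-device attempts a TCP connect.
  if timeout 1 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null; then
    echo open
  else
    echo closed
  fi
}

# Poll once per second and print a line only when the state changes.
watch_port() {
  local host="$1" port="$2" last="" state
  while true; do
    state="$(port_state "$host" "$port")"
    if [ "$state" != "$last" ]; then
      echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) $host:$port $state"
      last="$state"
    fi
    sleep 1
  done
}
```

Running `watch_port druid-data-0.data-pods 8100` alongside an ingestion task would show when the port opens and closes. If the warning is logged after the `closed` transition, that would suggest the request for task reports arrives once the Peon has already exited.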
