ksharma-qc opened a new issue, #16518:
URL: https://github.com/apache/druid/issues/16518

   I have a Druid cluster running on Kubernetes. I have allowed port 8100 
through the data-service pod and service.
   
   When I run an ingestion task, I see the error shown in the screenshots below.
   
   However, the task still runs and ingests the data.
   
   
![image](https://github.com/apache/druid/assets/102537335/6cee0e1d-de92-403d-b8d1-7b787262eaa6)
   
   
![image](https://github.com/apache/druid/assets/102537335/0cd08629-3595-474e-ac02-de5d2b337f2e)
   
   ### Affected Version
   
   29.0.1
   
   ### Description
   In the coordinator logs I see this error, which implies that it cannot reach the data service on port 8100.
   
   ```
   [0] coordinator-overlord.log: [[1717080225.505805896, {}], {"log"=>"2024-05-30T14:43:45,505 WARN [qtp1421004802-150] org.apache.druid.indexing.overlord.http.OverlordResource - Failed to stream task reports for task query-9e607dc2-55e8-4b61-9a6a-1953fb8f914e"}]
   ...
   [87] coordinator-overlord.log: [[1717080225.505858796, {}], {"log"=>"   at java.lang.Thread.run(Thread.java:842) ~[?:?]"}]
   [88] coordinator-overlord.log: [[1717080225.505859347, {}], {"log"=>"Caused by: java.net.ConnectException: Connection refused: druid-data-0.data-pods/10.244.3.10:8100"}]
   ```
   
   The port is indeed open and the Peon service is listening on it during task execution, which I've verified by running this:
   ```shell
   while true; do
     nc -vz druid-data-0.data-pods 8100
     sleep 1
   done
   ```
   
   This prints:
   ````
   # Before running ingestion task
   nc: connect to druid-data-0.data-pods (10.244.3.14) port 8100 (tcp) failed: Connection refused
   nc: connect to druid-data-0.data-pods (10.244.3.14) port 8100 (tcp) failed: Connection refused
   nc: connect to druid-data-0.data-pods (10.244.3.14) port 8100 (tcp) failed: Connection refused
   nc: connect to druid-data-0.data-pods (10.244.3.14) port 8100 (tcp) failed: Connection refused
   
   # During the task
   Connection to druid-data-0.data-pods (10.244.3.14) 8100 port [tcp/*] succeeded!
   Connection to druid-data-0.data-pods (10.244.3.14) 8100 port [tcp/*] succeeded!
   Connection to druid-data-0.data-pods (10.244.3.14) 8100 port [tcp/*] succeeded!
   
   # After the task
   nc: connect to druid-data-0.data-pods (10.244.3.14) port 8100 (tcp) failed: Connection refused
   nc: connect to druid-data-0.data-pods (10.244.3.14) port 8100 (tcp) failed: Connection refused
   nc: connect to druid-data-0.data-pods (10.244.3.14) port 8100 (tcp) failed: Connection refused
   ````
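
The probe output above has no timestamps, so it cannot be lined up against the `2024-05-30T14:43:45,505` WARN entry in the log. A timestamped variant of the same `nc` loop, printing a line only when the port changes state, might make that comparison easier. This is only a sketch: the function names `port_state` and `report_change` are made up for illustration, and it assumes the log timestamps are in UTC, which the log itself does not confirm.

```shell
#!/bin/sh
# Sketch only: port_state and report_change are hypothetical helper names.

# Probe HOST PORT once with nc (as in the loop above); prints "open" or "closed".
port_state() {
  if nc -z -w 1 "$1" "$2" >/dev/null 2>&1; then
    echo open
  else
    echo closed
  fi
}

# Print a line only when the state changed.
# Args: previous state, current state, timestamp.
report_change() {
  if [ "$1" != "$2" ]; then
    printf '%s port is now %s\n' "$3" "$2"
  fi
}

# Main loop: runs only when a host and a port are given as arguments.
if [ "$#" -ge 2 ]; then
  prev=unknown
  while true; do
    cur=$(port_state "$1" "$2")
    # UTC timestamps, on the assumption that the log also uses UTC.
    report_change "$prev" "$cur" "$(date -u +%Y-%m-%dT%H:%M:%S)"
    prev=$cur
    sleep 1
  done
fi
```

Saved under a hypothetical name such as `watch-port.sh`, it could be run as `sh watch-port.sh druid-data-0.data-pods 8100`. Since probes are one second apart, any transition is only accurate to roughly a second.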
   
   To be honest, I'm confused as to why the Coordinator is even trying to reach the Peon for task logs. Given that Peons are ephemeral and exit once the task is done, it would make much more sense to get those logs from the Middle Manager, whose job is to manage the Peons.
   
   What am I missing?
   


