jakubmatyszewski opened a new issue, #15233:
URL: https://github.com/apache/druid/issues/15233

   ### Affected Version
   
   27.0.0
   
   ### Description
   I tried to update my existing Druid instance to use `druid-kubernetes-extensions` ([extension](https://druid.apache.org/docs/latest/development/extensions-core/kubernetes/)), but found that it does not allow a rolling update. It seems to keep generating errors and restarting services for as long as any pod is still running without `druid.discovery.type=k8s` enabled.
   
   I think the problem stems from [these lines of code](https://github.com/apache/druid/blob/b95035f183e193f24ceee57cc41d295918fe87ac/extensions-core/kubernetes-extensions/src/main/java/org/apache/druid/k8s/discovery/DefaultK8sApiClient.java#L83-L84), which appear to throw an exception when a Druid service pod is detected that does not have the labels required by this extension.
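
To make the label dependency concrete, here is a minimal, self-contained sketch of equality-based label selection, the general mechanism Kubernetes uses to filter a pod listing. It does not call the Kubernetes client and does not reproduce the extension's code: the label key `example-discovery-role` and the pod names are hypothetical, chosen only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class LabelSelectorSketch {
    // Hypothetical label key; the real extension defines its own label names.
    static final String ROLE_LABEL = "example-discovery-role";

    /** Returns the names of pods whose labels contain every key/value pair of the selector. */
    static List<String> matchingPods(Map<String, Map<String, String>> podLabels,
                                     Map<String, String> selector) {
        return podLabels.entrySet().stream()
            // A pod matches only if each selector entry is present with the same value.
            .filter(pod -> pod.getValue().entrySet().containsAll(selector.entrySet()))
            .map(Map.Entry::getKey)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Pod name -> labels. "coordinator-old" stands in for a pod started
        // before the discovery type was switched, so it carries no labels.
        Map<String, Map<String, String>> pods = new LinkedHashMap<>();
        pods.put("coordinator-new", Map.of(ROLE_LABEL, "coordinator"));
        pods.put("coordinator-old", Map.of());
        pods.put("broker-new", Map.of(ROLE_LABEL, "broker"));

        // Prints [coordinator-new]
        System.out.println(matchingPods(pods, Map.of(ROLE_LABEL, "coordinator")));
    }
}
```

In this simplified model the unlabeled pod falls outside the selection, which is the situation a partially updated cluster would be in during a rolling update.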
   
   The logs of the services I have already updated show the following:
   ```
   2023-10-18T15:47:52,296 ERROR [org.apache.druid.k8s.discovery.K8sDruidNodeDiscoveryProvider$NodeRoleWatchercoordinator] org.apache.druid.k8s.discovery.K8sDruidNodeDiscoveryProvider$NodeRoleWatcher - Expection while watching for NodeRole [COORDINATOR].
   org.apache.druid.java.util.common.RE: Expection in listing pods, code[0] and error[null].
        at org.apache.druid.k8s.discovery.DefaultK8sApiClient.listPods(DefaultK8sApiClient.java:94) ~[?:?]
        at org.apache.druid.k8s.discovery.K8sDruidNodeDiscoveryProvider$NodeRoleWatcher.watch(K8sDruidNodeDiscoveryProvider.java:229) ~[?:?]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) ~[?:?]
        at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
        at java.lang.Thread.run(Thread.java:829) ~[?:?]
   Caused by: io.kubernetes.client.openapi.ApiException: java.net.SocketTimeoutException: connect timed out
        at io.kubernetes.client.openapi.ApiClient.execute(ApiClient.java:908) ~[?:?]
        at io.kubernetes.client.openapi.apis.CoreV1Api.listNamespacedPodWithHttpInfo(CoreV1Api.java:30930) ~[?:?]
        at io.kubernetes.client.openapi.apis.CoreV1Api.listNamespacedPod(CoreV1Api.java:30818) ~[?:?]
        at org.apache.druid.k8s.discovery.DefaultK8sApiClient.listPods(DefaultK8sApiClient.java:83) ~[?:?]
        ... 6 more
   Caused by: java.net.SocketTimeoutException: connect timed out
        at java.net.PlainSocketImpl.socketConnect(Native Method) ~[?:?]
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:412) ~[?:?]
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:255) ~[?:?]
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:237) ~[?:?]
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) ~[?:?]
        at java.net.Socket.connect(Socket.java:609) ~[?:?]
        at okhttp3.internal.platform.Platform.connectSocket(Platform.kt:128) ~[okhttp-4.9.3.jar:?]
        at okhttp3.internal.connection.RealConnection.connectSocket(RealConnection.kt:295) ~[okhttp-4.9.3.jar:?]
        at okhttp3.internal.connection.RealConnection.connect(RealConnection.kt:207) ~[okhttp-4.9.3.jar:?]
        at okhttp3.internal.connection.ExchangeFinder.findConnection(ExchangeFinder.kt:226) ~[okhttp-4.9.3.jar:?]
        at okhttp3.internal.connection.ExchangeFinder.findHealthyConnection(ExchangeFinder.kt:106) ~[okhttp-4.9.3.jar:?]
        at okhttp3.internal.connection.ExchangeFinder.find(ExchangeFinder.kt:74) ~[okhttp-4.9.3.jar:?]
        at okhttp3.internal.connection.RealCall.initExchange$okhttp(RealCall.kt:255) ~[okhttp-4.9.3.jar:?]
        at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.kt:32) ~[okhttp-4.9.3.jar:?]
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109) ~[okhttp-4.9.3.jar:?]
        at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.kt:95) ~[okhttp-4.9.3.jar:?]
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109) ~[okhttp-4.9.3.jar:?]
        at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.kt:83) ~[okhttp-4.9.3.jar:?]
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109) ~[okhttp-4.9.3.jar:?]
        at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.kt:76) ~[okhttp-4.9.3.jar:?]
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109) ~[okhttp-4.9.3.jar:?]
        at io.kubernetes.client.util.credentials.TokenFileAuthentication.intercept(TokenFileAuthentication.java:72) ~[?:?]
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109) ~[okhttp-4.9.3.jar:?]
        at okhttp3.internal.connection.RealCall.getResponseWithInterceptorChain$okhttp(RealCall.kt:201) ~[okhttp-4.9.3.jar:?]
        at okhttp3.internal.connection.RealCall.execute(RealCall.kt:154) ~[okhttp-4.9.3.jar:?]
        at io.kubernetes.client.openapi.ApiClient.execute(ApiClient.java:904) ~[?:?]
        at io.kubernetes.client.openapi.apis.CoreV1Api.listNamespacedPodWithHttpInfo(CoreV1Api.java:30930) ~[?:?]
        at io.kubernetes.client.openapi.apis.CoreV1Api.listNamespacedPod(CoreV1Api.java:30818) ~[?:?]
        at org.apache.druid.k8s.discovery.DefaultK8sApiClient.listPods(DefaultK8sApiClient.java:83) ~[?:?]
        ... 6 more
   2023-10-18T15:48:12,120 INFO [main] org.apache.druid.discovery.BaseNodeRoleWatcher - Cache for node role [coordinator] not initialized yet; getAllNodes() might not return full information.
   ```
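
The trace above nests three exceptions: the Druid `RE` wraps an `ApiException`, which in turn wraps a `java.net.SocketTimeoutException`. As a reading aid, here is a small sketch of walking such a `Caused by` chain down to its innermost cause. The class `RootCauseSketch` and its `rootCause` helper are hypothetical and not part of Druid; the outer two layers are modeled with plain `RuntimeException` stand-ins so the example runs with the JDK alone.

```java
import java.net.SocketTimeoutException;

public class RootCauseSketch {
    /** Follows getCause() links until the innermost throwable is reached. */
    static Throwable rootCause(Throwable t) {
        Throwable current = t;
        // Stop when there is no further cause (or a throwable names itself as cause).
        while (current.getCause() != null && current.getCause() != current) {
            current = current.getCause();
        }
        return current;
    }

    public static void main(String[] args) {
        // Stand-ins for the three layers in the log; RuntimeException replaces
        // the Druid and Kubernetes client exception types, which are not on the classpath here.
        Throwable inner = new SocketTimeoutException("connect timed out");
        Throwable middle = new RuntimeException("stand-in for ApiException", inner);
        Throwable outer = new RuntimeException("stand-in for RE", middle);

        Throwable root = rootCause(outer);
        // Prints java.net.SocketTimeoutException: connect timed out
        System.out.println(root.getClass().getName() + ": " + root.getMessage());
    }
}
```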
   
   
   I have tested this extension in a test environment with exactly the same configuration, but with all services started with `druid.discovery.type=k8s`, and it runs smoothly. The only difference that makes it fail therefore seems to be what I've described above.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

