Miuler commented on code in PR #252:
URL:
https://github.com/apache/flink-kubernetes-operator/pull/252#discussion_r889527238
##########
flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/service/FlinkService.java:
##########
@@ -229,14 +229,12 @@ private JarRunResponseBody runJar(
? RestoreMode.DEFAULT
: null);
LOG.info("Submitting job: {} to session cluster.",
jobID.toHexString());
+ var clientTimeout =
+ configManager.getOperatorConfiguration().getFlinkClientTimeout().toSeconds();
+ LOG.debug("clientTimeout: {}", clientTimeout);
Review Comment:
I have a question: which property controls this `clientTimeout`?
I added this to the values.yaml:
```
flink-conf.yaml: |+
# Flink Config Overrides
client.timeout: 4 MINUTE
```
and in my pod I see this
```
exec -ti migration-cosmosdb-wape-5578f9948c-9t6xm -- bash
root@migration-cosmosdb-wape-5578f9948c-9t6xm:/opt/flink# grep time conf/flink-conf.yaml
client.timeout: 4 MINUTE
```
but in my log I see this
```
2022-06-04 12:20:30,505 o.a.f.k.o.c.FlinkConfigManager [INFO ] Updating
default configuration to {blob.server.port=6124,
taskmanager.memory.process.size=1728m, client.timeout=4 MINUTE,
jobmanager.memory.process.size=1600m, jobmanager.rpc.port=6123,
taskmanager.rpc.port=6122, queryable-state.proxy.ports=6125,
parallelism.default=2, taskmanager.numberOfTaskSlots=2,
kubernetes.operator.metrics.reporter.slf4j.interval=5 MINUTE,
kubernetes.operator.observer.progress-check.interval=5 s,
kubernetes.operator.metrics.reporter.slf4j.factory.class=org.apache.flink.metrics.slf4j.Slf4jReporterFactory,
kubernetes.operator.reconciler.reschedule.interval=15 s}
...
...
2022-06-04 12:20:30,762 i.j.o.a.c.ExecutorServiceManager [DEBUG] Initialized
ExecutorServiceManager executor: class java.util.concurrent.ThreadPoolExecutor,
timeout: 10
...
...
2022-06-04 12:28:12,124 o.a.f.k.o.s.FlinkService
[DEBUG][flink-wape-02/migration-cosmosdb-wape-sessionjob] clientTimeout: 10
...
...
2022-06-04 12:26:52,685 i.j.o.p.e.ReconciliationDispatcher
[ERROR][flink-wape-02/migration-cosmosdb-wape-sessionjob] Error during event
processing ExecutionScope{ resource id:
CustomResourceID{name='migration-cosmosdb-wape-sessionjob',
namespace='flink-wape-02'}, version: null} failed.
org.apache.flink.kubernetes.operator.exception.ReconciliationException: org.apache.flink.util.FlinkRuntimeException: java.util.concurrent.TimeoutException
    at org.apache.flink.kubernetes.operator.controller.FlinkSessionJobController.reconcile(FlinkSessionJobController.java:117)
    at org.apache.flink.kubernetes.operator.controller.FlinkSessionJobController.reconcile(FlinkSessionJobController.java:59)
    at io.javaoperatorsdk.operator.processing.Controller$2.execute(Controller.java:101)
    at io.javaoperatorsdk.operator.processing.Controller$2.execute(Controller.java:76)
    at io.javaoperatorsdk.operator.api.monitoring.Metrics.timeControllerExecution(Metrics.java:34)
    at io.javaoperatorsdk.operator.processing.Controller.reconcile(Controller.java:75)
    at io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.reconcileExecution(ReconciliationDispatcher.java:143)
    at io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleReconcile(ReconciliationDispatcher.java:109)
    at io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleDispatch(ReconciliationDispatcher.java:74)
    at io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleExecution(ReconciliationDispatcher.java:50)
    at io.javaoperatorsdk.operator.processing.event.EventProcessor$ControllerExecution.run(EventProcessor.java:349)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.base/java.lang.Thread.run(Unknown Source)
Caused by: org.apache.flink.util.FlinkRuntimeException: java.util.concurrent.TimeoutException
    at org.apache.flink.kubernetes.operator.service.FlinkService.runJar(FlinkService.java:240)
    at org.apache.flink.kubernetes.operator.service.FlinkService.submitJobToSessionCluster(FlinkService.java:198)
    at org.apache.flink.kubernetes.operator.reconciler.sessionjob.FlinkSessionJobReconciler.submitAndInitStatus(FlinkSessionJobReconciler.java:164)
    at org.apache.flink.kubernetes.operator.reconciler.sessionjob.FlinkSessionJobReconciler.reconcile(FlinkSessionJobReconciler.java:88)
    at org.apache.flink.kubernetes.operator.reconciler.sessionjob.FlinkSessionJobReconciler.reconcile(FlinkSessionJobReconciler.java:48)
    at org.apache.flink.kubernetes.operator.controller.FlinkSessionJobController.reconcile(FlinkSessionJobController.java:115)
    ... 13 more
Caused by: java.util.concurrent.TimeoutException
    at java.base/java.util.concurrent.CompletableFuture.timedGet(Unknown Source)
    at java.base/java.util.concurrent.CompletableFuture.get(Unknown Source)
    at org.apache.flink.kubernetes.operator.service.FlinkService.runJar(FlinkService.java:237)
    ... 18 more
```
So my `client.timeout=4 MINUTE` is picked up in the default configuration, but the logged `clientTimeout` is still 10 seconds. Why?
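One possible explanation, sketched here under assumptions rather than confirmed from the operator source: `getFlinkClientTimeout()` may read an operator-specific key with its own 10-second default instead of Flink's `client.timeout`. The key name `kubernetes.operator.flink.client.timeout`, the class `ClientTimeoutSketch`, and the simplified duration parser below are all illustrative assumptions, not the operator's actual code.

```java
import java.time.Duration;
import java.util.Map;

public class ClientTimeoutSketch {
    // ASSUMPTION: the operator reads its own key, not Flink's "client.timeout".
    static final String OPERATOR_TIMEOUT_KEY = "kubernetes.operator.flink.client.timeout";
    // ASSUMPTION: the default is 10 seconds, matching the "clientTimeout: 10" log line.
    static final Duration OPERATOR_TIMEOUT_DEFAULT = Duration.ofSeconds(10);

    // Returns the operator-side timeout, falling back to the default when the key is absent.
    static Duration operatorClientTimeout(Map<String, String> conf) {
        String raw = conf.get(OPERATOR_TIMEOUT_KEY);
        return raw == null ? OPERATOR_TIMEOUT_DEFAULT : parseDuration(raw);
    }

    // Simplified parser for values such as "4 MINUTE" or "15 s" (hypothetical helper,
    // far less complete than Flink's own duration parsing).
    static Duration parseDuration(String raw) {
        String[] parts = raw.trim().split("\\s+");
        long value = Long.parseLong(parts[0]);
        String unit = parts.length > 1 ? parts[1].toLowerCase() : "s";
        if (unit.startsWith("min")) {
            return Duration.ofMinutes(value);
        }
        if (unit.equals("s") || unit.startsWith("sec")) {
            return Duration.ofSeconds(value);
        }
        throw new IllegalArgumentException("Unsupported unit: " + unit);
    }

    public static void main(String[] args) {
        // Only Flink's client.timeout is set, as in the values.yaml above.
        Map<String, String> onlyFlinkKey = Map.of("client.timeout", "4 MINUTE");
        System.out.println(operatorClientTimeout(onlyFlinkKey).toSeconds()); // prints 10

        // Setting the (assumed) operator key changes the result.
        Map<String, String> withOperatorKey = Map.of(OPERATOR_TIMEOUT_KEY, "4 MINUTE");
        System.out.println(operatorClientTimeout(withOperatorKey).toSeconds()); // prints 240
    }
}
```

If that assumption holds, the operator-specific key would have to go into the same `flink-conf.yaml` override instead of (or in addition to) `client.timeout`.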