Sebastian Struß created FLINK-37895:
---------------------------------------

             Summary: "Failed to fetch job exceptions from REST API for jobId" errors for session jobs
                 Key: FLINK-37895
                 URL: https://issues.apache.org/jira/browse/FLINK-37895
             Project: Flink
          Issue Type: Bug
         Environment: K8s 1.32 on arm64 nodes
                      Flink Kubernetes operator 1.12.0
                      Flink 1.19.1
            Reporter: Sebastian Struß


Flink Kubernetes operator 1.12.0 has started logging messages like the following
for session jobs:

```

{"timeMillis":1749046856604,"thread":"ReconcilerExecutor-flinksessionjobcontroller-84","level":"WARN","loggerName":"org.apache.flink.kubernetes.operator.service.AbstractFlinkService","message":"Failed
 to fetch job exceptions from REST API for jobId 
56bdbb2095a14bb40d154cf0a3ba4659","thrown":{"commonElementCount":0,"localizedMessage":"java.net.UnknownHostException:
 parquetizer-xyz-rest.parquetizers: Name or service not 
known","message":"java.net.UnknownHostException: 
parquetizer-xyz-rest.parquetizers: Name or service not 
known","name":"java.util.concurrent.ExecutionException","cause":{"commonElementCount":1,"localizedMessage":"parquetizer-xyz-rest.parquetizers:
 Name or service not known","message":"parquetizer-xyz-rest.parquetizers: Name 
or service not 
known","name":"java.net.UnknownHostException","extendedStackTrace":"java.net.UnknownHostException:
 parquetizer-xyz-rest.parquetizers: Name or service not known\n\tat 
java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method) ~[?:?]\n\tat 
java.net.InetAddress$PlatformNameService.lookupAllHostAddr(Unknown Source) 
~[?:?]\n\tat java.net.InetAddress.getAddressesFromNameService(Unknown Source) 
~[?:?]\n\tat java.net.InetAddress$NameServiceAddresses.get(Unknown Source) 
~[?:?]\n\tat java.net.InetAddress.getAllByName0(Unknown Source) ~[?:?]\n\tat 
java.net.InetAddress.getAllByName(Unknown Source) ~[?:?]\n\tat 
java.net.InetAddress.getAllByName(Unknown Source) ~[?:?]\n\tat 
java.net.InetAddress.getByName(Unknown Source) ~[?:?]\n\tat 
org.apache.flink.shaded.netty4.io.netty.util.internal.SocketUtils$8.run(SocketUtils.java:156)
 ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
org.apache.flink.shaded.netty4.io.netty.util.internal.SocketUtils$8.run(SocketUtils.java:153)
 ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
java.security.AccessController.doPrivileged(Native Method) ~[?:?]\n\tat 
org.apache.flink.shaded.netty4.io.netty.util.internal.SocketUtils.addressByName(SocketUtils.java:153)
 ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
org.apache.flink.shaded.netty4.io.netty.resolver.DefaultNameResolver.doResolve(DefaultNameResolver.java:41)
 ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
org.apache.flink.shaded.netty4.io.netty.resolver.SimpleNameResolver.resolve(SimpleNameResolver.java:61)
 ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
org.apache.flink.shaded.netty4.io.netty.resolver.SimpleNameResolver.resolve(SimpleNameResolver.java:53)
 ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
org.apache.flink.shaded.netty4.io.netty.resolver.InetSocketAddressResolver.doResolve(InetSocketAddressResolver.java:55)
 ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
org.apache.flink.shaded.netty4.io.netty.resolver.InetSocketAddressResolver.doResolve(InetSocketAddressResolver.java:31)
 ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
org.apache.flink.shaded.netty4.io.netty.resolver.AbstractAddressResolver.resolve(AbstractAddressResolver.java:106)
 ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
org.apache.flink.shaded.netty4.io.netty.bootstrap.Bootstrap.doResolveAndConnect0(Bootstrap.java:206)
 ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
org.apache.flink.shaded.netty4.io.netty.bootstrap.Bootstrap.access$000(Bootstrap.java:46)
 ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
org.apache.flink.shaded.netty4.io.netty.bootstrap.Bootstrap$1.operationComplete(Bootstrap.java:180)
 ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
org.apache.flink.shaded.netty4.io.netty.bootstrap.Bootstrap$1.operationComplete(Bootstrap.java:166)
 ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:590)
 ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:557)
 ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:492)
 ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:636)
 ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.setSuccess0(DefaultPromise.java:625)
 ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.trySuccess(DefaultPromise.java:105)
 ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
org.apache.flink.shaded.netty4.io.netty.channel.DefaultChannelPromise.trySuccess(DefaultChannelPromise.java:84)
 ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannel$AbstractUnsafe.safeSetSuccess(AbstractChannel.java:990)
 ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannel$AbstractUnsafe.register0(AbstractChannel.java:516)
 ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannel$AbstractUnsafe.access$200(AbstractChannel.java:429)
 ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannel$AbstractUnsafe$1.run(AbstractChannel.java:486)
 ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
org.apache.flink.shaded.netty4.io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:174)
 ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
org.apache.flink.shaded.netty4.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:167)
 ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:470)
 ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:569)
 ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
 ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
org.apache.flink.shaded.netty4.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
 ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n"},"extendedStackTrace":"java.util.concurrent.ExecutionException:
 java.net.UnknownHostException: parquetizer-xyz-rest.parquetizers: Name or 
service not known\n\tat 
java.util.concurrent.CompletableFuture.reportGet(Unknown Source) ~[?:?]\n\tat 
java.util.concurrent.CompletableFuture.get(Unknown Source) ~[?:?]\n\tat 
org.apache.flink.kubernetes.operator.service.AbstractFlinkService.getJobExceptions(AbstractFlinkService.java:873)
 [flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
org.apache.flink.kubernetes.operator.observer.JobStatusObserver.observeJobManagerExceptions(JobStatusObserver.java:131)
 [flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
org.apache.flink.kubernetes.operator.observer.JobStatusObserver.observe(JobStatusObserver.java:97)
 [flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
org.apache.flink.kubernetes.operator.observer.sessionjob.FlinkSessionJobObserver.observeInternal(FlinkSessionJobObserver.java:54)
 [flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
org.apache.flink.kubernetes.operator.observer.AbstractFlinkResourceObserver.observe(AbstractFlinkResourceObserver.java:49)
 [flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
org.apache.flink.kubernetes.operator.controller.FlinkSessionJobController.reconcile(FlinkSessionJobController.java:113)
 [flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
org.apache.flink.kubernetes.operator.controller.FlinkSessionJobController.reconcile(FlinkSessionJobController.java:59)
 [flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
io.javaoperatorsdk.operator.processing.Controller$1.execute(Controller.java:153)
 [flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
io.javaoperatorsdk.operator.processing.Controller$1.execute(Controller.java:111)
 [flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
org.apache.flink.kubernetes.operator.metrics.OperatorJosdkMetrics.timeControllerExecution(OperatorJosdkMetrics.java:80)
 [flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
io.javaoperatorsdk.operator.processing.Controller.reconcile(Controller.java:110)
 [flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.reconcileExecution(ReconciliationDispatcher.java:136)
 [flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleReconcile(ReconciliationDispatcher.java:117)
 [flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleDispatch(ReconciliationDispatcher.java:91)
 [flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleExecution(ReconciliationDispatcher.java:64)
 [flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
io.javaoperatorsdk.operator.processing.event.EventProcessor$ReconcilerExecutor.run(EventProcessor.java:452)
 [flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [?:?]\n\tat 
java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [?:?]\n\tat 
java.lang.Thread.run(Unknown Source) [?:?]\nCaused by: 
java.net.UnknownHostException: parquetizer-xyz-rest.parquetizers: Name or 
service not known\n\tat java.net.Inet6AddressImpl.lookupAllHostAddr(Native 
Method) ~[?:?]\n\tat 
java.net.InetAddress$PlatformNameService.lookupAllHostAddr(Unknown Source) 
~[?:?]\n\tat java.net.InetAddress.getAddressesFromNameService(Unknown Source) 
~[?:?]\n\tat java.net.InetAddress$NameServiceAddresses.get(Unknown Source) 
~[?:?]\n\tat java.net.InetAddress.getAllByName0(Unknown Source) ~[?:?]\n\tat 
java.net.InetAddress.getAllByName(Unknown Source) ~[?:?]\n\tat 
java.net.InetAddress.getAllByName(Unknown Source) ~[?:?]\n\tat 
java.net.InetAddress.getByName(Unknown Source) ~[?:?]\n\tat 
org.apache.flink.shaded.netty4.io.netty.util.internal.SocketUtils$8.run(SocketUtils.java:156)
 ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
org.apache.flink.shaded.netty4.io.netty.util.internal.SocketUtils$8.run(SocketUtils.java:153)
 ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
java.security.AccessController.doPrivileged(Native Method) ~[?:?]\n\tat 
org.apache.flink.shaded.netty4.io.netty.util.internal.SocketUtils.addressByName(SocketUtils.java:153)
 ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
org.apache.flink.shaded.netty4.io.netty.resolver.DefaultNameResolver.doResolve(DefaultNameResolver.java:41)
 ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
org.apache.flink.shaded.netty4.io.netty.resolver.SimpleNameResolver.resolve(SimpleNameResolver.java:61)
 ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
org.apache.flink.shaded.netty4.io.netty.resolver.SimpleNameResolver.resolve(SimpleNameResolver.java:53)
 ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
org.apache.flink.shaded.netty4.io.netty.resolver.InetSocketAddressResolver.doResolve(InetSocketAddressResolver.java:55)
 ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
org.apache.flink.shaded.netty4.io.netty.resolver.InetSocketAddressResolver.doResolve(InetSocketAddressResolver.java:31)
 ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
org.apache.flink.shaded.netty4.io.netty.resolver.AbstractAddressResolver.resolve(AbstractAddressResolver.java:106)
 ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
org.apache.flink.shaded.netty4.io.netty.bootstrap.Bootstrap.doResolveAndConnect0(Bootstrap.java:206)
 ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
org.apache.flink.shaded.netty4.io.netty.bootstrap.Bootstrap.access$000(Bootstrap.java:46)
 ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
org.apache.flink.shaded.netty4.io.netty.bootstrap.Bootstrap$1.operationComplete(Bootstrap.java:180)
 ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
org.apache.flink.shaded.netty4.io.netty.bootstrap.Bootstrap$1.operationComplete(Bootstrap.java:166)
 ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:590)
 ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:557)
 ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:492)
 ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:636)
 ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.setSuccess0(DefaultPromise.java:625)
 ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.trySuccess(DefaultPromise.java:105)
 ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
org.apache.flink.shaded.netty4.io.netty.channel.DefaultChannelPromise.trySuccess(DefaultChannelPromise.java:84)
 ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannel$AbstractUnsafe.safeSetSuccess(AbstractChannel.java:990)
 ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannel$AbstractUnsafe.register0(AbstractChannel.java:516)
 ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannel$AbstractUnsafe.access$200(AbstractChannel.java:429)
 ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannel$AbstractUnsafe$1.run(AbstractChannel.java:486)
 ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
org.apache.flink.shaded.netty4.io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:174)
 ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
org.apache.flink.shaded.netty4.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:167)
 ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:470)
 ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:569)
 ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
 ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
org.apache.flink.shaded.netty4.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
 ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\t... 1 
more\n"},"endOfBatch":false,"loggerFqcn":"org.apache.logging.slf4j.Log4jLogger","contextMap":{"resource.apiVersion":"flink.apache.org/v1beta1","resource.generation":"2","resource.kind":"FlinkSessionJob","resource.name":"parquetizer-xyz","resource.namespace":"parquetizers","resource.resourceVersion":"1904237678","resource.uid":"736550c3-dc52-4a0b-8124-f873d02f5d53"},"threadId":84,"threadPriority":5},

```

We did not see these messages with flink-kubernetes-operator 1.11.0.

It seems that the operator tries to reach a REST service named after the session
job itself (here `parquetizer-xyz-rest.parquetizers`, i.e.
`<job name>-rest.<namespace>`) inside the cluster.

Since I am using a session cluster here, I would expect the operator to query
that session cluster's REST endpoint for the job exceptions instead. Am I wrong?

However, that service does not exist (and never did), hence the error message.
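To make the mismatch concrete, here is a minimal Java sketch of how the REST host name appears to be built, assuming the `<cluster-id>-rest.<namespace>` naming that Flink's native Kubernetes integration uses for the JobManager REST Service. It is not the operator's actual code: `RestHostSketch`, `restHost` and the session cluster name `my-session-cluster` are hypothetical; only `parquetizer-xyz` and `parquetizers` come from the log above.

```java
// Hypothetical sketch, NOT the operator's implementation.
public final class RestHostSketch {

    // Assumption: the JobManager REST endpoint is exposed through a Service named
    // "<cluster-id>-rest", resolvable in-cluster as "<service>.<namespace>".
    public static String restHost(String clusterId, String namespace) {
        return clusterId + "-rest." + namespace;
    }

    public static void main(String[] args) {
        String namespace = "parquetizers";                // resource.namespace from the log
        String sessionJobName = "parquetizer-xyz";        // resource.name from the log
        String sessionClusterName = "my-session-cluster"; // hypothetical FlinkDeployment name

        // Host that the log shows the operator trying (and failing) to resolve.
        System.out.println("observed: " + restHost(sessionJobName, namespace));
        // Host one would expect when the session cluster's name is used instead.
        System.out.println("expected: " + restHost(sessionClusterName, namespace));
    }
}
```

Running it prints `observed: parquetizer-xyz-rest.parquetizers` followed by `expected: my-session-cluster-rest.parquetizers`. The second form, presumably derived from the FlinkDeployment that the session job targets, is the one the operator would need to resolve for a session job.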



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
