Sebastian Struß created FLINK-37895:
---------------------------------------
Summary: "Failed to fetch job exceptions from REST API for jobId" errors for session jobs
Key: FLINK-37895
URL: https://issues.apache.org/jira/browse/FLINK-37895
Project: Flink
Issue Type: Bug
Environment: K8s 1.32 on arm64 nodes, Flink Kubernetes operator 1.12.0, Flink 1.19.1
Reporter: Sebastian Struß

Flink Kubernetes operator version 1.12.0 has started to print error messages like this:

```
{"timeMillis":1749046856604,"thread":"ReconcilerExecutor-flinksessionjobcontroller-84","level":"WARN","loggerName":"org.apache.flink.kubernetes.operator.service.AbstractFlinkService","message":"Failed to fetch job exceptions from REST API for jobId 56bdbb2095a14bb40d154cf0a3ba4659","thrown":{"commonElementCount":0,"localizedMessage":"java.net.UnknownHostException: parquetizer-xyz-rest.parquetizers: Name or service not known","message":"java.net.UnknownHostException: parquetizer-xyz-rest.parquetizers: Name or service not known","name":"java.util.concurrent.ExecutionException","cause":{"commonElementCount":1,"localizedMessage":"parquetizer-xyz-rest.parquetizers: Name or service not known","message":"parquetizer-xyz-rest.parquetizers: Name or service not known","name":"java.net.UnknownHostException","extendedStackTrace":"java.net.UnknownHostException: parquetizer-xyz-rest.parquetizers: Name or service not known\n\tat java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method) ~[?:?]\n\tat java.net.InetAddress$PlatformNameService.lookupAllHostAddr(Unknown Source) ~[?:?]\n\tat java.net.InetAddress.getAddressesFromNameService(Unknown Source) ~[?:?]\n\tat java.net.InetAddress$NameServiceAddresses.get(Unknown Source) ~[?:?]\n\tat java.net.InetAddress.getAllByName0(Unknown Source) ~[?:?]\n\tat java.net.InetAddress.getAllByName(Unknown Source) ~[?:?]\n\tat java.net.InetAddress.getAllByName(Unknown Source) ~[?:?]\n\tat java.net.InetAddress.getByName(Unknown Source) ~[?:?]\n\tat org.apache.flink.shaded.netty4.io.netty.util.internal.SocketUtils$8.run(SocketUtils.java:156) 
~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat org.apache.flink.shaded.netty4.io.netty.util.internal.SocketUtils$8.run(SocketUtils.java:153) ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat java.security.AccessController.doPrivileged(Native Method) ~[?:?]\n\tat org.apache.flink.shaded.netty4.io.netty.util.internal.SocketUtils.addressByName(SocketUtils.java:153) ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat org.apache.flink.shaded.netty4.io.netty.resolver.DefaultNameResolver.doResolve(DefaultNameResolver.java:41) ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat org.apache.flink.shaded.netty4.io.netty.resolver.SimpleNameResolver.resolve(SimpleNameResolver.java:61) ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat org.apache.flink.shaded.netty4.io.netty.resolver.SimpleNameResolver.resolve(SimpleNameResolver.java:53) ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat org.apache.flink.shaded.netty4.io.netty.resolver.InetSocketAddressResolver.doResolve(InetSocketAddressResolver.java:55) ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat org.apache.flink.shaded.netty4.io.netty.resolver.InetSocketAddressResolver.doResolve(InetSocketAddressResolver.java:31) ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat org.apache.flink.shaded.netty4.io.netty.resolver.AbstractAddressResolver.resolve(AbstractAddressResolver.java:106) ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat org.apache.flink.shaded.netty4.io.netty.bootstrap.Bootstrap.doResolveAndConnect0(Bootstrap.java:206) ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat org.apache.flink.shaded.netty4.io.netty.bootstrap.Bootstrap.access$000(Bootstrap.java:46) ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat org.apache.flink.shaded.netty4.io.netty.bootstrap.Bootstrap$1.operationComplete(Bootstrap.java:180) ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
org.apache.flink.shaded.netty4.io.netty.bootstrap.Bootstrap$1.operationComplete(Bootstrap.java:166) ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:590) ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:557) ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:492) ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:636) ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.setSuccess0(DefaultPromise.java:625) ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.trySuccess(DefaultPromise.java:105) ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat org.apache.flink.shaded.netty4.io.netty.channel.DefaultChannelPromise.trySuccess(DefaultChannelPromise.java:84) ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannel$AbstractUnsafe.safeSetSuccess(AbstractChannel.java:990) ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannel$AbstractUnsafe.register0(AbstractChannel.java:516) ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannel$AbstractUnsafe.access$200(AbstractChannel.java:429) ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannel$AbstractUnsafe$1.run(AbstractChannel.java:486) 
~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat org.apache.flink.shaded.netty4.io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:174) ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat org.apache.flink.shaded.netty4.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:167) ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:470) ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:569) ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997) ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat org.apache.flink.shaded.netty4.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n"},"extendedStackTrace":"java.util.concurrent.ExecutionException: java.net.UnknownHostException: parquetizer-xyz-rest.parquetizers: Name or service not known\n\tat java.util.concurrent.CompletableFuture.reportGet(Unknown Source) ~[?:?]\n\tat java.util.concurrent.CompletableFuture.get(Unknown Source) ~[?:?]\n\tat org.apache.flink.kubernetes.operator.service.AbstractFlinkService.getJobExceptions(AbstractFlinkService.java:873) [flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat org.apache.flink.kubernetes.operator.observer.JobStatusObserver.observeJobManagerExceptions(JobStatusObserver.java:131) [flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat org.apache.flink.kubernetes.operator.observer.JobStatusObserver.observe(JobStatusObserver.java:97) [flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
org.apache.flink.kubernetes.operator.observer.sessionjob.FlinkSessionJobObserver.observeInternal(FlinkSessionJobObserver.java:54) [flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat org.apache.flink.kubernetes.operator.observer.AbstractFlinkResourceObserver.observe(AbstractFlinkResourceObserver.java:49) [flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat org.apache.flink.kubernetes.operator.controller.FlinkSessionJobController.reconcile(FlinkSessionJobController.java:113) [flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat org.apache.flink.kubernetes.operator.controller.FlinkSessionJobController.reconcile(FlinkSessionJobController.java:59) [flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat io.javaoperatorsdk.operator.processing.Controller$1.execute(Controller.java:153) [flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat io.javaoperatorsdk.operator.processing.Controller$1.execute(Controller.java:111) [flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat org.apache.flink.kubernetes.operator.metrics.OperatorJosdkMetrics.timeControllerExecution(OperatorJosdkMetrics.java:80) [flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat io.javaoperatorsdk.operator.processing.Controller.reconcile(Controller.java:110) [flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.reconcileExecution(ReconciliationDispatcher.java:136) [flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleReconcile(ReconciliationDispatcher.java:117) [flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleDispatch(ReconciliationDispatcher.java:91) [flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleExecution(ReconciliationDispatcher.java:64) 
[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat io.javaoperatorsdk.operator.processing.event.EventProcessor$ReconcilerExecutor.run(EventProcessor.java:452) [flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [?:?]\n\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [?:?]\n\tat java.lang.Thread.run(Unknown Source) [?:?]\nCaused by: java.net.UnknownHostException: parquetizer-xyz-rest.parquetizers: Name or service not known\n\tat java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method) ~[?:?]\n\tat java.net.InetAddress$PlatformNameService.lookupAllHostAddr(Unknown Source) ~[?:?]\n\tat java.net.InetAddress.getAddressesFromNameService(Unknown Source) ~[?:?]\n\tat java.net.InetAddress$NameServiceAddresses.get(Unknown Source) ~[?:?]\n\tat java.net.InetAddress.getAllByName0(Unknown Source) ~[?:?]\n\tat java.net.InetAddress.getAllByName(Unknown Source) ~[?:?]\n\tat java.net.InetAddress.getAllByName(Unknown Source) ~[?:?]\n\tat java.net.InetAddress.getByName(Unknown Source) ~[?:?]\n\tat org.apache.flink.shaded.netty4.io.netty.util.internal.SocketUtils$8.run(SocketUtils.java:156) ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat org.apache.flink.shaded.netty4.io.netty.util.internal.SocketUtils$8.run(SocketUtils.java:153) ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat java.security.AccessController.doPrivileged(Native Method) ~[?:?]\n\tat org.apache.flink.shaded.netty4.io.netty.util.internal.SocketUtils.addressByName(SocketUtils.java:153) ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat org.apache.flink.shaded.netty4.io.netty.resolver.DefaultNameResolver.doResolve(DefaultNameResolver.java:41) ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat org.apache.flink.shaded.netty4.io.netty.resolver.SimpleNameResolver.resolve(SimpleNameResolver.java:61) ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
org.apache.flink.shaded.netty4.io.netty.resolver.SimpleNameResolver.resolve(SimpleNameResolver.java:53) ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat org.apache.flink.shaded.netty4.io.netty.resolver.InetSocketAddressResolver.doResolve(InetSocketAddressResolver.java:55) ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat org.apache.flink.shaded.netty4.io.netty.resolver.InetSocketAddressResolver.doResolve(InetSocketAddressResolver.java:31) ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat org.apache.flink.shaded.netty4.io.netty.resolver.AbstractAddressResolver.resolve(AbstractAddressResolver.java:106) ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat org.apache.flink.shaded.netty4.io.netty.bootstrap.Bootstrap.doResolveAndConnect0(Bootstrap.java:206) ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat org.apache.flink.shaded.netty4.io.netty.bootstrap.Bootstrap.access$000(Bootstrap.java:46) ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat org.apache.flink.shaded.netty4.io.netty.bootstrap.Bootstrap$1.operationComplete(Bootstrap.java:180) ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat org.apache.flink.shaded.netty4.io.netty.bootstrap.Bootstrap$1.operationComplete(Bootstrap.java:166) ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:590) ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:557) ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:492) ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:636) 
~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.setSuccess0(DefaultPromise.java:625) ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.trySuccess(DefaultPromise.java:105) ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat org.apache.flink.shaded.netty4.io.netty.channel.DefaultChannelPromise.trySuccess(DefaultChannelPromise.java:84) ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannel$AbstractUnsafe.safeSetSuccess(AbstractChannel.java:990) ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannel$AbstractUnsafe.register0(AbstractChannel.java:516) ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannel$AbstractUnsafe.access$200(AbstractChannel.java:429) ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannel$AbstractUnsafe$1.run(AbstractChannel.java:486) ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat org.apache.flink.shaded.netty4.io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:174) ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat org.apache.flink.shaded.netty4.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:167) ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:470) ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:569) ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat 
org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997) ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\tat org.apache.flink.shaded.netty4.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[flink-kubernetes-operator-1.12.0-shaded.jar:1.12.0]\n\t... 1 more\n"},"endOfBatch":false,"loggerFqcn":"org.apache.logging.slf4j.Log4jLogger","contextMap":{"resource.apiVersion":"flink.apache.org/v1beta1","resource.generation":"2","resource.kind":"FlinkSessionJob","resource.name":"parquetizer-xyz","resource.namespace":"parquetizers","resource.resourceVersion":"1904237678","resource.uid":"736550c3-dc52-4a0b-8124-f873d02f5d53"},"threadId":84,"threadPriority":5},
```

We didn't see these errors with flink-kubernetes-operator 1.11.0.

It seems that the operator tries to reach a REST service named after the session job itself: the unresolvable host `parquetizer-xyz-rest.parquetizers` is the FlinkSessionJob's name (`parquetizer-xyz`) with a `-rest` suffix, in the job's namespace (`parquetizers`). Since this job runs on a session cluster, I would expect the operator to query the session cluster's REST endpoint for the job exceptions instead. Am I wrong? The service `parquetizer-xyz-rest` does not exist (and never did), hence the error message.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
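The failing hostname can be reconstructed from the `contextMap` in the log. The Java sketch below assumes (this is not confirmed in the report) that the operator builds the REST address as `<name>-rest.<namespace>`, following Flink's native Kubernetes convention of naming the JobManager REST Service `<cluster-id>-rest`. It contrasts the host derived from the session job's own name with the host a session cluster would expose. `RestHostSketch`, `restHost`, `resolves`, and `my-session-cluster` are hypothetical names used only for illustration.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

class RestHostSketch {

    // Assumed convention: the REST Service is named "<cluster-id>-rest"; appending
    // the namespace gives the in-cluster DNS name seen in the log.
    static String restHost(String clusterId, String namespace) {
        return clusterId + "-rest." + namespace;
    }

    // Performs the same kind of lookup that fails in the stack trace
    // (InetAddress.getByName). Only meaningful when run inside the cluster.
    static boolean resolves(String host) {
        try {
            InetAddress.getByName(host);
            return true;
        } catch (UnknownHostException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String namespace = "parquetizers";         // resource.namespace from the log
        String sessionJobName = "parquetizer-xyz"; // resource.name of the FlinkSessionJob
        // Hypothetical: name of the session cluster's FlinkDeployment
        // (the one a FlinkSessionJob references); not given in the report.
        String sessionClusterName = "my-session-cluster";

        System.out.println("host derived from the session job:  " + restHost(sessionJobName, namespace));
        System.out.println("host of the session cluster's REST: " + restHost(sessionClusterName, namespace));
    }
}
```

Running `resolves(...)` on both hosts from a pod in the `parquetizers` namespace would be one way to check which Service actually exists; outside the cluster both lookups are expected to fail.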