Bhagi created FLINK-22938:
-----------------------------
Summary: Slot request bulk is not fulfillable! Could not allocate the required slot within slot request timeout
Key: FLINK-22938
URL: https://issues.apache.org/jira/browse/FLINK-22938
Project: Flink
Issue Type: Bug
Components: Deployment / Kubernetes
Affects Versions: 1.12.4
Reporter: Bhagi
Hi team,
I tested a cluster upgrade from Flink 1.12.4 to 1.13.1. Because of an issue
with one job, the upgraded cluster went into CrashLoopBackOff with an error,
so I downgraded to the old cluster version. The downgrade from 1.13.1 to
1.12.4 was successful, but all job executions now end in a failed state with
the following error: "Slot request bulk is not fulfillable! Could not allocate
the required slot within slot request timeout". Please find the configuration
and the log below.
############# Flink config file ######################
flink@flink-jobmanager-657cb5d847-5b579:~$ cat conf/flink-conf.yaml
taskmanager.numberOfTaskSlots: 2
jobmanager.rpc.address: flink-jobmanager
blob.server.port: 6124
jobmanager.rpc.port: 6123
taskmanager.rpc.port: 6122
queryable-state.proxy.ports: 6125
jobmanager.memory.process.size: 1600m
taskmanager.memory.process.size: 1728m
parallelism.default: 2
rest.connection-timeout: 25000
web.log.path: /opt/flink/log/output.log
taskmanager.log.path: /opt/flink/log/output.log
state.backend: rocksdb
state.checkpoints.dir: file:///persistent/flinkData/checkpoints
state.backend.rocksdb.log.dir: /persistent/flinkData/rocksdb/logging/
state.savepoints.dir: file:///persistent/flinkData/savepoints
state.backend.incremental: true
state.checkpoints.num-retained: 1
web.upload.dir: /persistent/flinkData
classloader.resolve-order: parent-first
kubernetes.cluster-id: 222
kubernetes.namespace: flink-mcd
high-availability: org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory
high-availability.storageDir: file:///persistent/flinkData/checkpoints
jobmanager.archive.fs.dir: file:///persistent/flinkData/completed-jobs
historyserver.archive.fs.refresh-interval: 10000
historyserver.archive.fs.dir: file:///persistent/flinkData/completed-jobs
metrics.reporters: prom
metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
metrics.reporter.prom.port: 9090
akka.framesize: 10485760b
flink@flink-jobmanager-657cb5d847-5b579:~$
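
For reference, the slot demand implied by the configuration above can be worked out by hand. The sketch below is not part of Flink: `parse_flat_config` and `min_taskmanagers` are hypothetical helper names, and the calculation assumes the default slot sharing behaviour, where a job needs as many slots as its highest operator parallelism.

```python
import math


def parse_flat_config(text):
    """Parse flat 'key: value' lines (flink-conf.yaml style) into a dict."""
    conf = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        # Split on the first colon only, so values such as file:///... stay intact.
        key, _, value = line.partition(":")
        conf[key.strip()] = value.strip()
    return conf


def min_taskmanagers(conf):
    """Smallest number of TaskManagers whose slots cover the default parallelism.

    Assumption: default slot sharing, so slots needed == parallelism.default.
    """
    slots_needed = int(conf.get("parallelism.default", 1))
    slots_per_tm = int(conf.get("taskmanager.numberOfTaskSlots", 1))
    return math.ceil(slots_needed / slots_per_tm)


# The two relevant values from the configuration shown above.
SAMPLE = """
taskmanager.numberOfTaskSlots: 2
parallelism.default: 2
"""

print(min_taskmanagers(parse_flat_config(SAMPLE)))  # prints 1
```

Under these assumptions, a single registered TaskManager with 2 free slots would be enough for the WordCount example, so it may be worth checking whether any TaskManager is registered with the JobManager after the downgrade.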
############ Logs from job manager ######################
flink@flink-jobmanager-6d644dc78b-6r627:~$ ./bin/flink run ./examples/streaming/WordCount.jar
Executing WordCount example with default input data set.
Use --input to specify file input.
Printing result to stdout. Use --output to specify output path.
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.flink.api.java.ClosureCleaner (file:/opt/flink/lib/flink-dist_2.11-1.12.2.jar) to field java.lang.String.value
WARNING: Please consider reporting this to the maintainers of org.apache.flink.api.java.ClosureCleaner
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
2021-06-09 04:59:40,098 INFO org.apache.flink.kubernetes.highavailability.KubernetesLeaderRetrievalDriver [] - Stopping KubernetesLeaderRetrievalDriver{configMapName='111-restserver-leader'}.
2021-06-09 04:59:40,099 INFO org.apache.flink.kubernetes.kubeclient.resources.KubernetesConfigMapWatcher [] - The watcher is closing.
Job has been submitted with JobID d36a0b99601dc6af696d213da2f8c159
2021-06-09 05:04:37,877 INFO org.apache.flink.kubernetes.highavailability.KubernetesLeaderRetrievalDriver [] - Stopping KubernetesLeaderRetrievalDriver{configMapName='111-restserver-leader'}.
2021-06-09 05:04:37,879 INFO org.apache.flink.kubernetes.kubeclient.resources.KubernetesConfigMapWatcher [] - The watcher is closing.
------------------------------------------------------------
The program finished with the following exception:
org.apache.flink.client.program.ProgramInvocationException: The main method caused an error: org.apache.flink.client.program.ProgramInvocationException: Job failed (JobID: d36a0b99601dc6af696d213da2f8c159)
at
org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:366)
at
org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:219)
at
org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:114)
at
org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:812)
at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:246)
at
org.apache.flink.client.cli.CliFrontend.parseAndRun(CliFrontend.java:1054)
at
org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:1132)
at
org.apache.flink.runtime.security.contexts.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:28)
at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1132)
Caused by: java.util.concurrent.ExecutionException: org.apache.flink.client.program.ProgramInvocationException: Job failed (JobID: d36a0b99601dc6af696d213da2f8c159)
at java.base/java.util.concurrent.CompletableFuture.reportGet(Unknown
Source)
at java.base/java.util.concurrent.CompletableFuture.get(Unknown Source)
at
org.apache.flink.client.program.StreamContextEnvironment.getJobExecutionResult(StreamContextEnvironment.java:123)
at
org.apache.flink.client.program.StreamContextEnvironment.execute(StreamContextEnvironment.java:80)
at
org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:1782)
at
org.apache.flink.streaming.examples.wordcount.WordCount.main(WordCount.java:97)
at
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Unknown
Source)
at java.base/java.lang.reflect.Method.invoke(Unknown Source)
at
org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:349)
... 8 more
Caused by: org.apache.flink.client.program.ProgramInvocationException: Job failed (JobID: d36a0b99601dc6af696d213da2f8c159)
at
org.apache.flink.client.deployment.ClusterClientJobClientAdapter.lambda$null$6(ClusterClientJobClientAdapter.java:125)
at
java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(Unknown
Source)
at
java.base/java.util.concurrent.CompletableFuture.postComplete(Unknown Source)
at java.base/java.util.concurrent.CompletableFuture.complete(Unknown
Source)
at
org.apache.flink.client.program.rest.RestClusterClient.lambda$pollResourceAsync$22(RestClusterClient.java:665)
at
java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(Unknown Source)
at
java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(Unknown
Source)
at
java.base/java.util.concurrent.CompletableFuture.postComplete(Unknown Source)
at java.base/java.util.concurrent.CompletableFuture.complete(Unknown
Source)
at
org.apache.flink.runtime.concurrent.FutureUtils.lambda$retryOperationWithDelay$9(FutureUtils.java:394)
at
java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(Unknown Source)
at
java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(Unknown
Source)
at
java.base/java.util.concurrent.CompletableFuture.postComplete(Unknown Source)
at java.base/java.util.concurrent.CompletableFuture.postFire(Unknown
Source)
at
java.base/java.util.concurrent.CompletableFuture$UniCompose.tryFire(Unknown
Source)
at
java.base/java.util.concurrent.CompletableFuture$Completion.run(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown
Source)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
Source)
at java.base/java.lang.Thread.run(Unknown Source)
Caused by: org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
at
org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:144)
at
org.apache.flink.client.deployment.ClusterClientJobClientAdapter.lambda$null$6(ClusterClientJobClientAdapter.java:123)
... 18 more
Caused by: org.apache.flink.runtime.JobException: Recovery is suppressed by NoRestartBackoffTimeStrategy
at
org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.handleFailure(ExecutionFailureHandler.java:118)
at
org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.getFailureHandlingResult(ExecutionFailureHandler.java:80)
at
org.apache.flink.runtime.scheduler.DefaultScheduler.handleTaskFailure(DefaultScheduler.java:233)
at
org.apache.flink.runtime.scheduler.DefaultScheduler.maybeHandleTaskFailure(DefaultScheduler.java:224)
at
org.apache.flink.runtime.scheduler.DefaultScheduler.updateTaskExecutionStateInternal(DefaultScheduler.java:215)
at
org.apache.flink.runtime.scheduler.SchedulerBase.updateTaskExecutionState(SchedulerBase.java:669)
at
org.apache.flink.runtime.scheduler.UpdateSchedulerNgOnInternalFailuresListener.notifyTaskFailure(UpdateSchedulerNgOnInternalFailuresListener.java:56)
at
org.apache.flink.runtime.executiongraph.ExecutionGraph.notifySchedulerNgAboutInternalTaskFailure(ExecutionGraph.java:1869)
at
org.apache.flink.runtime.executiongraph.Execution.processFail(Execution.java:1437)
at
org.apache.flink.runtime.executiongraph.Execution.processFail(Execution.java:1377)
at
org.apache.flink.runtime.executiongraph.Execution.markFailed(Execution.java:1205)
at
org.apache.flink.runtime.executiongraph.ExecutionVertex.markFailed(ExecutionVertex.java:758)
at
org.apache.flink.runtime.scheduler.DefaultExecutionVertexOperations.markFailed(DefaultExecutionVertexOperations.java:41)
at
org.apache.flink.runtime.scheduler.DefaultScheduler.handleTaskDeploymentFailure(DefaultScheduler.java:522)
at
org.apache.flink.runtime.scheduler.DefaultScheduler.lambda$assignResourceOrHandleError$6(DefaultScheduler.java:507)
at java.base/java.util.concurrent.CompletableFuture.uniHandle(Unknown
Source)
at
java.base/java.util.concurrent.CompletableFuture$UniHandle.tryFire(Unknown
Source)
at
java.base/java.util.concurrent.CompletableFuture.postComplete(Unknown Source)
at
java.base/java.util.concurrent.CompletableFuture.completeExceptionally(Unknown
Source)
at
org.apache.flink.runtime.scheduler.SharedSlot.cancelLogicalSlotRequest(SharedSlot.java:223)
at
org.apache.flink.runtime.scheduler.SlotSharingExecutionSlotAllocator.cancelLogicalSlotRequest(SlotSharingExecutionSlotAllocator.java:168)
at
org.apache.flink.runtime.scheduler.SharingPhysicalSlotRequestBulk.cancel(SharingPhysicalSlotRequestBulk.java:86)
at
org.apache.flink.runtime.jobmaster.slotpool.PhysicalSlotRequestBulkWithTimestamp.cancel(PhysicalSlotRequestBulkWithTimestamp.java:66)
at
org.apache.flink.runtime.jobmaster.slotpool.PhysicalSlotRequestBulkCheckerImpl.lambda$schedulePendingRequestBulkWithTimestampCheck$0(PhysicalSlotRequestBulkCheckerImpl.java:91)
at
java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
at
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:440)
at
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:208)
at
org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:77)
at
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:158)
at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26)
at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21)
at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21)
at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170)
at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
at akka.actor.Actor$class.aroundReceive(Actor.scala:517)
at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592)
at akka.actor.ActorCell.invoke(ActorCell.scala:561)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258)
at akka.dispatch.Mailbox.run(Mailbox.scala:225)
at akka.dispatch.Mailbox.exec(Mailbox.scala:235)
at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at
akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at
akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.util.concurrent.CompletionException: org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Slot request bulk is not fulfillable! Could not allocate the required slot within slot request timeout
at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(Unknown Source)
at java.base/java.util.concurrent.CompletableFuture.completeThrowable(Unknown Source)
at java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(Unknown Source)
... 31 more
Caused by: org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Slot request bulk is not fulfillable! Could not allocate the required slot within slot request timeout
at org.apache.flink.runtime.jobmaster.slotpool.PhysicalSlotRequestBulkCheckerImpl.lambda$schedulePendingRequestBulkWithTimestampCheck$0(PhysicalSlotRequestBulkCheckerImpl.java:86)
... 24 more
Caused by: java.util.concurrent.TimeoutException: Timeout has occurred: 300000 ms
... 25 more
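
The 300000 ms in the root cause matches the default of `slot.request.timeout` (5 minutes), meaning the JobManager waited that long without receiving the slots it requested. One way to narrow this down is to ask the JobManager REST endpoint `/overview` how many TaskManagers and free slots it currently sees. The following is a minimal sketch, not an official tool: `fetch_overview`, `diagnose`, and `main` are hypothetical names, and the address `http://flink-jobmanager:8081` is an assumption based on `jobmanager.rpc.address` above and the default REST port.

```python
import json
import urllib.request


def fetch_overview(base_url, timeout=10):
    """GET <base_url>/overview from the Flink REST API and return the parsed JSON."""
    url = base_url.rstrip("/") + "/overview"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def diagnose(overview, slots_needed):
    """Turn the cluster overview into a one-line hint about slot availability."""
    taskmanagers = overview.get("taskmanagers", 0)
    available = overview.get("slots-available", 0)
    if taskmanagers == 0:
        return "no TaskManager registered"
    if available < slots_needed:
        return f"only {available} of {slots_needed} required slots available"
    return "enough slots available"


def main():
    # Assumed address: service name from jobmanager.rpc.address, default REST port 8081.
    overview = fetch_overview("http://flink-jobmanager:8081")
    # parallelism.default is 2 in the configuration above.
    print(diagnose(overview, slots_needed=2))
```

Calling `main()` from inside the cluster (for example from the JobManager pod) prints one of the three hints. If it reports no registered TaskManager, the TaskManager pods and their logs would be the next place to look.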