[
https://issues.apache.org/jira/browse/FLINK-32943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
zhu closed FLINK-32943.
-----------------------
Resolution: Not A Bug
> run batch tasks concurrently, the tasks still in the initialization status
> --------------------------------------------------------------------------
>
> Key: FLINK-32943
> URL: https://issues.apache.org/jira/browse/FLINK-32943
> Project: Flink
> Issue Type: Bug
> Affects Versions: 1.15.2
> Environment: flink 1.15.2
>
> |*blob.server.port*|6124|
> |*classloader.resolve-order*|parent-first|
> |*jobmanager.execution.failover-strategy*|region|
> |*jobmanager.memory.heap.size*|2228014280b|
> |*jobmanager.memory.jvm-metaspace.size*|536870912b|
> |*jobmanager.memory.jvm-overhead.max*|322122552b|
> |*jobmanager.memory.jvm-overhead.min*|322122552b|
> |*jobmanager.memory.off-heap.size*|134217728b|
> |*jobmanager.memory.process.size*|3gb|
> |*jobmanager.rpc.address*|naf-flink-ms-flink-manager-1-4gcwz|
> |*jobmanager.rpc.port*|6123|
> |*parallelism.default*|1|
> |*query.server.port*|6125|
> |*rest.address*|0.0.0.0|
> |*rest.bind-address*|0.0.0.0|
> |*rest.connection-timeout*|60000|
> |*rest.server.numThreads*|8|
> |*slot.request.timeout*|3000000|
> |*state.backend.rocksdb.localdir*|/home/nafplat/data/flinkStateStore|
> |*state.backend.type*|rocksdb|
> |*taskmanager.bind-host*|0.0.0.0|
> |*taskmanager.host*|0.0.0.0|
> |*taskmanager.memory.framework.off-heap.batch-shuffle.size*|256mb|
> |*taskmanager.memory.framework.off-heap.size*|512mb|
> |*taskmanager.memory.managed.fraction*|0.4|
> |*taskmanager.memory.network.fraction*|0.2|
> |*taskmanager.memory.process.size*|16gb|
> |*taskmanager.memory.task.off-heap.size*|268435456bytes|
> |*taskmanager.numberOfTaskSlots*|6|
> |*taskmanager.runtime.large-record-handler*|true|
> |*web.submit.enable*|true|
> |*web.tmpdir*|/tmp/flink-web-4be192ba-870a-4f88-8185-d07fa6303cca|
> |*web.upload.dir*|/opt/flink/nafJar|
> Reporter: zhu
> Priority: Major
>
> I run a Flink 1.15.2 session cluster on Kubernetes. In most cases there is no problem.
> Sometimes, however, a job stays in the INITIALIZING status indefinitely, and
> jobs submitted afterwards also stay in INITIALIZING.
> A JobManager thread dump taken at that point shows all jobmanager-io threads blocked.
> I run batch jobs with a concurrency of 6; the JobManager has 2 CPUs and 3 GB of memory.
> When this situation occurs, the following code keeps looping:
> public static void waitUntilJobInitializationFinished(
>         SupplierWithException<JobStatus, Exception> jobStatusSupplier,
>         SupplierWithException<JobResult, Exception> jobResultSupplier,
>         ClassLoader userCodeClassloader)
>         throws JobInitializationException {
>     LOG.debug("Wait until job initialization is finished");
>     WaitStrategy waitStrategy = new ExponentialWaitStrategy(50, 2000);
>     try {
>         JobStatus status = jobStatusSupplier.get();
>         long attempt = 0;
>         while (status == JobStatus.INITIALIZING) {
>             Thread.sleep(waitStrategy.sleepTime(attempt++));
>             status = jobStatusSupplier.get();
>         }
>         if (status == JobStatus.FAILED) {
>             JobResult result = jobResultSupplier.get();
>             Optional<SerializedThrowable> throwable = result.getSerializedThrowable();
>             if (throwable.isPresent()) {
>                 Throwable t = throwable.get().deserializeError(userCodeClassloader);
>                 if (t instanceof JobInitializationException) {
>                     throw t;
>                 }
>             }
>         }
>     } catch (JobInitializationException initializationException) {
>         throw initializationException;
>     } catch (Throwable throwable) {
>         ExceptionUtils.checkInterrupted(throwable);
>         throw new RuntimeException("Error while waiting for job to be initialized", throwable);
>     }
> }
>
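The quoted loop polls for as long as the status supplier reports INITIALIZING, so it has no upper bound of its own. As a rough illustration only, the sketch below shows how a caller-side wait with the same 50 ms / 2000 ms backoff shape could be capped by a deadline. It is not Flink code: the class BoundedInitializationWait, the Status enum, the sleepTime helper, and the timeout parameter are all hypothetical stand-ins, and the backoff formula (doubling per attempt, capped at the maximum) is an assumption about what the two constructor arguments mean.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

/** Stand-alone sketch; none of these names are Flink APIs. */
class BoundedInitializationWait {

    /** Hypothetical stand-in for the job status reported by the cluster. */
    enum Status { INITIALIZING, RUNNING, FAILED }

    /** Assumed backoff shape: initialMs doubled once per attempt, capped at maxMs. */
    static long sleepTime(long attempt, long initialMs, long maxMs) {
        long sleep = initialMs;
        for (long i = 0; i < attempt && sleep < maxMs; i++) {
            sleep *= 2;
        }
        return Math.min(sleep, maxMs);
    }

    /**
     * Polls the supplier until it stops reporting INITIALIZING,
     * or throws TimeoutException once timeoutMs has elapsed.
     */
    static Status waitUntilInitialized(Supplier<Status> statusSupplier, long timeoutMs)
            throws InterruptedException, TimeoutException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        Status status = statusSupplier.get();
        long attempt = 0;
        while (status == Status.INITIALIZING) {
            long remainingMs = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
            if (remainingMs <= 0) {
                throw new TimeoutException(
                        "Job still INITIALIZING after " + timeoutMs + " ms");
            }
            // Never sleep past the deadline.
            Thread.sleep(Math.min(sleepTime(attempt++, 50, 2000), remainingMs));
            status = statusSupplier.get();
        }
        return status;
    }

    public static void main(String[] args) throws Exception {
        // Simulated supplier: reports INITIALIZING for roughly 300 ms, then RUNNING.
        long start = System.nanoTime();
        Supplier<Status> simulated = () ->
                System.nanoTime() - start < TimeUnit.MILLISECONDS.toNanos(300)
                        ? Status.INITIALIZING
                        : Status.RUNNING;
        System.out.println(waitUntilInitialized(simulated, 5_000));
    }
}
```

A timeout of this kind only makes the symptom visible sooner on the calling side; it does not explain why the jobmanager-io threads are blocked, which still has to be read from the thread dump.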
--
This message was sent by Atlassian Jira
(v8.20.10#820010)