[jira] [Closed] (FLINK-32943) run batch tasks concurrently, the tasks still in the initialization status

zhu (Jira) Tue, 31 Oct 2023 20:36:04 -0700


     [ 
https://issues.apache.org/jira/browse/FLINK-32943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


zhu closed FLINK-32943.
-----------------------
    Resolution: Not A Bug

> run batch tasks concurrently, the tasks still in the initialization status
> --------------------------------------------------------------------------
>
>                 Key: FLINK-32943
>                 URL: https://issues.apache.org/jira/browse/FLINK-32943
>             Project: Flink
>          Issue Type: Bug
>    Affects Versions: 1.15.2
>         Environment: flink 1.15.2
>  
> |*blob.server.port*|6124|
> |*classloader.resolve-order*|parent-first|
> |*jobmanager.execution.failover-strategy*|region|
> |*jobmanager.memory.heap.size*|2228014280b|
> |*jobmanager.memory.jvm-metaspace.size*|536870912b|
> |*jobmanager.memory.jvm-overhead.max*|322122552b|
> |*jobmanager.memory.jvm-overhead.min*|322122552b|
> |*jobmanager.memory.off-heap.size*|134217728b|
> |*jobmanager.memory.process.size*|3gb|
> |*jobmanager.rpc.address*|naf-flink-ms-flink-manager-1-4gcwz|
> |*jobmanager.rpc.port*|6123|
> |*parallelism.default*|1|
> |*query.server.port*|6125|
> |*rest.address*|0.0.0.0|
> |*rest.bind-address*|0.0.0.0|
> |*rest.connection-timeout*|60000|
> |*rest.server.numThreads*|8|
> |*slot.request.timeout*|3000000|
> |*state.backend.rocksdb.localdir*|/home/nafplat/data/flinkStateStore|
> |*state.backend.type*|rocksdb|
> |*taskmanager.bind-host*|0.0.0.0|
> |*taskmanager.host*|0.0.0.0|
> |*taskmanager.memory.framework.off-heap.batch-shuffle.size*|256mb|
> |*taskmanager.memory.framework.off-heap.size*|512mb|
> |*taskmanager.memory.managed.fraction*|0.4|
> |*taskmanager.memory.network.fraction*|0.2|
> |*taskmanager.memory.process.size*|16gb|
> |*taskmanager.memory.task.off-heap.size*|268435456bytes|
> |*taskmanager.numberOfTaskSlots*|6|
> |*taskmanager.runtime.large-record-handler*|true|
> |*web.submit.enable*|true|
> |*web.tmpdir*|/tmp/flink-web-4be192ba-870a-4f88-8185-d07fa6303cca|
> |*web.upload.dir*|/opt/flink/nafJar|
>            Reporter: zhu
>            Priority: Major
>
> We run a Flink 1.15.2 session cluster on Kubernetes. In most cases there is no problem.
> Sometimes, however, a job stays in the INITIALIZING status indefinitely, and jobs
> submitted afterwards also stay in INITIALIZING.
> A JobManager thread dump shows that all jobmanager-io threads are blocked.
> I run batch jobs with a concurrency of 6; the JobManager has 2 CPUs and 3 GB of memory.
> When this situation occurs, the following source code keeps looping:
> public static void waitUntilJobInitializationFinished(
>             SupplierWithException<JobStatus, Exception> jobStatusSupplier,
>             SupplierWithException<JobResult, Exception> jobResultSupplier,
>             ClassLoader userCodeClassloader)
>             throws JobInitializationException {
>         LOG.debug("Wait until job initialization is finished");
>         WaitStrategy waitStrategy = new ExponentialWaitStrategy(50, 2000);
>         try {
>             JobStatus status = jobStatusSupplier.get();
>             long attempt = 0;
>             while (status == JobStatus.INITIALIZING) {
>                 Thread.sleep(waitStrategy.sleepTime(attempt++));
>                 status = jobStatusSupplier.get();
>             }
>             if (status == JobStatus.FAILED) {
>                 JobResult result = jobResultSupplier.get();
>                 Optional<SerializedThrowable> throwable = result.getSerializedThrowable();
>                 if (throwable.isPresent()) {
>                     Throwable t = throwable.get().deserializeError(userCodeClassloader);
>                     if (t instanceof JobInitializationException) {
>                         throw t;
>                     }
>                 }
>             }
>         } catch (JobInitializationException initializationException) {
>             throw initializationException;
>         } catch (Throwable throwable) {
>             ExceptionUtils.checkInterrupted(throwable);
>             throw new RuntimeException("Error while waiting for job to be initialized", throwable);
>         }
>     }
>  
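
For readers following the quoted loop, here is a minimal, self-contained sketch of the polling pattern it uses: a capped exponential backoff (50 ms doubling up to 2000 ms) around a status check with no overall deadline. The class `InitPollSketch`, the `Status` enum, and both method names are hypothetical stand-ins written for illustration; they are not Flink's `ExponentialWaitStrategy` or `JobStatus`, and the backoff formula is an assumption about how such a strategy typically behaves.

```java
import java.util.function.Supplier;

// Hypothetical sketch, not Flink code: illustrates polling with capped exponential backoff.
class InitPollSketch {

    // Stand-in for a job status; only INITIALIZING keeps the loop going.
    enum Status { INITIALIZING, RUNNING, FAILED }

    // Assumed backoff rule: initialWait doubled once per attempt, never above maxWait.
    static long sleepTime(long attempt, long initialWait, long maxWait) {
        long wait = initialWait;
        // Stop doubling once the cap is reached, which also avoids overflow for large attempts.
        for (long i = 0; i < attempt && wait < maxWait; i++) {
            wait *= 2;
        }
        return Math.min(wait, maxWait);
    }

    // Polls until the status leaves INITIALIZING and returns the number of sleeps taken.
    // Like the quoted loop, there is no deadline: if the supplier keeps returning
    // INITIALIZING, this method never returns.
    static long pollUntilInitialized(Supplier<Status> statusSupplier, long initialWait, long maxWait) {
        Status status = statusSupplier.get();
        long attempt = 0;
        try {
            while (status == Status.INITIALIZING) {
                Thread.sleep(sleepTime(attempt++, initialWait, maxWait));
                status = statusSupplier.get();
            }
        } catch (InterruptedException e) {
            // Restore the interrupt flag before surfacing the failure.
            Thread.currentThread().interrupt();
            throw new RuntimeException("Interrupted while waiting for initialization", e);
        }
        return attempt;
    }

    public static void main(String[] args) {
        // Print the sleep schedule for the 50 ms / 2000 ms parameters seen in the quoted code.
        for (long attempt = 0; attempt < 8; attempt++) {
            System.out.println("attempt " + attempt + " -> " + sleepTime(attempt, 50, 2000) + " ms");
        }
    }
}
```

Under these assumptions the wait reaches the 2000 ms cap at the seventh poll and stays there, so a client stuck in this loop simply re-checks the status about every two seconds for as long as the job reports INITIALIZING.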



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
