[jira] [Commented] (FLINK-33483) Why is “UNDEFINED” defined in the Flink task status?

Xin Chen (Jira) Mon, 13 Nov 2023 17:59:41 -0800


    [ 
https://issues.apache.org/jira/browse/FLINK-33483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17785718#comment-17785718
 ]


Xin Chen commented on FLINK-33483:
----------------------------------

Hi [~mapohl] yes, this is not a question in recent version. But I also see some 
code about "ApplicationStatus.UNKNOWN" and "UNDEFINED", so it is confused 
whether there still be "UNDEFINED" scenarios in Flink?  Some other code like:

{code:java}
// ApplicationDispatcherBootstrap.java#getJobResult()
private CompletableFuture<JobResult> getJobResult(
            final DispatcherGateway dispatcherGateway,
            final JobID jobId,
            final ScheduledExecutor scheduledExecutor,
            final boolean tolerateMissingResult) {
        final Time timeout =
                
Time.milliseconds(configuration.get(ClientOptions.CLIENT_TIMEOUT).toMillis());
        final Time retryPeriod =
                
Time.milliseconds(configuration.get(ClientOptions.CLIENT_RETRY_PERIOD).toMillis());
        final CompletableFuture<JobResult> jobResultFuture =
                JobStatusPollingUtils.getJobResult(
                        dispatcherGateway, jobId, scheduledExecutor, timeout, 
retryPeriod);
        if (tolerateMissingResult) {
            // Return "unknown" job result if dispatcher no longer knows the 
actual result.
            return FutureUtils.handleException(
                    jobResultFuture,
                    FlinkJobNotFoundException.class,
                    exception ->
                            new JobResult.Builder()
                                    .jobId(jobId)
                                    
.applicationStatus(ApplicationStatus.UNKNOWN)
                                    .netRuntime(Long.MAX_VALUE)
                                    .build());
        }
        return jobResultFuture;
    }



// RestClusterClient.java#requestJobResultInternal
private CompletableFuture<JobResult> requestJobResultInternal(@Nonnull JobID 
jobId) {
        return pollResourceAsync(
                        () -> {
                            final JobMessageParameters messageParameters =
                                    new JobMessageParameters();
                            messageParameters.jobPathParameter.resolve(jobId);
                            return sendRequest(
                                    JobExecutionResultHeaders.getInstance(), 
messageParameters);
                        })
                .thenApply(
                        jobResult -> {
                            if (jobResult.getApplicationStatus() == 
ApplicationStatus.UNKNOWN) {
                                throw new JobStateUnknownException(
                                        String.format("Result for Job %s is 
UNKNOWN", jobId));
                            }
                            return jobResult;
                        });
    }
{code}

I want to know which other scenarios the final state of the Flink task may be 
UNDEFINED.

> Why is “UNDEFINED” defined in the Flink task status?
> ----------------------------------------------------
>
>                 Key: FLINK-33483
>                 URL: https://issues.apache.org/jira/browse/FLINK-33483
>             Project: Flink
>          Issue Type: Improvement
>          Components: Deployment / YARN
>    Affects Versions: 1.12.2
>            Reporter: Xin Chen
>            Priority: Major
>         Attachments: container_e15_1693914709123_8498_01_000001_8042, 
> reproduce.log
>
>
> In the Flink on Yarn mode, if an unknown status appears in the Flink log, 
> jm(jobmanager) will report the task status as undefined. The Yarn page will 
> display the state as FINISHED, but the final status is *UNDEFINED*. In terms 
> of business, it is unknown whether the task has failed or succeeded, and 
> whether to retry. It has a certain impact. Why should we design UNDEFINED? 
> Usually, this situation occurs due to zk(zookeeper) disconnection or jm 
> abnormality, etc. Since the abnormality is present, why not use FAILED?
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Commented] (FLINK-33483) Why is “UNDEFINED” defined in the Flink task status?

Reply via email to