[jira] [Updated] (FLINK-33121) Failed precondition in JobExceptionsHandler due to concurrent global failures

Chesnay Schepler (Jira) Thu, 28 Mar 2024 10:09:57 -0700


     [ 
https://issues.apache.org/jira/browse/FLINK-33121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Chesnay Schepler updated FLINK-33121:
-------------------------------------
    Affects Version/s: 1.18.0

> Failed precondition in JobExceptionsHandler due to concurrent global failures
> -----------------------------------------------------------------------------
>
>                 Key: FLINK-33121
>                 URL: https://issues.apache.org/jira/browse/FLINK-33121
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination
>    Affects Versions: 1.18.0
>            Reporter: Panagiotis Garefalakis
>            Assignee: Panagiotis Garefalakis
>            Priority: Major
>              Labels: pull-request-available
>
> We make the assumption that Global Failures (with null Task name) may only be 
> RootExceptions and and Local/Task exception may be part of concurrent 
> exceptions List (see {{{}JobExceptionsHandler#createRootExceptionInfo{}}}).
> However, when the Adaptive scheduler is in a Restarting phase due to an 
> existing failure (that is now the new Root) we can still, in rare occasions, 
> capture new Global failures, violating this condition (with an assertion is 
> thrown as part of {{{}assertLocalExceptionInfo{}}}) seeing something like:
> {code:java}
> The taskName must not be null for a non-global failure.  {code}
> We want to ignore Global failures while being in a Restarting phase on the 
> Adaptive scheduler until we properly support multiple Global failures in the 
> Exception History as part of https://issues.apache.org/jira/browse/FLINK-34922
> Note: DefaultScheduler does not suffer from this issue as it treats failures 
> directly as HistoryEntries (no conversion step)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Updated] (FLINK-33121) Failed precondition in JobExceptionsHandler due to concurrent global failures

Reply via email to