[jira] [Commented] (FLINK-25586) ExecutionGraphInfoStore in session cluster should split failed and successful jobs

Zhanghao Chen (Jira) Sat, 15 Jan 2022 23:09:04 -0800


    [ https://issues.apache.org/jira/browse/FLINK-25586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17476743#comment-17476743 ]


Zhanghao Chen commented on FLINK-25586:
---------------------------------------

[~dmvk] I think the CANCELLED state should be treated as a successful execution,
"successful" in the sense that the terminal state matches what the user
expects from their action (cancelling the job). This is not the case for the
FAILED state, as nobody expects their job to end up FAILED when they submit
it.

> ExecutionGraphInfoStore in session cluster should split failed and successful 
> jobs
> ----------------------------------------------------------------------------------
>
>                 Key: FLINK-25586
>                 URL: https://issues.apache.org/jira/browse/FLINK-25586
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Runtime / Coordination
>    Affects Versions: 1.12.7, 1.13.5, 1.14.2
>            Reporter: Shammon
>            Priority: Major
>
> In a Flink session cluster, jobs are stored in `FileExecutionGraphInfoStore`.
> When the number of jobs in it reaches `jobstore.cache-size`, or a job's
> lifetime reaches `jobstore.expiration-time`, the affected jobs are removed.
> We can't hold too many jobs for performance reasons, but we should keep
> failed jobs for longer so that the cause of failure can be traced. So it's
> better to split failed and successful jobs in `FileExecutionGraphInfoStore`
> and support an independent max capacity for each.
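The proposal can be sketched as two independently bounded buckets. This is a hypothetical illustration under simplifying assumptions, not Flink's `FileExecutionGraphInfoStore`: job info is a plain `String`, eviction is oldest-first via `LinkedHashMap.removeEldestEntry`, expiration is checked lazily on access, and the caller decides whether a job failed. The class name `SplitJobStore` and its constructor parameters are invented here and do not correspond to real Flink configuration options:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.LongSupplier;

/**
 * Hypothetical sketch: failed and successful jobs live in separate buckets,
 * each with its own max capacity and expiration time.
 */
public class SplitJobStore {

    private static final class StoredJob {
        final String info;
        final long storedAtMillis;
        StoredJob(String info, long storedAtMillis) {
            this.info = info;
            this.storedAtMillis = storedAtMillis;
        }
    }

    /** One bucket: evicts its oldest job once maxCapacity is exceeded. */
    private static final class Bucket {
        private final long expirationMillis;
        private final LinkedHashMap<String, StoredJob> jobs;

        Bucket(int maxCapacity, long expirationMillis) {
            this.expirationMillis = expirationMillis;
            this.jobs = new LinkedHashMap<String, StoredJob>() {
                @Override
                protected boolean removeEldestEntry(Map.Entry<String, StoredJob> eldest) {
                    return size() > maxCapacity; // size() of this map
                }
            };
        }

        void put(String jobId, String info, long now) {
            purgeExpired(now);
            jobs.put(jobId, new StoredJob(info, now));
        }

        Optional<String> get(String jobId, long now) {
            purgeExpired(now);
            StoredJob job = jobs.get(jobId);
            return job == null ? Optional.empty() : Optional.of(job.info);
        }

        int count() { return jobs.size(); }

        // Drop every job whose lifetime has reached this bucket's expiration time.
        private void purgeExpired(long now) {
            jobs.values().removeIf(j -> now - j.storedAtMillis >= expirationMillis);
        }
    }

    private final Bucket successful;
    private final Bucket failed;
    private final LongSupplier clock;

    public SplitJobStore(int successfulCapacity, long successfulExpirationMillis,
                         int failedCapacity, long failedExpirationMillis,
                         LongSupplier clock) {
        this.successful = new Bucket(successfulCapacity, successfulExpirationMillis);
        this.failed = new Bucket(failedCapacity, failedExpirationMillis);
        this.clock = clock;
    }

    public void put(String jobId, String info, boolean jobFailed) {
        (jobFailed ? failed : successful).put(jobId, info, clock.getAsLong());
    }

    public Optional<String> get(String jobId) {
        long now = clock.getAsLong();
        Optional<String> hit = successful.get(jobId, now);
        return hit.isPresent() ? hit : failed.get(jobId, now);
    }

    public int successfulCount() { return successful.count(); }
    public int failedCount() { return failed.count(); }

    public static void main(String[] args) {
        long[] now = {0L}; // manual clock for the demo
        // Small, short-lived successful bucket; larger, longer-lived failed bucket.
        SplitJobStore store = new SplitJobStore(2, 1_000, 10, 60_000, () -> now[0]);
        for (int i = 0; i < 5; i++) {
            store.put("ok-" + i, "finished", false);
            store.put("bad-" + i, "failed", true);
        }
        System.out.println(store.successfulCount() + " successful, " + store.failedCount() + " failed");
        now[0] = 5_000; // past the successful expiration, within the failed one
        System.out.println(store.get("ok-4").isPresent() + " " + store.get("bad-0").isPresent());
    }
}
```

Running `main` prints `2 successful, 5 failed` and then `false true`: the successful bucket keeps only its two newest jobs and expires them after one second, while all five failed jobs are still retrievable. The point of the split is that the failed bucket's limits can be tuned without letting successful jobs crowd it out.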



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
