[ https://issues.apache.org/jira/browse/FLINK-25586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17576491#comment-17576491 ]

Xintong Song commented on FLINK-25586:
--------------------------------------

I think this would be a good feature to have. +1 for it.

Also +1 for treating CANCELLED jobs the same as FINISHED jobs. I think the
purpose of preserving FAILED jobs for a longer time is to let users find
out what happened when they discover the failure later. For CANCELLED jobs,
in most cases users will already have noticed any abnormality and collected logs
or other information before canceling the jobs.

Thanks for volunteering on this, [~Zhanghao Chen]. You are assigned.

> ExecutionGraphInfoStore in session cluster should split failed and successful 
> jobs
> ----------------------------------------------------------------------------------
>
>                 Key: FLINK-25586
>                 URL: https://issues.apache.org/jira/browse/FLINK-25586
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Runtime / Coordination
>    Affects Versions: 1.12.7, 1.13.5, 1.14.2
>            Reporter: Shammon
>            Assignee: Zhanghao Chen
>            Priority: Major
>
> In a Flink session cluster, jobs are stored in `FileExecutionGraphInfoStore`.
> When the number of jobs in it reaches `jobstore.cache-size` or the lifetime
> of jobs reaches `jobstore.expiration-time`, the affected jobs are removed.
> We can't hold too many jobs for performance reasons, but we should hold
> failed jobs for a longer time so that the cause of failure can be traced. So
> it's better to split failed and successful jobs in `FileExecutionGraphInfoStore`
> and support an independent max capacity for each group.
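To illustrate the proposed retention policy, here is a minimal sketch that keeps two independently bounded partitions: one for successful jobs, where CANCELLED is grouped with FINISHED as suggested in the comment above, and one for FAILED jobs. Each partition has its own capacity and expiration time. This is not Flink's `FileExecutionGraphInfoStore` code. It models only the eviction policy using plain JDK collections, and it does not store execution graphs or spill anything to disk. The class, enum, method names, and job IDs are all hypothetical. The sketch also assumes that the caller supplies a non-decreasing clock value in milliseconds.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of a job store that splits failed and successful jobs.
public class SplitJobStoreSketch {

    // Hypothetical terminal states; CANCELLED is treated like FINISHED.
    enum TerminalState { FINISHED, CANCELLED, FAILED }

    // One partition, bounded by both job count and age.
    static final class BoundedPartition {
        private final int maxCapacity;
        private final long expirationMillis;
        // Insertion order equals age order (assuming a non-decreasing clock),
        // so the first entry is always the oldest.
        private final LinkedHashMap<String, Long> archivedAt = new LinkedHashMap<>();

        BoundedPartition(int maxCapacity, long expirationMillis) {
            this.maxCapacity = maxCapacity;
            this.expirationMillis = expirationMillis;
        }

        void put(String jobId, long nowMillis) {
            archivedAt.remove(jobId); // re-archiving moves the job to the newest slot
            archivedAt.put(jobId, nowMillis);
            evict(nowMillis);
        }

        boolean contains(String jobId, long nowMillis) {
            evict(nowMillis);
            return archivedAt.containsKey(jobId);
        }

        // Drops the oldest entries while they are expired or the partition is too full.
        private void evict(long nowMillis) {
            Iterator<Map.Entry<String, Long>> it = archivedAt.entrySet().iterator();
            while (it.hasNext()) {
                Map.Entry<String, Long> oldest = it.next();
                boolean expired = nowMillis - oldest.getValue() >= expirationMillis;
                boolean overCapacity = archivedAt.size() > maxCapacity;
                if (!expired && !overCapacity) {
                    break; // the oldest entry is still valid, so newer ones are too
                }
                it.remove();
            }
        }
    }

    private final BoundedPartition succeeded;
    private final BoundedPartition failed;

    SplitJobStoreSketch(int maxSucceeded, long succeededExpirationMillis,
                        int maxFailed, long failedExpirationMillis) {
        this.succeeded = new BoundedPartition(maxSucceeded, succeededExpirationMillis);
        this.failed = new BoundedPartition(maxFailed, failedExpirationMillis);
    }

    // Routes a terminated job to the partition matching its final state.
    void archive(String jobId, TerminalState state, long nowMillis) {
        BoundedPartition target = (state == TerminalState.FAILED) ? failed : succeeded;
        target.put(jobId, nowMillis);
    }

    boolean isRetained(String jobId, long nowMillis) {
        return succeeded.contains(jobId, nowMillis) || failed.contains(jobId, nowMillis);
    }

    public static void main(String[] args) {
        // At most 2 successful jobs for 1 hour, but up to 10 failed jobs for 24 hours.
        SplitJobStoreSketch store = new SplitJobStoreSketch(2, 3_600_000L, 10, 86_400_000L);
        store.archive("job-a", TerminalState.FAILED, 0L);
        store.archive("job-b", TerminalState.FINISHED, 1L);
        store.archive("job-c", TerminalState.CANCELLED, 2L);
        store.archive("job-d", TerminalState.FINISHED, 3L);
        System.out.println(store.isRetained("job-a", 4L)); // true
        System.out.println(store.isRetained("job-b", 4L)); // false: evicted by the cap of 2
        System.out.println(store.isRetained("job-d", 7_200_000L)); // false: expired after 1 hour
        System.out.println(store.isRetained("job-a", 7_200_000L)); // true: failed jobs live longer
    }
}
```

Because the two partitions are independent, a burst of successful or cancelled jobs can no longer push failed jobs out of the store. This is the separation the issue asks for.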



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
