[
https://issues.apache.org/jira/browse/FLINK-25586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17474374#comment-17474374
]
David Morávek commented on FLINK-25586:
---------------------------------------
This makes sense, +1.
Just one question: how should this behave for streaming workloads, where the
job usually ends up in either the CANCELLED or the FAILED state? Should the
CANCELLED state be treated as a successful execution?
> ExecutionGraphInfoStore in session cluster should split failed and successful
> jobs
> ----------------------------------------------------------------------------------
>
> Key: FLINK-25586
> URL: https://issues.apache.org/jira/browse/FLINK-25586
> Project: Flink
> Issue Type: Sub-task
> Components: Runtime / Coordination
> Affects Versions: 1.12.7, 1.13.5, 1.14.2
> Reporter: Shammon
> Priority: Major
>
> In a Flink session cluster, jobs are stored in `FileExecutionGraphInfoStore`.
> When the number of jobs in it reaches `jobstore.cache-size`, or a job's
> lifetime reaches `jobstore.expiration-time`, the corresponding jobs are
> removed. We can't hold too many jobs for performance reasons, but we should
> keep failed jobs longer so that the cause of failure can be traced. So it's
> better to split failed and successful jobs in `FileExecutionGraphInfoStore`
> and support an independent max-capacity for each.
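The split proposed in the issue can be illustrated with a minimal, in-memory Python sketch. It is not Flink's `FileExecutionGraphInfoStore`: `SplitJobStore`, its parameters and the `treat_canceled_as_failed` flag are hypothetical names, capacity is counted in jobs for simplicity, and treating CANCELED jobs as successful by default is only one possible answer to the streaming question raised in the comment.

```python
import time
from collections import OrderedDict
from typing import Any, Callable, Optional, Tuple

# Terminal job states, spelled like Flink's JobStatus values.
FINISHED = "FINISHED"
FAILED = "FAILED"
CANCELED = "CANCELED"


class _Bucket:
    """Insertion-ordered job store with a max entry count and a time-to-live."""

    def __init__(self, max_capacity: int, ttl_seconds: float) -> None:
        self.max_capacity = max_capacity
        self.ttl_seconds = ttl_seconds
        # job_id -> (stored_at, info); insertion order == storage-time order
        self._entries: "OrderedDict[str, Tuple[float, Any]]" = OrderedDict()

    def expire(self, now: float) -> None:
        # Entries are ordered by storage time, so stop at the first live one.
        while self._entries:
            stored_at, _ = next(iter(self._entries.values()))
            if now - stored_at < self.ttl_seconds:
                break
            self._entries.popitem(last=False)

    def put(self, job_id: str, info: Any, now: float) -> None:
        self._entries.pop(job_id, None)  # re-inserting moves the job to the end
        self._entries[job_id] = (now, info)
        self.expire(now)
        while len(self._entries) > self.max_capacity:
            self._entries.popitem(last=False)  # evict the oldest job first

    def get(self, job_id: str, now: float) -> Optional[Any]:
        self.expire(now)
        entry = self._entries.get(job_id)
        return None if entry is None else entry[1]

    def __len__(self) -> int:
        return len(self._entries)


class SplitJobStore:
    """Hypothetical store keeping failed and successful jobs in separately bounded buckets."""

    def __init__(
        self,
        successful_capacity: int,
        failed_capacity: int,
        successful_ttl: float,
        failed_ttl: float,
        treat_canceled_as_failed: bool = False,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._successful = _Bucket(successful_capacity, successful_ttl)
        self._failed = _Bucket(failed_capacity, failed_ttl)
        self._treat_canceled_as_failed = treat_canceled_as_failed
        self._clock = clock

    def _bucket_for(self, status: str) -> _Bucket:
        if status == FAILED or (status == CANCELED and self._treat_canceled_as_failed):
            return self._failed
        return self._successful  # FINISHED, and CANCELED unless configured otherwise

    def put(self, job_id: str, status: str, info: Any) -> None:
        self._bucket_for(status).put(job_id, info, self._clock())

    def get(self, job_id: str) -> Optional[Any]:
        now = self._clock()
        info = self._failed.get(job_id, now)
        return info if info is not None else self._successful.get(job_id, now)

    def sizes(self) -> Tuple[int, int]:
        """Returns (successful, failed) counts after dropping expired jobs."""
        now = self._clock()
        self._successful.expire(now)
        self._failed.expire(now)
        return len(self._successful), len(self._failed)


# Usage with a fake clock: small successful bucket, larger and longer-lived failed one.
fake_now = [0.0]
store = SplitJobStore(2, 5, successful_ttl=3600, failed_ttl=86400, clock=lambda: fake_now[0])
for i in range(4):
    store.put(f"ok-{i}", FINISHED, {"id": i})
store.put("bad-0", FAILED, {"id": "bad"})
print(store.sizes())  # (2, 1): ok-0 and ok-1 were evicted by capacity
fake_now[0] = 7200.0
print(store.sizes())  # (0, 1): successful jobs expired, the failed job survives
```

Keeping the CANCELED policy behind a flag leaves the streaming case configurable instead of hard-coding one interpretation of a cancelled job.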
--
This message was sent by Atlassian Jira
(v8.20.1#820001)