[jira] [Commented] (FLINK-25586) ExecutionGraphInfoStore in session cluster should split failed and successful jobs

Jira Wed, 12 Jan 2022 00:55:05 -0800


    [ 
https://issues.apache.org/jira/browse/FLINK-25586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17474374#comment-17474374
 ]


David Morávek commented on FLINK-25586:
---------------------------------------

This makes sense +1

Just one question: how should this behave for streaming workloads, where the 
job usually ends up in either the CANCELED or the FAILED state? Should the 
CANCELED state be treated as a successful execution?

> ExecutionGraphInfoStore in session cluster should split failed and successful 
> jobs
> ----------------------------------------------------------------------------------
>
>                 Key: FLINK-25586
>                 URL: https://issues.apache.org/jira/browse/FLINK-25586
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Runtime / Coordination
>    Affects Versions: 1.12.7, 1.13.5, 1.14.2
>            Reporter: Shammon
>            Priority: Major
>
> In a Flink session cluster, jobs are stored in `FileExecutionGraphInfoStore`. 
> When the number of jobs in it reaches `jobstore.cache-size`, or the time a 
> job has been stored reaches `jobstore.expiration-time`, the affected jobs are 
> removed. We can't hold too many jobs for performance reasons, but we should 
> hold failed jobs for a longer time so the cause of failure can be traced. So 
> it's better to split failed and successful jobs in 
> `FileExecutionGraphInfoStore` and support an independent max capacity for each.
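
To make the proposal concrete, here is a minimal sketch of a store that keeps 
failed and successful jobs in separate bounded caches. It is not Flink code: 
`BoundedJobCache`, `SplitJobStore`, `max_successful`, `max_failed` and 
`failed_states` are hypothetical names, job info is reduced to a plain value, 
and time-based expiration is left out. Which terminal states count as failed 
is a parameter, since whether CANCELED belongs there is still an open question.

```python
from collections import OrderedDict

# Assumption: only FAILED counts as a failure unless the caller says otherwise.
DEFAULT_FAILED_STATES = frozenset({"FAILED"})


class BoundedJobCache:
    """Holds at most max_size jobs; the oldest entry is evicted first."""

    def __init__(self, max_size):
        if max_size < 1:
            raise ValueError("max_size must be at least 1")
        self._max_size = max_size
        self._jobs = OrderedDict()

    def put(self, job_id, info):
        self._jobs[job_id] = info
        self._jobs.move_to_end(job_id)
        # Evict from the front (oldest) until we are back within capacity.
        while len(self._jobs) > self._max_size:
            self._jobs.popitem(last=False)

    def get(self, job_id):
        return self._jobs.get(job_id)

    def __len__(self):
        return len(self._jobs)


class SplitJobStore:
    """Routes each finished job to a failed or successful cache by status."""

    def __init__(self, max_successful, max_failed,
                 failed_states=DEFAULT_FAILED_STATES):
        self._successful = BoundedJobCache(max_successful)
        self._failed = BoundedJobCache(max_failed)
        self._failed_states = frozenset(failed_states)

    def put(self, job_id, status, info):
        if status in self._failed_states:
            self._failed.put(job_id, info)
        else:
            self._successful.put(job_id, info)

    def get(self, job_id):
        info = self._failed.get(job_id)
        if info is not None:
            return info
        return self._successful.get(job_id)

    def sizes(self):
        """Returns (number of successful jobs, number of failed jobs)."""
        return len(self._successful), len(self._failed)
```

With this split, a burst of successful jobs can only evict other successful 
jobs, so a failed job stays available for debugging until the failed cache 
itself fills up. Passing `failed_states={"FAILED", "CANCELED"}` would keep 
cancelled streaming jobs in the longer-lived cache as well.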



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
