[jira] [Comment Edited] (FLINK-24114) Make CompletedOperationCache.COMPLETED_OPERATION_RESULT_CACHE_DURATION_SECONDS configurable (at least for savepoint trigger operations)

Chesnay Schepler (Jira) Thu, 02 Sep 2021 04:43:05 -0700


    [ https://issues.apache.org/jira/browse/FLINK-24114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17408756#comment-17408756 ]


Chesnay Schepler edited comment on FLINK-24114 at 9/2/21, 11:42 AM:
--------------------------------------------------------------------

The /checkpoints endpoint has a latestSavepoint field somewhere that you could 
use. Unfortunately, the /savepoints response does not contain the checkpoint id, 
so you can't query it directly (although that might not be difficult to change).
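
As a rough sketch of that workaround, a client could pull the latest savepoint 
out of the checkpoint statistics it gets back from /checkpoints. The field 
layout used here (latest -> savepoint -> external_path) and the sample payload 
are assumptions for illustration only; check them against the actual response 
of your Flink version.

```python
from typing import Optional


def latest_savepoint_path(checkpoint_stats: dict) -> Optional[str]:
    """Return the path of the most recent savepoint, or None if absent.

    Assumes (not confirmed in this thread) that the statistics nest the
    entry as latest -> savepoint -> external_path.
    """
    latest = checkpoint_stats.get("latest") or {}
    savepoint = latest.get("savepoint") or {}
    return savepoint.get("external_path")


# Hypothetical payload, trimmed to the fields used above.
sample = {"latest": {"savepoint": {"id": 7, "external_path": "file:/tmp/sp-7"}}}
print(latest_savepoint_path(sample))
```

Note that this only identifies the most recent savepoint; without the 
checkpoint id in the trigger response, it cannot tell two concurrently 
triggered savepoints apart.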

I wouldn't say that the endpoint is _broken_. Technically, /savepoints creates 
an _operation_, which as a side effect creates a savepoint. But yes, it's just a 
technicality.
Checkpoints & savepoints in general are a bit weird. They are very similar 
(which is why both are returned by /checkpoints), but users interact with them 
in fundamentally different ways. So we end up with this constant struggle of 
wanting to treat them as the same thing and as different things at the same time.

I suppose what one would want is that users do a POST to /checkpoints, which 
triggers a savepoint and returns the link /checkpoints/<id>. /checkpoints/<id> 
returns status=in_progress while it is in progress, and is later replaced with 
the actual checkpoint. At least to me that is the most intuitive thing.
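
To make that flow concrete, here is a toy in-memory model of it. Nothing like 
this exists in the REST API today; the class name, the "completed" status 
value and the "location" field are made up for illustration.

```python
import itertools


class CheckpointResource:
    """Toy in-memory model of the proposed /checkpoints resource (hypothetical)."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._entries = {}

    def post(self) -> str:
        """Trigger a savepoint and return the link where it will appear."""
        checkpoint_id = next(self._ids)
        self._entries[checkpoint_id] = {"status": "in_progress"}
        return f"/checkpoints/{checkpoint_id}"

    def complete(self, checkpoint_id: int, location: str) -> None:
        """Replace the in-progress placeholder with the finished checkpoint."""
        self._entries[checkpoint_id] = {
            "status": "completed",
            "id": checkpoint_id,
            "location": location,
        }

    def get(self, link: str) -> dict:
        """Look up a link previously returned by post()."""
        checkpoint_id = int(link.rsplit("/", 1)[1])
        return self._entries[checkpoint_id]


api = CheckpointResource()
link = api.post()                  # client triggers the savepoint
pending = api.get(link)            # still in progress
api.complete(1, "file:/tmp/sp-1")  # savepoint finishes on the server side
done = api.get(link)               # same link now returns the checkpoint
```

The point of the sketch is that the client only ever holds one link, and that 
link stays valid after completion instead of pointing at a short-lived 
operation.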

That isn't quite ideal, though, because checkpoints are technically a different 
thing, so we might finally introduce a term that combines checkpoints & 
savepoints and _isn't_ "checkpoints".


was (Author: zentol):
The /checkpoints endpoint has a latestSavepoint field somewhere that you could 
use. Unfortunately, the /savepoints response does not contain the checkpoint id, 
so you can't query it directly (although that might not be difficult to change).

I wouldn't say that the endpoint is _broken_. Technically, /savepoints creates 
an _operation_, which as a side effect creates a savepoint. But yes, it's just a 
technicality.
Checkpoints & savepoints in general are a bit weird. They are very similar 
(which is why both are returned by /checkpoints), but users interact with them 
in fundamentally different ways. So we end up with this constant struggle of 
wanting to treat them as the same thing and as different things at the same time.

I suppose what one would want is that users do a POST to /checkpoints, which 
triggers a savepoint and returns the link /checkpoints/<id>. /checkpoints/<id> 
returns status=in_progress while it is in progress, and is later replaced with 
the actual checkpoint. At least to me that is the most intuitive thing.

Or we finally introduce a term that combines checkpoints & savepoints that 
_isn't_ "checkpoints".

> Make 
> CompletedOperationCache.COMPLETED_OPERATION_RESULT_CACHE_DURATION_SECONDS 
> configurable (at least for savepoint trigger operations)
> ---------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-24114
>                 URL: https://issues.apache.org/jira/browse/FLINK-24114
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Coordination
>    Affects Versions: 1.15.0
>            Reporter: Robert Metzger
>            Priority: Major
>
> Currently, external services that trigger savepoints may be unable to 
> persist the savepoint location returned by the savepoint handler, because the 
> operation cache evicts entries (which have been accessed at least once) after 
> a hardcoded timeout of 5 minutes.
> To avoid scenarios where the savepoint location has been accessed, but the 
> external system failed to persist the location, I propose to make this 
> eviction timeout configurable (so that I as a user can configure a value of 
> 24 hours for the cache eviction).
> (This is related to FLINK-24113)
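
The eviction behaviour described in the issue can be illustrated with a small 
stand-alone cache. This is not Flink's CompletedOperationCache; the class, its 
ttl_seconds parameter and the injectable clock are hypothetical, and eviction 
is done lazily on lookup to keep the sketch short.

```python
import time


class CompletedResultCache:
    """Toy cache: a result is evicted `ttl_seconds` after its first access."""

    def __init__(self, ttl_seconds: float = 300.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._results = {}       # key -> result
        self._first_access = {}  # key -> time of first successful get()

    def put(self, key, result) -> None:
        self._results[key] = result

    def get(self, key):
        now = self._clock()
        first = self._first_access.get(key)
        if first is not None and now - first >= self._ttl:
            # Timeout since the first access has elapsed: evict the entry.
            self._results.pop(key, None)
            self._first_access.pop(key, None)
            return None
        if key in self._results and first is None:
            # First successful read starts the eviction timer.
            self._first_access[key] = now
        return self._results.get(key)


# Fake clock so the example does not have to sleep.
now = [0.0]
cache = CompletedResultCache(ttl_seconds=24 * 60 * 60, clock=lambda: now[0])
cache.put("trigger-1", "file:/tmp/sp-1")
first_read = cache.get("trigger-1")
now[0] = 6 * 60          # six minutes later, past a 5-minute timeout
retry_read = cache.get("trigger-1")
now[0] = 24 * 60 * 60    # 24 hours after the first access
late_read = cache.get("trigger-1")
```

With the timeout set to 24 hours, a client that read the location once but 
failed to persist it can still retry six minutes later; with a 5-minute 
timeout the same retry would come back empty.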



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
