[ 
https://issues.apache.org/jira/browse/FLINK-24114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17408756#comment-17408756
 ] 

Chesnay Schepler edited comment on FLINK-24114 at 9/2/21, 11:41 AM:
--------------------------------------------------------------------

The /checkpoints endpoint has a latestSavepoint field somewhere that you could 
use. Unfortunately, the /savepoints response does not contain the checkpoint id, 
so you can't query it directly. (That probably wouldn't be difficult to change.)
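
To make the workaround concrete, here is a minimal sketch of pulling the latest 
savepoint location out of an already-fetched /checkpoints response. The field 
names ("latest", "savepoint", "external_path") are assumptions about the JSON 
layout and are not confirmed anywhere in this thread; check them against the 
REST API docs of your Flink version. The function name is made up for 
illustration.

```python
from typing import Optional


def latest_savepoint_path(checkpoints_response: dict) -> Optional[str]:
    """Return the location of the most recent savepoint, or None.

    Assumed (unverified) response layout:
        {"latest": {"savepoint": {"id": ..., "external_path": "..."}}}
    """
    # Every level may be missing or null (e.g. no savepoint taken yet),
    # so fall back to an empty dict at each step.
    latest = checkpoints_response.get("latest") or {}
    savepoint = latest.get("savepoint") or {}
    return savepoint.get("external_path")
```

A caller would fetch the job's /checkpoints response, parse it as JSON and pass 
the resulting dict in. Note that this only tells you about the _latest_ 
savepoint, not the one a specific trigger created.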

I wouldn't say that the endpoint is _broken_. Technically, /savepoints creates 
an _operation_, which as a side effect creates a savepoint. But yes, it's just a 
technicality.

Checkpoints & savepoints in general are a bit weird. They are very similar 
(which is why both are returned by /checkpoints), but users interact with them 
in fundamentally different ways. So we end up with this constant struggle of 
wanting to treat them as the same thing and as different things at the same time.

I suppose what one would want is that users do a POST to /checkpoints, which 
triggers a savepoint and returns the link /checkpoints/<id>. /checkpoints/<id> 
returns status=in_progress while it is in progress, and is later replaced with 
the actual checkpoint. At least to me that is the most intuitive thing.
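
As a rough illustration of that idea, here is an in-memory sketch of such a 
resource. Everything in it is hypothetical: the class, the method names and the 
payload fields ("status", "location") are invented for this example and do not 
correspond to any existing Flink API.

```python
import itertools
from typing import Dict, Optional


class CheckpointsResource:
    """Toy model of the proposed /checkpoints resource (hypothetical)."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self._store: Dict[int, dict] = {}

    def post(self) -> str:
        """Trigger a savepoint and return the link to poll."""
        checkpoint_id = next(self._ids)
        # Until the savepoint finishes, the resource is only a placeholder.
        self._store[checkpoint_id] = {"id": checkpoint_id, "status": "in_progress"}
        return f"/checkpoints/{checkpoint_id}"

    def complete(self, checkpoint_id: int, location: str) -> None:
        """Replace the placeholder with the actual (completed) checkpoint."""
        self._store[checkpoint_id] = {
            "id": checkpoint_id,
            "status": "completed",
            "location": location,
        }

    def get(self, path: str) -> Optional[dict]:
        """Look up /checkpoints/<id>; returns None for unknown ids."""
        checkpoint_id = int(path.rsplit("/", 1)[1])
        return self._store.get(checkpoint_id)
```

The point of this shape is that the link returned by the POST stays valid 
afterwards, so a client that failed to store the location can simply fetch 
/checkpoints/<id> again instead of depending on a short-lived operation result.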

Or we finally introduce a term that combines checkpoints & savepoints that 
_isn't_ "checkpoints".



> Make 
> CompletedOperationCache.COMPLETED_OPERATION_RESULT_CACHE_DURATION_SECONDS 
> configurable (at least for savepoint trigger operations)
> ---------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-24114
>                 URL: https://issues.apache.org/jira/browse/FLINK-24114
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Coordination
>    Affects Versions: 1.15.0
>            Reporter: Robert Metzger
>            Priority: Major
>
> Currently, external services that trigger savepoints can fail to persist the 
> savepoint location returned by the savepoint handler, because the operation 
> cache evicts entries (once they have been accessed at least once) after a 
> hardcoded 5 minutes.
> To avoid scenarios where the savepoint location has been accessed, but the 
> external system failed to persist the location, I propose to make this 
> eviction timeout configurable (so that I as a user can configure a value of 
> 24 hours for the cache eviction).
> (This is related to FLINK-24113)
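
For illustration only, the behaviour described in the issue can be sketched as 
a small cache whose eviction timeout is a constructor parameter. This is not 
Flink's CompletedOperationCache; the class name, the `ttl_seconds` parameter 
and the injected clock are assumptions made for the example. It only models 
"entries are evicted a fixed time after they were first accessed".

```python
import time
from typing import Callable, Dict, Optional, Tuple


class OperationResultCache:
    """Toy cache: an entry expires ttl_seconds after its first access."""

    def __init__(
        self,
        ttl_seconds: float = 300.0,  # 5 minutes, the hardcoded value in the issue
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        # key -> (result, time of first access, or None if never accessed)
        self._entries: Dict[str, Tuple[str, Optional[float]]] = {}

    def put(self, key: str, result: str) -> None:
        self._entries[key] = (result, None)

    def get(self, key: str) -> Optional[str]:
        self._evict_expired()
        entry = self._entries.get(key)
        if entry is None:
            return None
        result, first_access = entry
        if first_access is None:
            # The eviction timer starts on the first access.
            self._entries[key] = (result, self._clock())
        return result

    def _evict_expired(self) -> None:
        now = self._clock()
        expired = [
            key
            for key, (_, accessed_at) in self._entries.items()
            if accessed_at is not None and now - accessed_at >= self._ttl
        ]
        for key in expired:
            del self._entries[key]
```

With `ttl_seconds=24 * 3600`, an external system that read the savepoint 
location once but failed to persist it could still read it again hours later, 
which is the scenario the issue asks for.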


