Hi Chesnay, thanks for your input.
> That seems like something we'd maybe want to introduce consistently for
> all checkpoint-related endpoints.

Do you think we should scope it out of this FLIP and follow up with
another FLIP? Also, as far as I can see it would only affect the
/jobs/:jobid/checkpoints endpoint, since the other endpoints (details,
and details with vertices) work on a specific checkpoint.

> I'm also not sure about returning a 404 if no checkpoints exists
> (especially with the filtering) but the job is there.

Yeah, I see your point. I can go with the "latest:{} or
latest:{...checkpoint info...}" suggestion; it only requires wrapping
the details.
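
As a rough illustration of that wrapping (a Python sketch, not Flink
code: the function names are hypothetical, and the "id" and "status"
fields are assumed only for the example), the body would always carry a
"latest" key, so a client checks for an empty object instead of
handling a 404:

```python
from typing import Any, Optional


def wrap_latest(details: Optional[dict[str, Any]]) -> dict[str, Any]:
    """Wrap checkpoint details so the body always has a "latest" key.

    `details` is None when the job has no matching checkpoint yet.
    """
    return {"latest": details if details is not None else {}}


def latest_checkpoint_id(body: dict[str, Any]) -> Optional[int]:
    """Client-side helper: return the checkpoint id, or None if "latest" is empty."""
    latest = body.get("latest") or {}
    return latest.get("id")


# Hypothetical payloads, for illustration only.
empty_body = wrap_latest(None)
full_body = wrap_latest({"id": 42, "status": "COMPLETED"})
```

With this shape the client has one code path for "job exists but has no
checkpoint yet" and keeps 404 for a missing job.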

> The FLIP should also cover the error cases when it is called for jobs
> that don't have checkpointing enabled (e.g., batch).

Will add this to the FLIP, but I guess we should just go with a 400
response.
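
A minimal sketch of how the two cases could be told apart, again in
Python and purely illustrative: `respond_latest`, its parameters, and
the error message text are assumptions for the example, not the
handler's actual API.

```python
from typing import Any, Optional


def respond_latest(
    checkpointing_enabled: bool,
    latest: Optional[dict[str, Any]],
) -> tuple[int, dict[str, Any]]:
    """Return (HTTP status, JSON body) for the "latest checkpoint" request.

    - checkpointing disabled (e.g. a batch job): 400 with an error body
    - checkpointing enabled, nothing recorded yet: 200 with an empty "latest"
    - otherwise: 200 with the checkpoint details under "latest"
    """
    if not checkpointing_enabled:
        # The error body shape here is an assumption for illustration.
        return 400, {"errors": ["Checkpointing is not enabled for this job."]}
    return 200, {"latest": latest if latest is not None else {}}
```

For example, `respond_latest(False, None)` yields the 400 case, while
`respond_latest(True, None)` yields a 200 with an empty "latest"
object.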

Best Regards
Ahmed Hamdy


On Thu, 24 Jul 2025 at 15:03, Chesnay Schepler <ches...@apache.org> wrote:

> I think the idea of filtering is interesting but I do wonder if we
> should introduce it as part of this FLIP.
> That seems like something we'd maybe want to introduce consistently for
> all checkpoint-related endpoints.
>
> I'm also not sure about returning a 404 if no checkpoints exists
> (especially with the filtering) but the job is there.
> It's a bit annoying to handle on the client-side, especially since there
> are other 404 causes, and it can spuriously happen despite no issue on
> the client side (e.g., when the job is still initializing, or just
> started, or the JM has restarted and lost the checkpoint history; I'm
> not sure if the checkpoint we restore from is included in there).
> As an alternative it could be either latest:{} or latest:{...checkpoint
> info...}
>
> The FLIP should also cover the error cases when it is called for jobs
> that don't have checkpointing enabled (e.g., batch).
>
> On 22/07/2025 06:35, Ahmed Hamdy wrote:
> > Hi Poorvank
> > yes the idea is to do the latest checkpoint Id lookup from the history
> > and use it to return the checkpoint details.
> >
> >> Possible to consider adding type (savepoint/checkpoint)
> >> filtering: Since cache returns AbstractCheckpointStats
> >
> > yeah that's a good idea, I believe it might be useful in some cases.
> > Best Regards
> > Ahmed Hamdy
> >
> >
> > On Fri, 18 Jul 2025 at 20:39, Poorvank Bhatia <puravbhat...@gmail.com>
> > wrote:
> >
> >> Hi Ahmed,  Thank you for the FLIP.
> >> +1 (non-binding) for this feature.
> >>
> >> I have two implementation questions:
> >>
> >>     1. Approach for finding latest checkpoints: Since the FLIP mentions
> >>     "utilizing existing CheckpointStatsCache"
> >>     <https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=373886441#FLIP536:AddlatestcheckpointdetailsendpointtoRestAPI-ImplementationDetails>,
> >>     but that cache only supports lookup by checkpoint ID
> >>     (tryGet(long checkpointId))
> >>     <https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/rest/handler/job/checkpoints/CheckpointStatsCache.java#L71>,
> >>     do you intend to use getLatestCompletedCheckpoint()
> >>     <https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/CheckpointStatsHistory.java#L147>
> >>     to find the latest checkpoint, then cache it using
> >>     checkpointStatsCache.tryAdd(). Is this the intended approach, if not
> >>     can you clarify more.
> >>     2. Possible to consider adding type (savepoint/checkpoint)
> >>     filtering: Since cache returns AbstractCheckpointStats
> >>     <https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/AbstractCheckpointStats.java>
> >>     which has CheckpointProperties
> >>     <https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/CheckpointProperties.java>
> >>     that can distinguish between regular checkpoints and savepoints, would
> >>     it be valuable to extend the endpoint to support type filtering? i.e.
> >>
> >>           GET /jobs/:jobid/checkpoints/details/latest?status=COMPLETED&type=SAVEPOINT
> >>
> >>
> >> On Fri, Jul 18, 2025 at 9:48 PM Ahmed Hamdy <hamdy10...@gmail.com>
> >> wrote:
> >>
> >>> Hi David,
> >>> Thanks for the feedback. I guess an alternative approach would be
> >>> adding paging and sorting to the checkpointing stats query. However,
> >>> this would still require two REST API calls to get the latest
> >>> checkpoint details, as the stats endpoint only gives a summary, not
> >>> the details. I am open to adding another query parameter to the
> >>> endpoint in the FLIP to get the latest X checkpoint details in one go,
> >>> but I honestly didn't see much of a use case for more than one, and it
> >>> might complicate how we handle having y available checkpoints where
> >>> 0 < y < X.
> >>> Let me know your thoughts as well as the rest of the community.
> >>>
> >>>
> >>> Best Regards
> >>> Ahmed Hamdy
> >>>
> >>>
> >>> On Fri, 18 Jul 2025 at 16:47, David Radley <david_rad...@uk.ibm.com>
> >>> wrote:
> >>>
> >>>> Hi Ahmed,
> >>>> Thanks for submitting this Flip.
> >>>> What do you think of having /jobs/:jobid/checkpoints with query params
> >>>> to specify the sort criteria and direction and the number of returned
> >>>> elements (page size)? This would appear to be more of a standard (and
> >>>> flexible) way of doing a search. To get the latest you would specify a
> >>>> page size of 1 with a time sort criterion and descending direction.
> >>>>   WDYT?
> >>>>        Warm regards, David.
> >>>>
> >>>>
> >>>> From: Ahmed Hamdy <hamdy10...@gmail.com>
> >>>> Date: Friday, 18 July 2025 at 15:48
> >>>> To: dev@flink.apache.org <dev@flink.apache.org>
> >>>> Subject: [EXTERNAL] [DISCUSS][FLIP-536] Add latest checkpoint details
> >>>> endpoint to Rest API
> >>>> Hi Devs,
> >>>> I would like to start a discussion on FLIP-536[1] for adding a "latest"
> >>>> checkpoint details endpoint to Flink's REST Api. This is a common case
> >>>> I have personally encountered when integrating components with Flink
> >>>> using the Rest API.
> >>>> Let me know your thoughts.
> >>>>
> >>>>
> >>>> 1-
> >>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-536%3A+Add+latest+checkpoint+details+endpoint+to+Rest+API
> >>>>
> >>>> Best Regards
> >>>> Ahmed Hamdy
> >>>>
>
>
