Hi Gyula,

Thank you so much for your thoughtful and insightful feedback!


1. I fully agree that using the job name for job matching is more
user-friendly and cleaner than relying on a jobIndex parameter. I’ll update
the FLIP to reflect this design change.
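
As a rough illustration of name-based matching with a sequence number for
duplicates (a sketch only: the function name `unique_job_names` and the `-N`
suffix format are assumptions for illustration, not part of the FLIP):

```python
from collections import Counter


def unique_job_names(names):
    """Append a 1-based sequence suffix to every job name that occurs more than once.

    Hypothetical scheme: names that are already unique are left unchanged.
    It does not guard against a name that already ends in such a suffix.
    """
    totals = Counter(names)   # how often each name occurs overall
    seen = Counter()          # how many times each duplicate has been emitted so far
    result = []
    for name in names:
        if totals[name] == 1:
            result.append(name)
        else:
            seen[name] += 1
            result.append(f"{name}-{seen[name]}")
    return result
```

With this scheme, jobs submitted in a deterministic order would get the same
names on every run, which is what matching by name relies on.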


2. I’d like to dig a bit deeper to make sure I fully understand the
requirement. You mentioned the need for a generic information endpoint that
remains accessible even after failure and that includes additional
information such as the checkpoint restore path and configuration.

From my current understanding, Flink’s existing archive mechanism, combined
with the HistoryServer, already provides persistent access to job-related
information after failure. Specifically, the existing HistoryServer endpoint
`/jobs/:jobid/jobmanager/config` seems capable of exposing the configuration,
including checkpoint restore paths, and remains accessible after failure.
On the other hand, the proposed `/applications/:appid/exceptions` endpoint is
intended specifically to surface application-level exceptions that occur
outside the job lifecycle, and these will also be available through the
HistoryServer after failure.
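
To make the two endpoints concrete, here is a minimal sketch that only builds
the request URLs. The helper names are hypothetical, the base URL is whatever
address the HistoryServer is reachable at (an assumption in the usage below),
and the application endpoint is the one proposed in the FLIP, not an existing
API:

```python
from urllib.parse import quote


def job_config_url(base_url, job_id):
    """URL of the existing per-job JobManager configuration endpoint."""
    return f"{base_url.rstrip('/')}/jobs/{quote(job_id, safe='')}/jobmanager/config"


def app_exceptions_url(base_url, app_id):
    """URL of the application exceptions endpoint proposed in the FLIP."""
    return f"{base_url.rstrip('/')}/applications/{quote(app_id, safe='')}/exceptions"
```

For example, `job_config_url("http://localhost:8082", "<jobid>")` would point
at the archived configuration of that job, assuming the HistoryServer runs
locally on port 8082.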

So could you help clarify whether there is a specific failure scenario or use
case where the current archiving/HistoryServer mechanism falls short, or where
critical debugging information (such as the restore path or configuration) is
not retrievable after a failure?


Thanks again for your excellent suggestions!

Best,
Yi

At 2025-12-25 21:08:49, "Gyula Fóra" <[email protected]> wrote:
>Hi!
>
>Overall I think the design/improvements look great. Some minor comments,
>improvement possibilities:
>
>1. Could we simply use the job name for job matching? I think it's fair to
>require unique job names (or if they are not unique attach a sequence
>number to the name) instead of the jobIndex parameter. JobIndex sounds a
>bit weird and low level.
>
>2. A big problem/limitation of the existing submission logic is that the
>submit-on-error logic is very limited (only handling certain types of
>errors and only showing exception info). We should capture different errors
>and metadata for failed applications including checkpoint settings (for
>instance what checkpoint path was used during restore, which is a common
>cause of the errors). So instead of introducing a
>/applications/appid/exceptions endpoint, can we instead introduce a more
>generic information endpoint that would contain other information? This
>endpoint should be accessible even in case of failures and populated from
>the app result store and should also contain some other info such as
>checkpoint restore path, configuration etc.
>
>Capturing more information on failed submissions would help resolve a lot
>of long outstanding issues in the Flink Kubernetes Operator as well.
>
>Cheers
>Gyula
>
>
>On Thu, Dec 25, 2025 at 1:54 PM Lei Yang <[email protected]> wrote:
>
>> Thank you Yi for your reply, looks good to me!
>> +1 for this proposal
>> Best,
>> Lei
>>
>> On Thu, Dec 25, 2025 at 10:02, Yi Zhang <[email protected]> wrote:
>>
>> > Hi Lei,
>> >
>> >
>> > Thank you for the feedback!
>> > The "Archiving Directory Structure" section describes a change in how
>> > archived files are organized under jobmanager.archive.fs.dir. While this
>> > change was originally proposed in FLIP-549, it's indeed a significant
>> > application-level update, so I'm glad to have the chance to clarify it
>> > here.
>> >
>> >
>> > To answer your question directly: backward compatibility is fully preserved.
>> >
>> >
>> > In earlier Flink versions, job archives were written directly under the
>> > configured jobmanager.archive.fs.dir. With this update, Flink will
>> > instead use a hierarchical cluster-application-job structure.
>> > We understand that many users already have archives stored in the legacy
>> > flat layout. To ensure a smooth transition, the History Server will be
>> > updated to read archives from both the old and new directory structures.
>> > As a result, all previously archived jobs will remain accessible and
>> > visible.
>> >
>> >
>> > If you have additional questions or specific edge cases in mind, I’d be
>> > happy to discuss them further!
>> >
>> >
>> > Best,
>> > Yi
>> >
>> >
>> >
>> > At 2025-12-24 11:35:00, "Lei Yang" <[email protected]> wrote:
>> > >Hi Yi,
>> > >
>> > >Thank you for creating this FLIP! The introduction of the Application
>> > >entity significantly enhances the observability and manageability of
>> > >user logic, especially benefiting batch workloads. This is truly
>> > >excellent work!
>> > >
>> > >However, I have a compatibility concern and would appreciate your
>> > >clarification. In the “Archiving Directory Structure” section, I noticed
>> > >that the directory structure has been changed. If users have configured
>> > >a persistent external path for jobmanager.archive.fs.dir, will their
>> > >existing archives become unreadable after this change? Will the
>> > >implementation of this FLIP maintain backward compatibility with
>> > >previously archived job data?
>> > >
>> > >Best regards,
>> > >Lei
>> > >
>> > >On Wed, Dec 17, 2025 at 14:18, Yi Zhang <[email protected]> wrote:
>> > >
>> > >> Hi everyone,
>> > >>
>> > >> I would like to start a discussion about FLIP-560: Application
>> > >> Capability Enhancement [1].
>> > >>
>> > >> The primary goal of this FLIP is to improve the usability and
>> > >> availability of Flink applications by introducing the following
>> > >> enhancements:
>> > >>
>> > >>
>> > >>
>> > >> 1. Support multi-job execution in Application Mode, which is an
>> > >> important batch-processing use case.
>> > >> 2. Support re-running the user's main method after JobManager restarts
>> > >> due to failures in Session Mode.
>> > >> 3. Expose exceptions thrown in the user's main method via REST/UI.
>> > >>
>> > >>
>> > >>
>> > >> Looking forward to your feedback and suggestions!
>> > >>
>> > >>
>> > >>
>> > >> [1]
>> > >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-560%3A+Application+Capability+Enhancement
>> > >>
>> > >>
>> > >>
>> > >> Best Regards,
>> > >>
>> > >> Yi Zhang
>> >
>>
