Hi!

Overall I think the design/improvements look great. Some minor comments
and possible improvements:

1. Could we simply use the job name for job matching? I think it's fair to
require unique job names (or if they are not unique attach a sequence
number to the name) instead of the jobIndex parameter. JobIndex sounds a
bit weird and low level.
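
To make the sequence-number idea concrete, here is a minimal sketch. It is
not taken from the FLIP: JobNameKeys and uniqueKeys are hypothetical names,
and it assumes the main method submits jobs in the same order on every run,
so that the derived keys stay stable across re-executions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper for illustration only; not part of Flink. */
final class JobNameKeys {
    private JobNameKeys() {}

    /**
     * Derives one unique matching key per job from the job names, in submission order.
     * The first job with a given name keeps the plain name; later duplicates get
     * "-1", "-2", ... appended. Assumes the submission order is deterministic.
     */
    static List<String> uniqueKeys(List<String> jobNamesInSubmissionOrder) {
        Map<String, Integer> lastSuffix = new HashMap<>();
        Set<String> used = new HashSet<>();
        List<String> keys = new ArrayList<>(jobNamesInSubmissionOrder.size());
        for (String name : jobNamesInSubmissionOrder) {
            String key = name;
            int suffix = lastSuffix.getOrDefault(name, 0);
            // Keep bumping the suffix until the key is not taken, which also
            // covers a user-chosen name that already looks like "name-1".
            while (!used.add(key)) {
                suffix++;
                key = name + "-" + suffix;
            }
            lastSuffix.put(name, suffix);
            keys.add(key);
        }
        return keys;
    }
}
```

With this, a re-run after a JobManager restart could look up the previous
result of a job by its key rather than by its position in the main method.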

2. A big problem/limitation of the existing submission logic is that the
submit-on-error logic is very limited (it only handles certain types of
errors and only shows exception info). We should capture different errors
and metadata for failed applications, including checkpoint settings (for
instance, which checkpoint path was used during restore, which is a common
cause of errors). So instead of introducing a
/applications/appid/exceptions endpoint, could we introduce a more
generic information endpoint that also contains other information? This
endpoint should be accessible even in case of failures, be populated from
the app result store, and contain additional info such as the
checkpoint restore path, configuration, etc.
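
As a rough sketch of what such a generic payload could carry (everything
below is hypothetical: the class name, the fields, and the status strings
are assumptions for illustration, not an existing or proposed Flink API):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical response body for a generic application info endpoint. */
final class ApplicationInfo {
    private final String applicationId;
    private final String status;                  // e.g. "FAILED"; the value set is an assumption
    private final String restoredCheckpointPath;  // null when nothing was restored
    private final Map<String, String> configuration;
    private final List<String> exceptions;

    ApplicationInfo(
            String applicationId,
            String status,
            String restoredCheckpointPath,
            Map<String, String> configuration,
            List<String> exceptions) {
        this.applicationId = applicationId;
        this.status = status;
        this.restoredCheckpointPath = restoredCheckpointPath;
        // Defensive copies so the payload is immutable once built.
        this.configuration = Collections.unmodifiableMap(new LinkedHashMap<>(configuration));
        this.exceptions = Collections.unmodifiableList(new ArrayList<>(exceptions));
    }

    Optional<String> restoredCheckpointPath() {
        return Optional.ofNullable(restoredCheckpointPath);
    }

    Map<String, String> configuration() {
        return configuration;
    }

    List<String> exceptions() {
        return exceptions;
    }

    /** One-line summary that a client could log for a failed submission. */
    String summary() {
        return applicationId + " [" + status + "] restoredFrom="
                + restoredCheckpointPath().orElse("<none>")
                + " exceptions=" + exceptions.size();
    }
}
```

The point is only that exceptions become one field among several, so the
same endpoint can serve restore and configuration details after a failure.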

Capturing more information on failed submissions would help resolve a lot
of long outstanding issues in the Flink Kubernetes Operator as well.

Cheers
Gyula


On Thu, Dec 25, 2025 at 1:54 PM Lei Yang <[email protected]> wrote:

> Thank you Yi for your reply, looks good to me!
> +1 for this proposal
> Best,
> Lei
>
> Yi Zhang <[email protected]> wrote on Thu, Dec 25, 2025 at 10:02:
>
> > Hi Lei,
> >
> >
> > Thank you for the feedback!
> > The "Archiving Directory Structure" section describes a change in how
> > archived files are organized under jobmanager.archive.fs.dir. While this
> > change was originally proposed in FLIP-549, it's indeed a significant
> > application-level update, so I'm glad to have the chance to clarify it
> > here.
> >
> >
> > To answer your question directly: backward compatibility is fully
> > preserved.
> >
> >
> > In earlier Flink versions, job archives were written directly under the
> > configured jobmanager.archive.fs.dir. With this update, Flink will
> > instead use a hierarchical cluster-application-job structure.
> > We understand that many users already have archives stored in the legacy
> > flat layout. To ensure a smooth transition, the History Server will be
> > updated to read archives from both the old and new directory structures.
> > As a result, all previously archived jobs will remain accessible and
> > visible.
> >
> >
> > If you have additional questions or specific edge cases in mind, I’d be
> > happy to discuss them further!
> >
> >
> > Best,
> > Yi
> >
> >
> >
> > At 2025-12-24 11:35:00, "Lei Yang" <[email protected]> wrote:
> > >Hi Yi,
> > >
> > >Thank you for creating this FLIP! The introduction of the Application
> > >entity significantly enhances the observability and manageability of
> > >user logic, especially benefiting batch workloads. This is truly
> > >excellent work!
> > >
> > >However, I have a compatibility concern and would appreciate your
> > >clarification. In the “Archiving Directory Structure” section, I noticed
> > >that the directory structure has been changed. If users have configured
> > >a persistent external path for jobmanager.archive.fs.dir, will their
> > >existing archives become unreadable after this change? Will the
> > >implementation of this FLIP maintain backward compatibility with
> > >previously archived job data?
> > >
> > >Best regards,
> > >Lei
> > >
> > >Yi Zhang <[email protected]> wrote on Wed, Dec 17, 2025 at 14:18:
> > >
> > >> Hi everyone,
> > >>
> > >> I would like to start a discussion about FLIP-560: Application
> > >> Capability Enhancement [1].
> > >>
> > >> The primary goal of this FLIP is to improve the usability and
> > >> availability of Flink applications by introducing the following
> > >> enhancements:
> > >>
> > >>
> > >>
> > >> 1. Support multi-job execution in Application Mode, which is an
> > >>    important batch-processing use case.
> > >> 2. Support re-running the user's main method after JobManager restarts
> > >>    due to failures in Session Mode.
> > >> 3. Expose exceptions thrown in the user's main method via REST/UI.
> > >>
> > >>
> > >>
> > >> Looking forward to your feedback and suggestions!
> > >>
> > >>
> > >>
> > >> [1]
> > >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-560%3A+Application+Capability+Enhancement
> > >>
> > >>
> > >>
> > >> Best Regards,
> > >>
> > >> Yi Zhang
> >
>
