Re: [DISCUSS] FLIP-549: Support Application Management

Yi Zhang Sat, 18 Oct 2025 01:32:01 -0700

Hi Ryan,


That's a great point. Thanks for bringing it up.
Based on my understanding, the JobManager has already taken
on composite responsibilities that extend beyond job management,
such as handling REST requests and resource management.
Application management would certainly add to that complexity.


However, as you pointed out, renaming such a core component
faces significant hurdles, especially concerning the compatibility of
public APIs. Therefore, for now, it seems most practical to continue
with the original name.


Best,
Yi

At 2025-09-29 22:39:57, "Ryan van Huuksloot" 
<[email protected]> wrote:
>One thing to consider is how "JobManager" would no longer accurately
>reflect the responsibilities of that pod (in K8s).
>
>I think it is very difficult to rename that component but I did want to
>point out how the responsibilities of the JobManager are at a higher level
>than Job with this change.
>
>Ryan van Huuksloot
>Staff Engineer, Infrastructure | Streaming Platform
>
>
>On Sun, Sep 28, 2025 at 2:31 AM Lei Yang <[email protected]> wrote:
>
>> Thank you Yi for your reply, looks good to me!
>>
>> +1 for this proposal
>>
>> Best,
>> Lei
>>
>> Yi Zhang <[email protected]> wrote on Fri, Sep 26, 2025 at 10:43:
>>
>> > Hi Lei,
>> >
>> >
>> > Thank you for the feedback! I really appreciate you sharing these great
>> > questions and I would like to clarify my thinking:
>> >
>> >
>> > 1. Handling FINISHED jobs in FAILING state
>> > The FAILING state is designed to close active components, so
>> > already-FINISHED jobs are intentionally left untouched. This keeps the
>> > state transitions clean and simple.
>> >
>> > 2. Application HA and RESTARTING state
>> > This is a very interesting point. Application HA in the follow-up tasks is
>> > primarily centered around recovering from a JobManager failure (e.g., due
>> > to a machine crash). In that scenario, the JobManager itself is
>> > unavailable, making it impossible to update or query the application's
>> > status.
>> >
>> >
>> > However, you've brought up another excellent use case: automatically
>> > restarting an application in response to a failed job (or other errors in
>> > the main execution logic). This would be a powerful mechanism to build
>> > resilience against transient issues like network instability. For this
>> > scenario, you are absolutely right. Introducing a RESTARTING state for the
>> > application would be both reasonable and necessary to clearly indicate to
>> > the user that a recovery attempt is in progress.
>> > This capability seems like an important enhancement to application
>> > management and may involve significant work. To keep the scope of the
>> > current FLIP focused, I propose we don't include this functionality for
>> > now.
>> > If you are interested, I would be very happy to discuss this feature
>> > further in a separate thread. I think it's a great direction for future
>> > work.
>> >
>> >
>> >
>> >
>> > Best Regards,
>> >
>> > Yi
>> >
>> >
>> > At 2025-09-25 17:32:10, "Lei Yang" <[email protected]> wrote:
>> > >Hi Yi, thanks for creating this FLIP!
>> > >
>> > >I'm trying to understand your FLIP. By introducing the Application entity,
>> > >you're able to organically organize jobs, making them easier to observe
>> > >and manage. This is great work!
>> > >
>> > >I'd like to share some questions with you, and hope you could help me
>> > >clarify them:
>> > >
>> > >1. When an application is in the FAILING state, how are the jobs that have
>> > >already reached the FINISHED state handled? Will they simply be ignored,
>> > >or will there be other actions taken?
>> > >
>> > >2. In the "Follow-up Tasks", you mentioned high availability for the
>> > >application, which will restart failed jobs to restore the application.
>> > >However, I didn't see a description of the application's status during
>> > >such restarts in the FLIP. I think we might need to introduce a RESTARTING
>> > >status to explicitly indicate that the application is in the process of
>> > >restarting?
>> > >
>> > >Best,
>> > >Lei
>> > >
>> > >Yi Zhang <[email protected]> wrote on Tue, Sep 23, 2025 at 11:24:
>> > >
>> > >> Hi everyone,
>> > >>
>> > >>
>> > >> I would like to start a discussion about FLIP-549: Support Application
>> > >> Management [1].
>> > >>
>> > >>
>> > >> Despite Flink’s widespread adoption, the existing model for running user
>> > >> logic limits observability and execution flexibility, which affects user
>> > >> experience. This FLIP introduces a new application management framework
>> > >> designed to close these gaps and provide a foundation for future
>> > >> improvements.
>> > >>
>> > >>
>> > >> Looking forward to your feedback and suggestions.
>> > >>
>> > >>
>> > >>
>> > >> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-549%3A+Support+Application+Management
>> > >>
>> > >>
>> > >> Best regards,
>> > >>
>> > >> Yi Zhang
>> >
>>
