SameerMesiah97 commented on PR #59413:
URL: https://github.com/apache/airflow/pull/59413#issuecomment-3662068635

   @wolvery
   
   Thanks for opening a PR to address this issue. Had I known you were already 
working on a solution, I would have deferred to you, as you naturally 
understand your use case better than I do. That said, I am concerned that 
changing the behaviour of an existing cleanup mode could introduce a risk of 
breaking current deployments.
   
   My understanding of the intended semantics of the cleanup modes is:
   
   1. `delete_pod`: The most aggressive option, and the default. This is best 
suited for “happy path” scenarios where there is no expectation of needing pods 
for post-mortem debugging.
   2. `delete_succeeded_pod`: Cleans up pods that have completed successfully 
and whose logs are no longer required. These pods have run to completion 
without issues and no longer need to remain in the namespace.
   3. `keep_pod`: Retains all pod objects. While I would not personally 
recommend this in most cases, it’s provide users with maximum flexibility.
   4. `delete_active_pod`: A new mode I introduced based on my interpretation 
of your issue. This removes pods that are consuming (or may consume) 
resources—i.e. pods in RUNNING or PENDING states—while retaining completed pods 
for debugging. This avoids scenarios where resource-heavy pods continue running 
after the task has terminated, without losing succeeded/failed pods that may be 
useful for investigation. If my understanding is correct, this should cover 
your use case: retaining completed pods for debugging while cleaning up 
resource-hogging ones.
   
   We could alternatively change the behaviour of `delete_succeeded_pod`, but 
that would require redefining what the flag represents. Doing so risks breaking 
existing deployments and effectively deprecates the current semantics. Keeping 
the flag name while changing its behaviour would also be confusing: a mode 
called `delete_succeeded_pod `would then delete running and pending pods as 
well. While this may seem pedantic, many users will not read docstrings closely 
and could inadvertently choose a setting that deletes pods they intended to 
keep.
   
   If you are insistent on requiring a mode that deletes all pods save for the 
ones that have not failed, I believe it would be better to add a new flag for 
this with an appropriate name such as `delete_non_failed_pod`. But would this 
new flag provide additional useful functionality that `delete_active_pod` does 
not? I understand you might want succeeded pods to be gone with the  
running/pending pods but I am not sure of what purpose that would serve? In 
post-mortem debugging, I believe it would be more useful to have logs for both 
successful and failing states rather than just the latter.
   
   I think it would be best to have someone who owns this provider look at this 
and offer their comments. Even though this is a seemingly trivial change, it 
has subtle risks that require consideration. That being said I am happy that 
you raised the issue. The absence of a specific flag to delete running/pending 
pods on task termination was definitely a gap that needed to be addressed.
   
   @hussein-awala @jedcunningham 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to