potiuk commented on issue #21943:
URL: https://github.com/apache/airflow/issues/21943#issuecomment-1423966074

   I think https://github.com/apache/airflow/pull/29439 should handle it in the 
long term. It seems to be a known issue with K8S, @jay-olulana 
(https://github.com/kubernetes/kubernetes/issues/89657), and it has been fixed 
in 1.23 by adding `ttlSecondsAfterFinished`.
   
   There is no automated way for you to recover, but, if I am right, you can do 
it manually:
   
   * have k8s 1.23+
   * apply my PR to your chart
   * nuke the chart (remove it). Since you use Terraform, that should be easy, 
and redeploying it should restore it.
   * alternatively, remove the affected job manually using kubectl or the like
   * redeploy the chart with the fix
   
   Once you redeploy the chart with the PR, which includes 
`ttlSecondsAfterFinished`, the finished job should get deleted automatically 
after ~5 minutes (you can also decrease the TTL before deploying it).
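   
   For illustration, here is a minimal sketch of how the field looks on a Job. 
This is not the actual template from the chart or from the PR: the Job name, 
image and command are hypothetical, and the value of 300 seconds is only my 
assumption, chosen to match the ~5 minutes mentioned above.
   
   ```yaml
   # Hypothetical Job, for illustration only (not taken from the chart or the PR).
   apiVersion: batch/v1
   kind: Job
   metadata:
     name: example-finished-job   # hypothetical name
   spec:
     # Seconds to keep the Job (and its pods) after it finishes before the
     # TTL controller deletes it. 300 is an assumed value (~5 minutes);
     # lower it if you want the finished job cleaned up sooner.
     ttlSecondsAfterFinished: 300
     template:
       spec:
         restartPolicy: Never
         containers:
           - name: main
             image: busybox:1.36   # hypothetical image
             command: ["sh", "-c", "echo done"]
   ```
   
   With the field set, the finished Job becomes eligible for automatic deletion 
once the TTL expires, so it should no longer be around to block the next 
deployment of the chart.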
   
   I would appreciate it, @jay-olulana, if you could test some of the scenarios 
involved and confirm that my proposed fix works for you.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.