potiuk commented on PR #59604: URL: https://github.com/apache/airflow/pull/59604#issuecomment-3716169724
This is a pretty cool "reliability" feature. I think we should implement something similar in a number of other places, because it can provide resilience to transient issues. But I think it needs someone with a deeper understanding of deps handling, so I will refrain from approving it for some time (though it's tempting).

One thing to add though: I think we should have a better way of signalling that those issues are happening, for example metrics, or maybe even a warning in the UI, displayed as a dismissable notification when it happens. While I think it's cool that we handle this on our own, it might hide systemic issues that the deployment manager should handle. So while we should let it self-recover, we should also notify about those issues pretty aggressively.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
