argibbs commented on issue #34339: URL: https://github.com/apache/airflow/issues/34339#issuecomment-1718286614
> ++ some guess on my side - we recently had such issue also related to infrastructure instability. Logs showed that tasks executed successfully but updates in DB failed due to network connection problems. We could find exceptions in connectivity in the worker stdout. So question:
>
> * Is it possible for you to reproduce this and capture the worker logs for the time?
> * Can you share these logs and potentially the scheduler logs in the same timeframe?

I have been able to reliably reproduce this since upgrading to 2.7.0. However, I have scoured the worker logs, and there are never any obvious errors. The same goes for the db logs. I haven't seen anything in the scheduler logs either, but they're noisy. The only obvious errors I've found so far are the dag processor timeout errors.

Also worth noting: I really only changed one thing, which was upgrading from 2.3.3->2.7.0->2.7.1. If we'd been experiencing network issues, I'd have expected them to be version agnostic, rather than manifesting only once I'd upgraded to 2.7.

I'm not discounting it (I'm working on removing some config errors that are causing log errors in the scheduler, so I can better grep for failures), but it's not my primary suspect right now.
