ofgrenudo commented on issue #57158: URL: https://github.com/apache/airflow/issues/57158#issuecomment-3444791356
hmmm so occassionally, we will have an issue where the logs will stop rendering. the resolution has always been to restart the scheduler. Im trying to find the code that causes this. We are currently using a docker deployment. I have at times been able to reproduce it when im working locally, but i always figured it was my machine that was causing some sort of state error instead of airflow having a bug. I would appreciate some direction in where to look to see if airflow can provide any logs as to whats happenning. ## Some facts that make this more confusing When this happens, if you are at the web ui, and trigger a task, it is successfully ran on the worker. But the logs are not retrieved from the job. If i log into the web server, through docker exec, I can view the logs and see my jobs are running, but they are reported as failures. in the web ui. for jobs that are supposed to re run on failure, this can be an issue because now the job is successfully executing, but is reporting as failed causing it to re run. Our solution would probably be to write some more itempotent code, but either way, airflow running into this state error is a major issue for us. again to recap, where should i begin to look to try and diagnose this issue. Thanks! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
