In particular, regarding the issue you faced with logs not being available until the task completes:
> The logs become available after the task finishes, i.e. when they are
> fetched from an external source like S3.

I believe this is expected behaviour, since the docs
<https://airflow.apache.org/howto/write-logs.html#writing-logs-locally> warn:

> Note that logs are only sent to remote storage once a task completes
> (including failure). In other words, remote logs for running tasks are
> unavailable. Logs are stored in the log folder as
> {dag_id}/{task_id}/{execution_date}/{try_number}.log.

*Shubham Gupta*
Software Engineer
zomato

On Tue, Jan 8, 2019 at 6:19 AM Pramiti Goel <pramitigoe...@gmail.com> wrote:

> I am sharing a problem that we face very frequently. When we go to a
> particular DAG and click the logs for a particular task instance, we end
> up with the following error. The logs only become available after the
> task finishes, i.e. when they are fetched from an external source like S3.
>
>     *** Log file isn't local.
>     *** Fetching here: http://<ip>:9001/log/<dag-name>/etl_spark/2019-01-07T23:10:00/1.log
>     *** Failed to fetch log file from worker. HTTPConnectionPool(host='<ip>',
>     port=9001): Max retries exceeded with url:
>     /log/<dag>/etl_spark/2019-01-07T23:10:00/1.log (Caused by
>     NewConnectionError('<urllib3.connection.HTTPConnection object at
>     0x7f8c8a6ee250>: Failed to establish a new connection: [Errno 111]
>     Connection refused',))
>
> Going deeper into the issue, we found the reason. There is a service,
> airflow serve_logs, which is a Flask service running on each worker. Its
> job is to serve task logs present locally on a worker to the webserver
> when requested. Each time the webserver UI requests the serve_logs
> process, it creates a thread within Flask. We saw a lot of connections
> running, so we explicitly killed the airflow serve_logs service, which we
> saw had been running since 11 July, 2018.
>
> Since there are a lot of open connections, it is unable to create a new
> one and we end up with the above error.
> We manually resolved the issue by killing all the connections, but we
> keep seeing the same issue again. What is the permanent solution to this?
> Should we keep restarting the worker or the airflow serve_logs service?
>
> Thanks,
> Pramiti
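For reference, the local log path pattern quoted from the docs above can be assembled like this. This is only a sketch: the base log folder and the DAG/task values below are illustrative placeholders, not taken from the original report.

```shell
# Assemble the local log path following the documented pattern:
#   {dag_id}/{task_id}/{execution_date}/{try_number}.log
# base_log_folder and the identifiers below are placeholder values.
base_log_folder="/usr/local/airflow/logs"
dag_id="example_dag"
task_id="etl_spark"
execution_date="2019-01-07T23:10:00"
try_number=1
log_path="${base_log_folder}/${dag_id}/${task_id}/${execution_date}/${try_number}.log"
echo "${log_path}"
```

If that file exists on the worker, the webserver serves it directly; otherwise it falls back to fetching over HTTP from the worker's log server, which is where the connection errors above come from.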
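As a quick diagnostic for the connection build-up described above, you can watch the number of established connections to the log-serving port on a worker. This is a hedged sketch: it assumes `ss` from iproute2 is available, and it uses port 9001 because that is what appears in the error message (the Airflow default for worker_log_server_port is 8793), so adjust to your deployment.

```shell
# Count established TCP connections to the serve_logs port on this worker.
# Port 9001 matches the error message in this thread; adjust as needed.
port=9001
count=$(ss -tn state established "( sport = :${port} )" 2>/dev/null | tail -n +2 | wc -l)
echo "established connections to serve_logs on port ${port}: ${count}"
```

If the count keeps growing between runs, restarting the worker's log server (for example via your process supervisor, or by re-running `airflow serve_logs`) clears the leaked connections, which matches the manual workaround described in the thread.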