Regarding the issue you faced with logs not being available
until the task completes:

> The logs become available after the task finishes, i.e.
> when they are fetched from an external source like S3.


I believe this is expected behaviour, since the docs
<https://airflow.apache.org/howto/write-logs.html#writing-logs-locally> warn:

> Note that logs are only sent to remote storage once a task completes
> (including failure). In other words, remote logs for running tasks are
> unavailable. Logs are stored in the log folder as
> {dag_id}/{task_id}/{execution_date}/{try_number}.log.
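For illustration, here is a minimal sketch (my own, not Airflow's code) of how that per-try log path is assembled under the default pattern; the DAG name is a hypothetical placeholder:

```python
# Sketch: assemble the relative log path for one task try, following the
# {dag_id}/{task_id}/{execution_date}/{try_number}.log pattern quoted above.
def task_log_path(dag_id: str, task_id: str, execution_date: str,
                  try_number: int) -> str:
    return f"{dag_id}/{task_id}/{execution_date}/{try_number}.log"

print(task_log_path("my_dag", "etl_spark", "2019-01-07T23:10:00", 1))
# my_dag/etl_spark/2019-01-07T23:10:00/1.log
```

Until the task completes and this file is shipped to remote storage, the webserver can only get it by asking the worker that holds it locally.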


*Shubham Gupta*
Software Engineer
zomato


On Tue, Jan 8, 2019 at 6:19 AM Pramiti Goel <pramitigoe...@gmail.com> wrote:

> I am sharing a problem that we face very frequently. When we go to a
> particular DAG and click the logs for a particular task instance, we end up
> with the following error. The logs become available after the task finishes,
> i.e. when they are fetched from an external source like S3.
>
>
> *** Log file isn't local.
> *** Fetching here: http://
> <ip>:9001/log/<dag-name>/etl_spark/2019-01-07T23:10:00/1.log
> *** Failed to fetch log file from worker. HTTPConnectionPool(host='<ip>',
> port=9001): Max retries exceeded with url:
> /log/<dag>/etl_spark/2019-01-07T23:10:00/1.log (Caused by
> NewConnectionError('<urllib3.connection.HTTPConnection object at
> 0x7f8c8a6ee250>: Failed to establish a new connection: [Errno 111]
> Connection refused',))
>
>
> Going deeper into the issue, we found the reason.
> There is a service, airflow serve_logs, which is a Flask service running on
> each worker. Its job is to serve task logs stored locally on a worker to the
> webserver when requested. Each time the webserver UI requests logs, the
> serve_logs process creates a thread within Flask. We saw a lot of
> connections running, so we explicitly killed the airflow serve_logs service,
> which we saw had been running since 11 July 2018.
>
> Since there are a lot of open connections, it is unable to create a new one,
> and we end up with the above error.
> We resolved the issue manually by killing all the connections, but we keep
> seeing the same issue again. What is the permanent solution to this? Should
> we keep restarting the worker or the airflow serve_logs service?
>
> Thanks,
> Pramiti
>
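For readers hitting the same thing: the serve_logs mechanism Pramiti describes can be sketched roughly like this. This is an illustrative stdlib version, not Airflow's actual implementation (the real service uses Flask, as noted above); the log folder and port are assumptions taken from the error message:

```python
# Illustrative sketch of what the serve_logs service on each worker does:
# serve the worker's local task log files over HTTP so the webserver can
# fetch them on demand while the task is still running.
import functools
import http.server

LOG_DIR = "/usr/local/airflow/logs"  # assumed local log folder
PORT = 9001                          # port seen in the error above

def make_server(log_dir: str = LOG_DIR, port: int = PORT) -> http.server.HTTPServer:
    # Serve files rooted at log_dir; a request for /<dag>/<task>/<date>/1.log
    # returns the matching local log file.
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=log_dir)
    return http.server.HTTPServer(("0.0.0.0", port), handler)

# To run on a worker: make_server().serve_forever()
```

Connections to this service that are opened but never closed accumulate on the worker, which is consistent with the "Max retries exceeded" failure described above.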
