dvd9604 opened a new issue #15767:
URL: https://github.com/apache/airflow/issues/15767


   
   **Apache Airflow version**: 2.0.0
   
   **Environment**:
   - **OS** (e.g. from /etc/os-release): RHEL7
   - **Python**: 3.8.6
   - **Executor**: CeleryExecutor
   - **Workers**: 4 worker nodes
   - **logs dir**: /opt/airflow/logs/
   
   
   **What happened**:
   
   Scenario: Assume there are four workers labeled `host_a`, `host_b`, `host_c`, and `host_d`. The webserver and scheduler run on a separate host, `host_z`. When a task runs for the first time, it executes on `host_a`, and its log file is created locally on that host. If the same task is cleared and run again, it may execute on a different host, such as `host_b`. When navigating back to `1.log` after viewing `2.log`, the webserver replaces the hostname `host_a` with `host_b`, so the log file is not found.
   
   1. The task runs for the first time; `1.log` is created on `host_a`.
   ```
   *** Log file does not exist: /opt/airflow/logs/dag_foo/task_bar/2021-05-10T15:54:44.662671+00:00/1.log
   *** Fetching from: http://hosta.domain.com:8793/log/dag_foo/task_bar/2021-05-10T15:54:44.662671+00:00/1.log
   ....
   ```
   
   2. Manually clear the task and run it again; `2.log` is created on `host_b`.
   
   
   ```
   *** Log file does not exist: /opt/airflow/logs/dag_foo/task_bar/2021-05-10T15:54:44.662671+00:00/2.log
   *** Fetching from: http://hostb.domain.com:8793/log/dag_foo/task_bar/2021-05-10T15:54:44.662671+00:00/2.log
   .... rest of log
   ```
   
   
   3. Navigate back to `1.log`. The host has been replaced: the log is now fetched from `host_b` (**hostb.domain.com:8793**) instead of `host_a`.
   
   ```
   *** Log file does not exist: /opt/airflow/logs/dag_foo/task_bar/2021-05-10T15:54:44.662671+00:00/1.log
   *** Fetching from: http://hostb.domain.com:8793/log/dag_foo/task_bar/2021-05-10T15:54:44.662671+00:00/1.log
   *** Failed to fetch log file from worker. 404 Client Error: NOT FOUND for url: http://hosta.domain.com:8793/log/dag_foo/task_bar/2021-05-10T15:54:44.662671+00:00/1.log
   ```
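A minimal Python sketch of what the logs above suggest is happening. It assumes the webserver builds the worker URL from a single stored hostname that each new attempt overwrites. `TaskInstanceRecord` and `build_log_url` are hypothetical names used only for illustration, not Airflow's API. Port 8793 and the path layout are copied from the "Fetching from" lines.

```python
from dataclasses import dataclass

WORKER_LOG_PORT = 8793  # port seen in the "Fetching from" lines above


@dataclass
class TaskInstanceRecord:
    """Hypothetical stand-in for the task-instance data the webserver reads.

    Assumption: only ONE hostname is stored, and each new attempt overwrites it.
    """
    dag_id: str
    task_id: str
    execution_date: str
    hostname: str


def build_log_url(host: str, ti: TaskInstanceRecord, try_number: int) -> str:
    """Compose a worker log URL in the same shape as the logs above."""
    relative_path = f"{ti.dag_id}/{ti.task_id}/{ti.execution_date}/{try_number}.log"
    return f"http://{host}:{WORKER_LOG_PORT}/log/{relative_path}"


ti = TaskInstanceRecord(
    "dag_foo", "task_bar", "2021-05-10T15:54:44.662671+00:00", "hosta.domain.com"
)
url_try1_before = build_log_url(ti.hostname, ti, 1)  # attempt 1 ran on host_a

# Clearing and re-running the task (attempt 2 on host_b) overwrites the single field.
ti.hostname = "hostb.domain.com"
url_try1_after = build_log_url(ti.hostname, ti, 1)  # still try 1, but now points at host_b

print(url_try1_before)
print(url_try1_after)
```

Under this assumption, the URL for `1.log` silently switches to `host_b`, which matches the 404 in step 3.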
   
   
   **What you expected to happen**:
   The webserver should remember which host each attempt of a task ran on, and fetch each attempt's log from that host.
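A sketch of the expected behavior. It assumes the hostname is recorded per attempt (try number) rather than once per task instance. `AttemptHostRegistry` and `build_log_url` are hypothetical illustrations, not existing Airflow classes. The URL layout and port 8793 come from the logs above.

```python
from typing import Dict

WORKER_LOG_PORT = 8793  # port seen in the "Fetching from" lines above


class AttemptHostRegistry:
    """Hypothetical record of which worker host ran each attempt (try number)."""

    def __init__(self) -> None:
        self._hosts: Dict[int, str] = {}

    def record(self, try_number: int, hostname: str) -> None:
        # Store the host per attempt instead of overwriting one shared field.
        self._hosts[try_number] = hostname

    def host_for(self, try_number: int) -> str:
        # Raises KeyError if the attempt was never recorded.
        return self._hosts[try_number]


def build_log_url(host: str, dag_id: str, task_id: str,
                  execution_date: str, try_number: int) -> str:
    """Compose a worker log URL in the same shape as the logs above."""
    return (f"http://{host}:{WORKER_LOG_PORT}/log/"
            f"{dag_id}/{task_id}/{execution_date}/{try_number}.log")


registry = AttemptHostRegistry()
registry.record(1, "hosta.domain.com")  # first run
registry.record(2, "hostb.domain.com")  # cleared and re-run

ts = "2021-05-10T15:54:44.662671+00:00"
url_try1 = build_log_url(registry.host_for(1), "dag_foo", "task_bar", ts, 1)
url_try2 = build_log_url(registry.host_for(2), "dag_foo", "task_bar", ts, 2)
print(url_try1)
print(url_try2)
```

With a per-attempt record, `1.log` keeps pointing at `host_a` even after a later attempt runs on `host_b`.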
   
   

