johnny0120 opened a new issue #18080:
URL: https://github.com/apache/airflow/issues/18080


   ### Apache Airflow version
   
   2.1.3 (latest released)
   
   ### Operating System
   
   Debian GNU/Linux 10 (buster)
   
   ### Versions of Apache Airflow Providers
   
   apache-airflow-providers-elasticsearch==2.0.2
   
   ### Deployment
   
   Other Docker-based deployment
   
   ### Deployment details
   
   1 master node running the webserver, scheduler, Flower, etc.;
   4 worker nodes, each with a Celery worker process;
   Filebeat 7.6.2 as a sidecar reading logs matching the pattern
/opt/airflow/logs/**/*.log from the official Docker image container
   
   ### What happened
   
   I have set up Filebeat and Elasticsearch for remote logging, and it works
well with normal operators and tasks.
   
   But when I tried an ExternalTaskSensor with `mode='reschedule'` and
`poke_interval=300`, the log view in the web UI showed something like
the following.
   
   ```txt
   ...
   [2021-09-08 15:10:03,392] {taskinstance.py:1094} INFO - 
   [2021-09-08 15:20:07,030] {taskinstance.py:1094} INFO - 
   [2021-09-08 15:15:04,941] {taskinstance.py:1094} INFO - 
   
--------------------------------------------------------------------------------
   
--------------------------------------------------------------------------------
   
--------------------------------------------------------------------------------
   [2021-09-08 15:10:03,392] {taskinstance.py:1095} INFO - Starting attempt 1 
of 4
   [2021-09-08 15:20:07,030] {taskinstance.py:1095} INFO - Starting attempt 1 
of 4
   [2021-09-08 15:15:04,941] {taskinstance.py:1095} INFO - Starting attempt 1 
of 4
   [2021-09-08 15:20:07,030] {taskinstance.py:1096} INFO - 
   [2021-09-08 15:10:03,392] {taskinstance.py:1096} INFO - 
   [2021-09-08 15:15:04,941] {taskinstance.py:1096} INFO - 
   
--------------------------------------------------------------------------------
   
--------------------------------------------------------------------------------
   
--------------------------------------------------------------------------------
   [2021-09-08 15:10:03,402] {taskinstance.py:1114} INFO - Executing 
<Task(ExternalTaskSensor): 
   [2021-09-08 15:20:07,041] {taskinstance.py:1114} INFO - Executing 
<Task(ExternalTaskSensor): 
   [2021-09-08 15:15:04,950] {taskinstance.py:1114} INFO - Executing 
<Task(ExternalTaskSensor):
   ...
   ```
   
   Apparently, logs from different runs on different worker nodes got
pulled together, because they share the same `try_number` (1), which means the
`log_id` used to retrieve logs from remote storage is the same for all of them.
   
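To make the collision concrete, here is a minimal sketch of how the Elasticsearch handler's `log_id` is derived (a simplified stand-in for Airflow's default `log_id_template`; the helper function and literal values are illustrative, not Airflow's actual code):

```python
# Simplified stand-in for Airflow's default Elasticsearch log_id_template.
# The real template interpolates the same four fields.
DEFAULT_LOG_ID_TEMPLATE = "{dag_id}-{task_id}-{execution_date}-{try_number}"


def build_log_id(dag_id, task_id, execution_date, try_number):
    """Render a log_id the way the Elasticsearch handler looks logs up."""
    return DEFAULT_LOG_ID_TEMPLATE.format(
        dag_id=dag_id,
        task_id=task_id,
        execution_date=execution_date,
        try_number=try_number,
    )


# A sensor in mode='reschedule' keeps try_number == 1 across pokes, so
# every poke, on every worker, maps to the same log_id and the web UI
# pulls all of their log lines together.
poke_1 = build_log_id("my_dag", "wait_upstream", "2021-09-08T15:00:00", 1)
poke_2 = build_log_id("my_dag", "wait_upstream", "2021-09-08T15:00:00", 1)
assert poke_1 == poke_2
```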
   
   ### What you expected to happen
   
   Option 1: Increment `try_number` after a task has been marked
`up_for_reschedule`, so that the `log_id` differs between runs. This would
probably also resolve other issues arising from reschedule/try_number and log_id
   
   Option 2: Change the sensor mode so it does not use `up_for_reschedule`,
perhaps `up_for_retry` instead? But this would change the behavior of every
sensor operator, which seems like a big change
   
   Option 3: Update the log location and its `log_id`, perhaps by including a
`reschedule_number` alongside the `try_number`; but this would require changing
the log_id template, which also seems like a big change
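Option 3 could be sketched roughly like this (purely hypothetical: `reschedule_number` is not an existing Airflow field, and the template below is illustrative):

```python
# Hypothetical extension of the log_id template for Option 3: add a
# reschedule counter so each poke of a rescheduling sensor gets its own
# log stream, even though try_number stays at 1.
EXTENDED_LOG_ID_TEMPLATE = (
    "{dag_id}-{task_id}-{execution_date}-{try_number}-{reschedule_number}"
)


def build_log_id(dag_id, task_id, execution_date, try_number,
                 reschedule_number=0):
    """Render a log_id that also distinguishes individual reschedules."""
    return EXTENDED_LOG_ID_TEMPLATE.format(
        dag_id=dag_id,
        task_id=task_id,
        execution_date=execution_date,
        try_number=try_number,
        reschedule_number=reschedule_number,
    )


# Two pokes of the same attempt now produce distinct log_ids:
poke_1 = build_log_id("my_dag", "wait_upstream", "2021-09-08T15:00:00",
                      1, reschedule_number=0)
poke_2 = build_log_id("my_dag", "wait_upstream", "2021-09-08T15:00:00",
                      1, reschedule_number=1)
assert poke_1 != poke_2
```

The cost, as noted above, is that every deployment's configured log_id template (and anything Filebeat-side that keys on it) would have to change in step.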
   
   ### How to reproduce
   
   Essential components:
   
   1. Filebeat and a working Elasticsearch cluster
   2. Multiple worker nodes, so that logs with the same `try_number` may be
created on different nodes
   3. An ExternalTaskSensor with `mode='reschedule'` sensing an upstream
task that needs multiple reschedules to succeed
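For the last step, a minimal sensor DAG could look like this (dag ids, task ids, and dates are illustrative; assumes Airflow 2.1.x with the Elasticsearch task handler configured for remote logging):

```python
# Minimal DAG sketch to reproduce the issue. The external dag/task ids
# are placeholders; point them at any upstream task slow enough to
# force several pokes.
from datetime import datetime

from airflow import DAG
from airflow.sensors.external_task import ExternalTaskSensor

with DAG(
    dag_id="downstream_dag",
    start_date=datetime(2021, 9, 1),
    schedule_interval="@hourly",
    catchup=False,
) as dag:
    wait_upstream = ExternalTaskSensor(
        task_id="wait_upstream",
        external_dag_id="upstream_dag",
        external_task_id="final_task",
        mode="reschedule",   # frees the worker slot between pokes
        poke_interval=300,   # each poke may land on a different worker
        timeout=6 * 60 * 60,
    )
```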
   
   ### Anything else
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   

