rafidka opened a new issue #13026:
URL: https://github.com/apache/airflow/issues/13026


   **Apache Airflow version**: 1.10.12
   
   **Kubernetes version (if you are using kubernetes)** (use `kubectl 
version`): N/A
   
   **Environment**:
   
   - **Cloud provider or hardware configuration**: Amazon `r5.4xlarge` EC2 
instance.
   - **OS** (e.g. from /etc/os-release): Amazon Linux 2
   - **Kernel** (e.g. `uname -a`): 5.4.58-37.125.amzn2int.x86_64 (via `uname 
-r`)
   - **Install tools**: N/A
   - **Others**: N/A
   
   **What happened**:
   
   I am experiencing some strange issue with scheduler heartbeat. If I 
[configure 
it](https://airflow.apache.org/docs/apache-airflow/stable/configurations-ref.html#scheduler-heartbeat-sec)
 to 5 seconds, somehow the heartbeat metric is received every 6 seconds. I 
tried different values and what I noticed is that with values of 5 or less, the 
heartbeat is received one second later than expected. But with 6 or more, the 
heartbeat is received at the expected time (though for 2 and 6 I experienced an 
even stranger fluctuating behaviour). Below is a table of the values I tried 
and the frequencies I received:
   
   scheduler_heartbeat_sec value|frequency of scheduler_heartbeat metric
   -|-
   0|2
   1|2
   2|Fluctuates between 2 and 4
   3|4
   5|Fluctuates between 6 and 8
   6|6
   10|10
   30|30
   
   **What you expected to happen**:
   
   I expect the `airflow.scheduler_heartbeat` metric to be received at the same 
frequency specified by the `scheduler_heartbeat_sec` configuration.
   
   **How to reproduce it**:
   
   - Install Airflow locally via pip
   - Enable StatsD metrics via the airflow.cfg configurations:
   ```
   # Statsd (https://github.com/etsy/statsd) integration settings
   statsd_on = True
   statsd_host = localhost
   statsd_port = 8125
   statsd_prefix = airflow
   ```
   - Run Airflow webserver and scheduler and enable some DAGs.
   - Execute the following Python code and watch the timestamp of the received 
metrics:
   ```python
   import socket
   from datetime import datetime
   
   UDP_IP = "127.0.0.1"
   UDP_PORT = 8125
   
   sock = socket.socket(socket.AF_INET, # Internet
                        socket.SOCK_DGRAM) # UDP
   sock.bind((UDP_IP, UDP_PORT))
   
   prev_iter_time = None
   while True:
       data, addr = sock.recvfrom(1024) # buffer size is 1024 bytes
       message = str(data)
       if 'airflow.scheduler_heartbeat' in message:
           this_iter_time = datetime.now()
           diff = (this_iter_time - prev_iter_time).total_seconds() if 
prev_iter_time is not None else 0
           print(f"{this_iter_time} [{diff} seconds] received message: 
{message}.")
           prev_iter_time = this_iter_time
   ```
   
   This is an example output when I set `scheduler_heartbeat_sec` to 5:
   
   ```
   2020-12-11 22:18:34.747872 [0 seconds] received message: 
b'airflow.scheduler_heartbeat:1|c'.
   2020-12-11 22:18:40.760118 [6.012246 seconds] received message: 
b'airflow.scheduler_heartbeat:1|c'.
   2020-12-11 22:18:46.765760 [6.005642 seconds] received message: 
b'airflow.scheduler_heartbeat:1|c'.
   2020-12-11 22:18:52.771790 [6.00603 seconds] received message: 
b'airflow.scheduler_heartbeat:1|c'.
   2020-12-11 22:18:58.778018 [6.006228 seconds] received message: 
b'airflow.scheduler_heartbeat:1|c'.
   2020-12-11 22:19:04.784721 [6.006703 seconds] received message: 
b'airflow.scheduler_heartbeat:1|c'.
   2020-12-11 22:19:10.789910 [6.005189 seconds] received message: 
b'airflow.scheduler_heartbeat:1|c'.
   ```
   
   And this is another example when I set `scheduler_heartbeat_sec` to 6:
   
   ```
   2020-12-11 22:19:56.306896 [0 seconds] received message: 
b'airflow.scheduler_heartbeat:1|c'.
   2020-12-11 22:20:02.317303 [6.010407 seconds] received message: 
b'airflow.scheduler_heartbeat:1|c'.
   2020-12-11 22:20:10.324169 [8.006866 seconds] received message: 
b'airflow.scheduler_heartbeat:1|c'.
   2020-12-11 22:20:18.331488 [8.007319 seconds] received message: 
b'airflow.scheduler_heartbeat:1|c'.
   2020-12-11 22:20:24.338565 [6.007077 seconds] received message: 
b'airflow.scheduler_heartbeat:1|c'.
   2020-12-11 22:20:30.345710 [6.007145 seconds] received message: 
b'airflow.scheduler_heartbeat:1|c'.
   2020-12-11 22:20:38.347814 [8.002104 seconds] received message: 
b'airflow.scheduler_heartbeat:1|c'.
   2020-12-11 22:20:44.358943 [6.011129 seconds] received message: 
b'airflow.scheduler_heartbeat:1|c'.
   2020-12-11 22:20:52.368052 [8.009109 seconds] received message: 
b'airflow.scheduler_heartbeat:1|c'.
   ```
   
   Notice that this time it fluctuates between 6 and 8 for reason.
   
   Now, setting `scheduler_heartbeat_sec` to 10, here is a much better stable 
output with the expected frequency:
   
   ```
   python3 listen_to_statsd.py
   2020-12-11 22:22:39.028670 [0 seconds] received message: 
b'airflow.scheduler_heartbeat:1|c'.
   2020-12-11 22:22:49.041901 [10.013231 seconds] received message: 
b'airflow.scheduler_heartbeat:1|c'.
   2020-12-11 22:22:59.052456 [10.010555 seconds] received message: 
b'airflow.scheduler_heartbeat:1|c'.
   2020-12-11 22:23:09.062201 [10.009745 seconds] received message: 
b'airflow.scheduler_heartbeat:1|c'.
   2020-12-11 22:23:19.071650 [10.009449 seconds] received message: 
b'airflow.scheduler_heartbeat:1|c'.
   2020-12-11 22:23:29.084219 [10.012569 seconds] received message: 
b'airflow.scheduler_heartbeat:1|c'.
   2020-12-11 22:23:39.093269 [10.00905 seconds] received message: 
b'airflow.scheduler_heartbeat:1|c'.
   2020-12-11 22:23:49.106148 [10.012879 seconds] received message: 
b'airflow.scheduler_heartbeat:1|c'.
   2020-12-11 22:23:59.113249 [10.007101 seconds] received message: 
b'airflow.scheduler_heartbeat:1|c'.
   2020-12-11 22:24:09.124079 [10.01083 seconds] received message: 
b'airflow.scheduler_heartbeat:1|c'.
   2020-12-11 22:24:19.135419 [10.01134 seconds] received message: 
b'airflow.scheduler_heartbeat:1|c'.
   2020-12-11 22:24:29.143558 [10.008139 seconds] received message: 
b'airflow.scheduler_heartbeat:1|c'.
   2020-12-11 22:24:39.154385 [10.010827 seconds] received message: 
b'airflow.scheduler_heartbeat:1|c'.
   2020-12-11 22:24:49.164710 [10.010325 seconds] received message: 
b'airflow.scheduler_heartbeat:1|c'.
   ```
   
   **Anything else we need to know**:
   
   Yes, you are awesome, but you might already know this.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to