stoiandl commented on issue #14261:
URL: https://github.com/apache/airflow/issues/14261#issuecomment-780603101


   It is not a liveness-probe issue. For some reason the scheduler itself stops emitting heartbeats. I can see heartbeats every 5 seconds in the logs (matching the `scheduler_heartbeat_sec` default, I believe).
   The scheduler pod started at 14:26:21.
   ```
   [2021-02-17 14:26:21,095] {cli_action_loggers.py:66} DEBUG - Calling callbacks: [<function default_action_log at 0x7f19ad429670>]
     ____________       _____________
    ____    |__( )_________  __/__  /________      __
   ____  /| |_  /__  ___/_  /_ __  /_  __ \_ | /| / /
   ___  ___ |  / _  /   _  __/ _  / / /_/ /_ |/ |/ /
    _/_/  |_/_/  /_/    /_/    /_/  \____/____/|__/
   [2021-02-17 14:26:21,615] {executor_loader.py:82} DEBUG - Loading core executor: KubernetesExecutor
   [2021-02-17 14:26:21,682] {scheduler_job.py:1241} INFO - Starting the scheduler
   [2021-02-17 14:26:21,682] {scheduler_job.py:1246} INFO - Processing each file at most -1 times
   [2021-02-17 14:26:21,683] {kubernetes_executor.py:491} INFO - Start Kubernetes executor
   [2021-02-17 14:26:21,683] {kubernetes_executor.py:495} DEBUG - Start with scheduler_job_id: 236
   [2021-02-17 14:26:21,685] {kubernetes_executor.py:235} DEBUG - Creating Kubernetes executor
   [2021-02-17 14:26:21,685] {kubernetes_executor.py:240} DEBUG - Kubernetes using namespace airflow
   [2021-02-17 14:26:21,699] {kubernetes_executor.py:458} DEBUG - Clearing tasks that have not been launched
   [2021-02-17 14:26:21,704] {kubernetes_executor.py:126} INFO - Event: and now my watch begins starting at resource_version: 0
   [2021-02-17 14:26:21,718] {kubernetes_executor.py:462} INFO - When executor started up, found 0 queued task instances
   [2021-02-17 14:26:21,724] {dag_processing.py:250} INFO - Launched DagFileProcessorManager with pid: 34
   [2021-02-17 14:26:21,725] {scheduler_job.py:1751} INFO - Resetting orphaned tasks for active dag runs
   ```
   
   I can see the heartbeat running every 5 seconds until 14:31:22. Below are the last two heartbeats before the scheduler hangs.
   ```
   [2021-02-17 14:31:17,266] {base_job.py:219} DEBUG - [heartbeat]
   [2021-02-17 14:31:17,273] {scheduler_job.py:1390} DEBUG - Next timed event is in 4.999986
   [2021-02-17 14:31:17,274] {scheduler_job.py:1392} DEBUG - Ran scheduling loop in 0.08 seconds
   [2021-02-17 14:31:17,798] {settings.py:290} DEBUG - Disposing DB connection pool (PID 3408)
   [2021-02-17 14:31:17,805] {scheduler_job.py:309} DEBUG - Waiting for <ForkProcess name='DagFileProcessor481-Process' pid=3408 parent=34 stopped exitcode=0>
   [2021-02-17 14:31:18,305] {scheduler_job.py:933} DEBUG - No tasks to consider for execution.
   [2021-02-17 14:31:18,309] {base_executor.py:147} DEBUG - 0 running task instances
   [2021-02-17 14:31:18,309] {base_executor.py:148} DEBUG - 0 in queue
   [2021-02-17 14:31:18,309] {base_executor.py:149} DEBUG - 32 open slots
   [2021-02-17 14:31:18,310] {base_executor.py:158} DEBUG - Calling the <class 'airflow.executors.kubernetes_executor.KubernetesExecutor'> sync method
   [2021-02-17 14:31:18,310] {kubernetes_executor.py:337} DEBUG - Syncing KubernetesExecutor
   [2021-02-17 14:31:18,310] {kubernetes_executor.py:263} DEBUG - KubeJobWatcher alive, continuing
   [2021-02-17 14:31:18,311] {dag_processing.py:383} DEBUG - Received message of type DagParsingStat
   [2021-02-17 14:31:18,311] {dag_processing.py:383} DEBUG - Received message of type DagParsingStat
   [2021-02-17 14:31:18,321] {scheduler_job.py:1390} DEBUG - Next timed event is in 3.951973
   [2021-02-17 14:31:18,322] {scheduler_job.py:1392} DEBUG - Ran scheduling loop in 0.05 seconds
   [2021-02-17 14:31:18,435] {settings.py:290} DEBUG - Disposing DB connection pool (PID 3415)
   [2021-02-17 14:31:18,444] {scheduler_job.py:309} DEBUG - Waiting for <ForkProcess name='DagFileProcessor482-Process' pid=3415 parent=34 stopped exitcode=0>
   [2021-02-17 14:31:19,104] {settings.py:290} DEBUG - Disposing DB connection pool (PID 3422)
   [2021-02-17 14:31:19,113] {scheduler_job.py:309} DEBUG - Waiting for <ForkProcess name='DagFileProcessor483-Process' pid=3422 parent=34 stopped exitcode=0>
   [2021-02-17 14:31:19,351] {scheduler_job.py:933} DEBUG - No tasks to consider for execution.
   [2021-02-17 14:31:19,354] {base_executor.py:147} DEBUG - 0 running task instances
   [2021-02-17 14:31:19,354] {base_executor.py:148} DEBUG - 0 in queue
   [2021-02-17 14:31:19,354] {base_executor.py:149} DEBUG - 32 open slots
   [2021-02-17 14:31:19,355] {base_executor.py:158} DEBUG - Calling the <class 'airflow.executors.kubernetes_executor.KubernetesExecutor'> sync method
   [2021-02-17 14:31:19,355] {kubernetes_executor.py:337} DEBUG - Syncing KubernetesExecutor
   [2021-02-17 14:31:19,355] {kubernetes_executor.py:263} DEBUG - KubeJobWatcher alive, continuing
   [2021-02-17 14:31:19,356] {dag_processing.py:383} DEBUG - Received message of type DagParsingStat
   [2021-02-17 14:31:19,356] {dag_processing.py:383} DEBUG - Received message of type DagParsingStat
   [2021-02-17 14:31:19,367] {scheduler_job.py:1390} DEBUG - Next timed event is in 2.906800
   [2021-02-17 14:31:19,367] {scheduler_job.py:1392} DEBUG - Ran scheduling loop in 0.04 seconds
   [2021-02-17 14:31:19,702] {settings.py:290} DEBUG - Disposing DB connection pool (PID 3429)
   [2021-02-17 14:31:19,710] {scheduler_job.py:309} DEBUG - Waiting for <ForkProcess name='DagFileProcessor484-Process' pid=3429 parent=34 stopped exitcode=0>
   [2021-02-17 14:31:20,340] {settings.py:290} DEBUG - Disposing DB connection pool (PID 3436)
   [2021-02-17 14:31:20,348] {scheduler_job.py:309} DEBUG - Waiting for <ForkProcess name='DagFileProcessor485-Process' pid=3436 parent=34 stopped exitcode=0>
   [2021-02-17 14:31:20,408] {scheduler_job.py:933} DEBUG - No tasks to consider for execution.
   [2021-02-17 14:31:20,412] {base_executor.py:147} DEBUG - 0 running task instances
   [2021-02-17 14:31:20,412] {base_executor.py:148} DEBUG - 0 in queue
   [2021-02-17 14:31:20,413] {base_executor.py:149} DEBUG - 32 open slots
   [2021-02-17 14:31:20,413] {base_executor.py:158} DEBUG - Calling the <class 'airflow.executors.kubernetes_executor.KubernetesExecutor'> sync method
   [2021-02-17 14:31:20,413] {kubernetes_executor.py:337} DEBUG - Syncing KubernetesExecutor
   [2021-02-17 14:31:20,413] {kubernetes_executor.py:263} DEBUG - KubeJobWatcher alive, continuing
   [2021-02-17 14:31:20,414] {dag_processing.py:383} DEBUG - Received message of type DagParsingStat
   [2021-02-17 14:31:20,415] {dag_processing.py:383} DEBUG - Received message of type DagParsingStat
   [2021-02-17 14:31:20,415] {dag_processing.py:383} DEBUG - Received message of type DagParsingStat
   [2021-02-17 14:31:20,425] {scheduler_job.py:1390} DEBUG - Next timed event is in 1.848308
   [2021-02-17 14:31:20,425] {scheduler_job.py:1392} DEBUG - Ran scheduling loop in 0.05 seconds
   [2021-02-17 14:31:20,975] {settings.py:290} DEBUG - Disposing DB connection pool (PID 3443)
   [2021-02-17 14:31:20,983] {scheduler_job.py:309} DEBUG - Waiting for <ForkProcess name='DagFileProcessor486-Process' pid=3443 parent=34 stopped exitcode=0>
   [2021-02-17 14:31:21,462] {scheduler_job.py:933} DEBUG - No tasks to consider for execution.
   [2021-02-17 14:31:21,471] {base_executor.py:147} DEBUG - 0 running task instances
   [2021-02-17 14:31:21,471] {base_executor.py:148} DEBUG - 0 in queue
   [2021-02-17 14:31:21,471] {base_executor.py:149} DEBUG - 32 open slots
   [2021-02-17 14:31:21,472] {base_executor.py:158} DEBUG - Calling the <class 'airflow.executors.kubernetes_executor.KubernetesExecutor'> sync method
   [2021-02-17 14:31:21,472] {kubernetes_executor.py:337} DEBUG - Syncing KubernetesExecutor
   [2021-02-17 14:31:21,472] {kubernetes_executor.py:263} DEBUG - KubeJobWatcher alive, continuing
   [2021-02-17 14:31:21,473] {dag_processing.py:383} DEBUG - Received message of type DagParsingStat
   [2021-02-17 14:31:21,474] {dag_processing.py:383} DEBUG - Received message of type DagParsingStat
   [2021-02-17 14:31:21,484] {scheduler_job.py:1390} DEBUG - Next timed event is in 0.789546
   [2021-02-17 14:31:21,484] {scheduler_job.py:1392} DEBUG - Ran scheduling loop in 0.06 seconds
   [2021-02-17 14:31:21,569] {settings.py:290} DEBUG - Disposing DB connection pool (PID 3450)
   [2021-02-17 14:31:21,579] {scheduler_job.py:309} DEBUG - Waiting for <ForkProcess name='DagFileProcessor487-Process' pid=3450 parent=34 stopped exitcode=0>
   [2021-02-17 14:31:21,999] {serialized_dag.py:193} DEBUG - Deleting Serialized DAGs (for which DAG files are deleted) from serialized_dag table
   [2021-02-17 14:31:22,004] {dag.py:2199} DEBUG - Deactivating DAGs (for which DAG files are deleted) from dag table
   [2021-02-17 14:31:22,010] {dagcode.py:135} DEBUG - Deleting code from dag_code table
   [2021-02-17 14:31:22,174] {settings.py:290} DEBUG - Disposing DB connection pool (PID 3457)
   [2021-02-17 14:31:22,183] {scheduler_job.py:309} DEBUG - Waiting for <ForkProcess name='DagFileProcessor488-Process' pid=3457 parent=34 stopped exitcode=0>
   [2021-02-17 14:31:22,303] {scheduler_job.py:933} DEBUG - No tasks to consider for execution.
   [2021-02-17 14:31:22,307] {base_executor.py:147} DEBUG - 0 running task instances
   [2021-02-17 14:31:22,307] {base_executor.py:148} DEBUG - 0 in queue
   [2021-02-17 14:31:22,308] {base_executor.py:149} DEBUG - 32 open slots
   [2021-02-17 14:31:22,308] {base_executor.py:158} DEBUG - Calling the <class 'airflow.executors.kubernetes_executor.KubernetesExecutor'> sync method
   [2021-02-17 14:31:22,308] {kubernetes_executor.py:337} DEBUG - Syncing KubernetesExecutor
   [2021-02-17 14:31:22,308] {kubernetes_executor.py:263} DEBUG - KubeJobWatcher alive, continuing
   [2021-02-17 14:31:22,309] {dag_processing.py:383} DEBUG - Received message of type DagParsingStat
   [2021-02-17 14:31:22,310] {dag_processing.py:383} DEBUG - Received message of type DagParsingStat
   [2021-02-17 14:31:22,310] {dag_processing.py:383} DEBUG - Received message of type DagParsingStat
   [2021-02-17 14:31:22,339] {base_job.py:219} DEBUG - [heartbeat]
   [2021-02-17 14:31:22,348] {scheduler_job.py:1390} DEBUG - Next timed event is in 0.502616
   [2021-02-17 14:31:22,348] {scheduler_job.py:1392} DEBUG - Ran scheduling loop in 0.07 seconds
   [2021-02-17 14:31:22,868] {settings.py:290} DEBUG - Disposing DB connection pool (PID 3464)
   [2021-02-17 14:31:22,880] {scheduler_job.py:309} DEBUG - Waiting for <ForkProcess name='DagFileProcessor489-Process' pid=3464 parent=34 stopped exitcode=0>
   [2021-02-17 14:31:22,886] {scheduler_job.py:933} DEBUG - No tasks to consider for execution.
   [2021-02-17 14:31:22,894] {base_executor.py:147} DEBUG - 0 running task instances
   [2021-02-17 14:31:22,894] {base_executor.py:148} DEBUG - 0 in queue
   [2021-02-17 14:31:22,894] {base_executor.py:149} DEBUG - 32 open slots
   [2021-02-17 14:31:22,895] {base_executor.py:158} DEBUG - Calling the <class 'airflow.executors.kubernetes_executor.KubernetesExecutor'> sync method
   [2021-02-17 14:31:22,895] {kubernetes_executor.py:337} DEBUG - Syncing KubernetesExecutor
   [2021-02-17 14:31:22,896] {kubernetes_executor.py:263} DEBUG - KubeJobWatcher alive, continuing
   [2021-02-17 14:31:22,897] {dag_processing.py:383} DEBUG - Received message of type DagParsingStat
   [2021-02-17 14:31:22,907] {scheduler_job.py:1751} INFO - Resetting orphaned tasks for active dag runs
   [2021-02-17 14:31:23,487] {settings.py:290} DEBUG - Disposing DB connection pool (PID 3471)
   [2021-02-17 14:31:23,494] {scheduler_job.py:309} DEBUG - Waiting for <ForkProcess name='DagFileProcessor490-Process' pid=3471 parent=34 stopped exitcode=0>
   [2021-02-17 14:31:24,111] {settings.py:290} DEBUG - Disposing DB connection pool (PID 3478)
   [2021-02-17 14:31:24,123] {scheduler_job.py:309} DEBUG - Waiting for <ForkProcess name='DagFileProcessor491-Process' pid=3478 parent=34 stopped exitcode=0>
   [2021-02-17 14:31:24,747] {settings.py:290} DEBUG - Disposing DB connection pool (PID 3485)
   [2021-02-17 14:31:24,754] {scheduler_job.py:309} DEBUG - Waiting for <ForkProcess name='DagFileProcessor492-Process' pid=3485 parent=34 stopped exitcode=0>
   [2021-02-17 14:31:25,376] {settings.py:290} DEBUG - Disposing DB connection pool (PID 3492)
   [2021-02-17 14:31:25,385] {scheduler_job.py:309} DEBUG - Waiting for <ForkProcess name='DagFileProcessor493-Process' pid=3492 parent=34 stopped exitcode=0>
   [2021-02-17 14:31:26,022] {settings.py:290} DEBUG - Disposing DB connection pool (PID 3499)
   [2021-02-17 14:31:26,030] {scheduler_job.py:309} DEBUG - Waiting for <ForkProcess name='DagFileProcessor494-Process' pid=3499 parent=34 stopped exitcode=0>
   [2021-02-17 14:31:26,608] {settings.py:290} DEBUG - Disposing DB connection pool (PID 3506)
   [2021-02-17 14:31:26,617] {scheduler_job.py:309} DEBUG - Waiting for <ForkProcess name='DagFileProcessor495-Process' pid=3506 parent=34 stopped exitcode=0>
   [2021-02-17 14:31:27,241] {settings.py:290} DEBUG - Disposing DB connection pool (PID 3513)
   [2021-02-17 14:31:27,249] {scheduler_job.py:309} DEBUG - Waiting for <ForkProcess name='DagFileProcessor496-Process' pid=3513 parent=34 stopped exitcode=0>
   [2021-02-17 14:31:27,852] {settings.py:290} DEBUG - Disposing DB connection pool (PID 3520)
   [2021-02-17 14:31:27,861] {scheduler_job.py:309} DEBUG - Waiting for <ForkProcess name='DagFileProcessor497-Process' pid=3520 parent=34 stopped exitcode=0>
   [2021-02-17 14:31:28,454] {settings.py:290} DEBUG - Disposing DB connection pool (PID 3527)
   [2021-02-17 14:31:28,472] {scheduler_job.py:309} DEBUG - Waiting for <ForkProcess name='DagFileProcessor498-Process' pid=3527 parent=34 stopped exitcode=0>
   [2021-02-17 14:31:29,086] {settings.py:290} DEBUG - Disposing DB connection pool (PID 3534)
   ... many more of these "Disposing DB connection pool" lines
   ```
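   
   To confirm the cadence and find exactly where it stops, I used a minimal sketch along these lines (the log path is hypothetical; it assumes the log was captured to a file in the timestamp format shown above):
   
   ```python
   # Minimal sketch: check the gap between consecutive scheduler heartbeats.
   # Assumes lines like "[2021-02-17 14:31:17,266] {base_job.py:219} DEBUG - [heartbeat]".
   from datetime import datetime
   
   LOG_PATH = "scheduler.log"          # hypothetical path to the captured pod log
   TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"  # timestamp prefix format used above
   
   beats = []
   with open(LOG_PATH) as f:
       for line in f:
           if "{base_job.py" in line and "[heartbeat]" in line:
               # The timestamp is the bracketed prefix of each record.
               ts = line.split("]", 1)[0].lstrip("[")
               beats.append(datetime.strptime(ts, TS_FORMAT))
   
   for prev, cur in zip(beats, beats[1:]):
       gap = (cur - prev).total_seconds()
       flag = "  <-- gap!" if gap > 10 else ""
       print(f"{prev} -> {cur}: {gap:.1f}s{flag}")
   ```
   
   Every gap is ~5 s until the 14:31:22 beat, after which there are no further `[heartbeat]` records at all, even though the DagFileProcessor children keep cycling.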
   
   In the frontend I get:
   
   _The scheduler does not appear to be running. Last heartbeat was received 11 minutes ago. The DAGs list may not update, and new tasks will not be scheduled._
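   
   As far as I can tell, that banner is driven by the `latest_heartbeat` column of the `job` table in the metadata DB, so it can also be checked without the UI. A rough sketch, assuming Airflow 2.0.x internals (`BaseJob` and an initialized ORM session), not a supported API:
   
   ```python
   # Rough sketch: read the scheduler's last recorded heartbeat straight from
   # the metadata DB. Assumes Airflow 2.0.x internals; run where airflow is
   # configured (e.g. inside the scheduler pod).
   from datetime import datetime, timezone
   
   from airflow.jobs.base_job import BaseJob
   from airflow.settings import Session
   
   session = Session()
   latest = (
       session.query(BaseJob)
       .filter(BaseJob.job_type == "SchedulerJob")
       .order_by(BaseJob.latest_heartbeat.desc())
       .first()
   )
   if latest is None:
       print("No SchedulerJob rows found")
   else:
       age = datetime.now(timezone.utc) - latest.latest_heartbeat
       print(f"state={latest.state}, last heartbeat {age.total_seconds():.0f}s ago")
   session.close()
   ```
   
   In my case this shows the job row still in the `running` state while `latest_heartbeat` keeps aging, which matches the logs: the process is alive but the scheduling loop has stopped.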

