[
https://issues.apache.org/jira/browse/AIRFLOW-5447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16929517#comment-16929517
]
Chris Wegrzyn commented on AIRFLOW-5447:
----------------------------------------
I just upgraded a deployment from 1.10.2 to 1.10.5 (also had the same issues on
1.10.4, haven't yet tried 1.10.3) and am experiencing the same issues.
I've tracked it down to the same line. Judging by the log messages, we get this
log message:
[https://github.com/apache/airflow/blob/1.10.5/airflow/contrib/executors/kubernetes_executor.py#L762]
But we never get this log message:
[https://github.com/apache/airflow/blob/1.10.5/airflow/executors/base_executor.py#L135]
Here's a slightly sanitized log:
{{[2019-09-13 19:51:24,074] \{{scheduler_job.py:1438}} DEBUG - Heartbeating the
executor}}
{{[2019-09-13 19:51:24,074] \{{base_executor.py:124}} DEBUG - 0 running task
instances}}
{{[2019-09-13 19:51:24,074] \{{base_executor.py:125}} DEBUG - 0 in queue}}
{{[2019-09-13 19:51:24,074] \{{base_executor.py:126}} DEBUG - 32 open slots}}
{{[2019-09-13 19:51:24,075] \{{base_executor.py:135}} DEBUG - Calling the
<class 'airflow.contrib.executors.kubernetes_executor.KubernetesExecutor'> sync
method}}
{{[2019-09-13 19:51:24,083] \{{scheduler_job.py:1459}} DEBUG - Ran scheduling
loop in 0.01 seconds}}
{{[2019-09-13 19:51:24,083] \{{scheduler_job.py:1462}} DEBUG - Sleeping for
1.00 seconds}}
{{[2019-09-13 19:51:24,087] \{{settings.py:54}} INFO - Configured default
timezone <Timezone [UTC]>}}
{{[2019-09-13 19:51:24,093] \{{settings.py:327}} DEBUG - Failed to import
airflow_local_settings.}}
{{Traceback (most recent call last):}}
{{ File
"/usr/local/airflow/.local/lib/python3.7/site-packages/airflow/settings.py",
line 315, in import_local_settings}}
{{ import airflow_local_settings}}
{{ModuleNotFoundError: No module named 'airflow_local_settings'}}
{{[2019-09-13 19:51:24,094] \{{logging_config.py:59}} DEBUG - Unable to load
custom logging, using default config instead}}
{{[2019-09-13 19:51:24,109] \{{settings.py:170}} DEBUG - Setting up DB
connection pool (PID 49)}}
{{[2019-09-13 19:51:24,110] \{{settings.py:213}} INFO -
settings.configure_orm(): Using pool settings. pool_size=5, max_overflow=10,
pool_recycle=1800, pid=49}}
{{[2019-09-13 19:51:24,295] \{{settings.py:238}} DEBUG - Disposing DB
connection pool (PID 55)}}
{{[2019-09-13 19:51:24,380] \{{settings.py:238}} DEBUG - Disposing DB
connection pool (PID 59)}}
{{[2019-09-13 19:51:25,084] \{{scheduler_job.py:1474}} DEBUG - Sleeping for
0.99 seconds to prevent excessive logging}}
{{[2019-09-13 19:51:25,117] \{{scheduler_job.py:257}} DEBUG - Waiting for
<Process(DagFileProcessor0-Process, stopped)>}}
{{[2019-09-13 19:51:25,118] \{{scheduler_job.py:257}} DEBUG - Waiting for
<Process(DagFileProcessor1-Process, stopped)>}}
{{[2019-09-13 19:51:25,226] \{{settings.py:238}} DEBUG - Disposing DB
connection pool (PID 69)}}
{{[2019-09-13 19:51:25,278] \{{settings.py:238}} DEBUG - Disposing DB
connection pool (PID 73)}}
{{[2019-09-13 19:51:26,076] \{{scheduler_job.py:1390}} DEBUG - Starting
Loop...}}
{{[2019-09-13 19:51:26,076] \{{scheduler_job.py:1401}} DEBUG - Harvesting DAG
parsing results}}
{{[2019-09-13 19:51:26,076] \{{dag_processing.py:637}} DEBUG - Received message
of type DagParsingStat}}
{{[2019-09-13 19:51:26,077] \{{dag_processing.py:637}} DEBUG - Received message
of type SimpleDag}}
{{[2019-09-13 19:51:26,077] \{{dag_processing.py:637}} DEBUG - Received message
of type DagParsingStat}}
{{[2019-09-13 19:51:26,078] \{{dag_processing.py:637}} DEBUG - Received message
of type DagParsingStat}}
{{[2019-09-13 19:51:26,078] \{{scheduler_job.py:1403}} DEBUG - Harvested 1
SimpleDAGs}}
{{[2019-09-13 19:51:26,109] \{{scheduler_job.py:921}} INFO - 1 tasks up for
execution:}}
{{ <TaskInstance: parse_log.xyz_parse_log_2019-09-13 2019-09-12 00:00:00+00:00
[scheduled]>}}
{{[2019-09-13 19:51:26,122] \{{scheduler_job.py:953}} INFO - Figuring out tasks
to run in Pool(name=default_pool) with 128 open slots and 1 task instances
ready to be queued}}
{{[2019-09-13 19:51:26,123] \{{scheduler_job.py:981}} INFO - DAG parse_log has
0/16 running and queued tasks}}
{{[2019-09-13 19:51:26,132] \{{scheduler_job.py:257}} DEBUG - Waiting for
<Process(DagFileProcessor2-Process, stopped)>}}
{{[2019-09-13 19:51:26,133] \{{scheduler_job.py:257}} DEBUG - Waiting for
<Process(DagFileProcessor3-Process, stopped)>}}
{{[2019-09-13 19:51:26,142] \{{scheduler_job.py:1031}} INFO - Setting the
following tasks to queued state:}}
{{ <TaskInstance: parse_log.xyz_parse_log_2019-09-13 2019-09-12 00:00:00+00:00
[scheduled]>}}
{{[2019-09-13 19:51:26,157] \{{scheduler_job.py:1107}} INFO - Setting the
following 1 tasks to queued state:}}
{{ <TaskInstance: parse_log.xyz_parse_log_2019-09-13 2019-09-12 00:00:00+00:00
[queued]>}}
{{[2019-09-13 19:51:26,157] \{{scheduler_job.py:1143}} INFO - Sending
('parse_log', 'xyz_parse_log_2019-09-13', datetime.datetime(2019, 9, 12, 0, 0,
tzinfo=<TimezoneInfo [UTC, GMT, +00:00:00, STD]>), 1) to executor with priority
2 and queue default}}
{{[2019-09-13 19:51:26,158] \{{base_executor.py:59}} INFO - Adding to queue:
['airflow', 'run', 'parse_log', 'xyz_parse_log_2019-09-13',
'2019-09-12T00:00:00+00:00', '--local', '--pool', 'default_pool', '-sd',
'/usr/local/airflow/dags/xyz.py']}}
{{[2019-09-13 19:51:26,158] \{{scheduler_job.py:1438}} DEBUG - Heartbeating the
executor}}
{{[2019-09-13 19:51:26,159] \{{base_executor.py:124}} DEBUG - 0 running task
instances}}
{{[2019-09-13 19:51:26,159] \{{base_executor.py:125}} DEBUG - 1 in queue}}
{{[2019-09-13 19:51:26,167] \{{base_executor.py:126}} DEBUG - 32 open slots}}
{{[2019-09-13 19:51:26,167] \{{kubernetes_executor.py:764}} INFO - Add task
('parse_log', 'xyz_parse_log_2019-09-13', datetime.datetime(2019, 9, 12, 0, 0,
tzinfo=<TimezoneInfo [UTC, GMT, +00:00:00, STD]>), 1) with command ['airflow',
'run', 'parse_log', 'xyz_parse_log_2019-09-13', '2019-09-12T00:00:00+00:00',
'--local', '--pool', 'default_pool', '-sd', '/usr/local/airflow/dags/xyz.py']
with executor_config {}}}
{{[2019-09-13 19:51:26,248] \{{settings.py:238}} DEBUG - Disposing DB
connection pool (PID 83)}}
{{[2019-09-13 19:51:26,314] \{{settings.py:238}} DEBUG - Disposing DB
connection pool (PID 85)}}
{{[2019-09-13 19:51:27,149] \{{scheduler_job.py:257}} DEBUG - Waiting for
<Process(DagFileProcessor4-Process, stopped)>}}
{{[2019-09-13 19:51:27,150] \{{scheduler_job.py:257}} DEBUG - Waiting for
<Process(DagFileProcessor5-Process, stopped)>}}
{{[2019-09-13 19:51:27,289] \{{settings.py:238}} DEBUG - Disposing DB
connection pool (PID 101)}}
After that, the messages about "Disposing DB connection pool" and "Waiting for
<Process(DagFileProcessor" repeat endlessly.
This is running on an Amazon EKS cluster. No mem/CPU limits on the pod.
> KubernetesExecutor hangs on task queueing
> -----------------------------------------
>
> Key: AIRFLOW-5447
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5447
> Project: Apache Airflow
> Issue Type: Bug
> Components: executor-kubernetes
> Affects Versions: 1.10.4, 1.10.5
> Environment: Kubernetes version v1.14.3, Airflow version 1.10.4-1.10.5
> Reporter: Henry Cohen
> Assignee: Daniel Imberman
> Priority: Blocker
>
> Starting in 1.10.4, and continuing in 1.10.5, when using the
> KubernetesExecutor, with the webserver and scheduler running in the
> kubernetes cluster, tasks are scheduled, but when added to the task queue,
> the executor process hangs indefinitely. Based on log messages, it appears to
> be stuck at this line
> https://github.com/apache/airflow/blob/v1-10-stable/airflow/contrib/executors/kubernetes_executor.py#L761
--
This message was sent by Atlassian Jira
(v8.3.2#803003)