[ https://issues.apache.org/jira/browse/AIRFLOW-2367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16448978#comment-16448978 ]
John Arnold commented on AIRFLOW-2367:
--------------------------------------
I believe the "top talker" query comes from this method in models.py:
    @provide_session
    def get_task_instances(self, state=None, session=None):
        """
        Returns the task instances for this dag run
        """
        TI = TaskInstance
        tis = session.query(TI).filter(
            TI.dag_id == self.dag_id,
            TI.execution_date == self.execution_date,
        )
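
One rough way to test the missing-index hypothesis for a filter on dag_id and execution_date is to compare the query plan before and after adding a composite index. The sketch below is only an illustration: it uses SQLite from the Python standard library rather than Postgres, and the simplified table layout and the index name ti_dag_date are assumptions made for the example, not Airflow's actual schema. On a real Postgres instance the equivalent check would be EXPLAIN on the captured query.

```python
import sqlite3

# Hypothetical, simplified stand-in for the task_instance table.
# This is NOT Airflow's real schema; it only carries the filtered columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE task_instance ("
    "task_id TEXT, dag_id TEXT, execution_date TEXT, state TEXT)"
)

# Same shape of filter as the SQLAlchemy query quoted above.
QUERY = (
    "SELECT * FROM task_instance "
    "WHERE dag_id = ? AND execution_date = ?"
)


def query_plan(connection):
    """Return SQLite's plan description for the dag_id/execution_date filter."""
    rows = connection.execute(
        "EXPLAIN QUERY PLAN " + QUERY, ("example_dag", "2018-01-01")
    ).fetchall()
    # The last column of each row holds the human-readable plan detail.
    return " ".join(row[-1] for row in rows)


# Without an index the planner has to scan the whole table.
plan_without_index = query_plan(conn)

# Assumed index name; a composite index covering both filtered columns.
conn.execute(
    "CREATE INDEX ti_dag_date ON task_instance (dag_id, execution_date)"
)
plan_with_index = query_plan(conn)

print(plan_without_index)
print(plan_with_index)
```

If the second plan names the index while the first reports a full scan, that is at least consistent with the missing-index theory, though Postgres may plan the same query differently.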
> High POSTGRES DB CPU utilization
> --------------------------------
>
> Key: AIRFLOW-2367
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2367
> Project: Apache Airflow
> Issue Type: Bug
> Components: scheduler
> Affects Versions: Airflow 2.0, 1.9.0
> Reporter: John Arnold
> Priority: Major
> Attachments: cpu.png, postgres.png
>
>
> We are seeing steady-state 70-90% CPU utilization. It feels like a missing-index
> kind of problem: our TPS rate is really low, I'm not seeing any long-running
> queries, connection counts are reasonable (low hundreds), and locks
> also look reasonable (not many exclusive / write locks).
> We shut down the webserver and the load doesn't go away, so it doesn't seem to be
> in that part of the code. My guess is that either the scheduler has an inefficient
> query, or the (Celery) executor code path does.