Kamil Bregula created AIRFLOW-6532:
--------------------------------------

             Summary: Fetch Celery states using batch method instead of Pool
                 Key: AIRFLOW-6532
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-6532
             Project: Apache Airflow
          Issue Type: Improvement
          Components: executors
    Affects Versions: 1.10.7
            Reporter: Kamil Bregula


One aspect worth checking is how long Celery takes to fetch task statuses.
https://github.com/apache/airflow/blob/77099b876814ec0008fd8da18f35de70deccbe03/airflow/executors/celery_executor.py#L246-L259
My clients use MySQL as the result backend, so Celery sends one query per task
to the database: 100 queries for 100 tasks.
https://github.com/celery/celery/blob/77099b876814ec0008fd8da18f35de70deccbe03/airflow/backends/database/__init__.py#L149-L164
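A minimal sketch of the per-task pattern described above, assuming a simplified
result table loosely modeled on Celery's celery_taskmeta (only task_id and
status columns) and an in-memory SQLite database standing in for MySQL. The
helpers make_backend and fetch_states_one_by_one are hypothetical names; this is
not Airflow or Celery code, it only counts the database round trips.

```python
import sqlite3
from typing import Dict, List, Tuple


def make_backend(num_tasks: int) -> Tuple[sqlite3.Connection, List[str]]:
    """Create an in-memory result table with one SUCCESS row per task (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE celery_taskmeta (task_id TEXT PRIMARY KEY, status TEXT)")
    task_ids = [f"task-{i}" for i in range(num_tasks)]
    conn.executemany(
        "INSERT INTO celery_taskmeta (task_id, status) VALUES (?, 'SUCCESS')",
        [(tid,) for tid in task_ids],
    )
    conn.commit()
    return conn, task_ids


def fetch_states_one_by_one(
    conn: sqlite3.Connection, task_ids: List[str]
) -> Tuple[Dict[str, str], int]:
    """Issue one SELECT per task id, mimicking a per-task status lookup."""
    states: Dict[str, str] = {}
    queries = 0
    for tid in task_ids:
        row = conn.execute(
            "SELECT status FROM celery_taskmeta WHERE task_id = ?", (tid,)
        ).fetchone()
        queries += 1  # one database round trip per task
        # Celery reports unknown task ids as PENDING; mirror that here.
        states[tid] = row[0] if row else "PENDING"
    return states, queries


conn, ids = make_backend(100)
states, queries = fetch_states_one_by_one(conn, ids)
print(queries)  # 100 round trips for 100 tasks
```

With a network hop to MySQL for each of those SELECTs, latency grows linearly
with the number of running tasks.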
In my opinion, this could be sped up by replacing our code with a call to the
Celery method celery.backends.base:BaseKeyValueStoreBackend.get_many:
https://github.com/celery/celery/blob/77099b876814ec0008fd8da18f35de70deccbe03/celery/backends/base.py#L711-L747
Unfortunately, this method only works with key-value store backends such as
Redis, so we would have to implement mget / get_many in the DatabaseBackend
class for it to work properly.
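A minimal sketch of what a batched lookup on a database backend could look
like, again assuming a simplified SQLite table as a stand-in for the MySQL
result backend. DatabaseStoreSketch, its mget method and get_many_states are
hypothetical names for illustration only; this is not Celery's DatabaseBackend
API, and Celery's actual get_many does more (for example, it polls until
results are ready, with a timeout). The point is that all requested ids are
fetched with one SELECT ... WHERE task_id IN (...) instead of one query per task.

```python
import sqlite3
from typing import Dict, Iterable, List


class DatabaseStoreSketch:
    """Hypothetical DB-backed result store with a batched mget (illustrative only)."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.queries = 0  # counts SELECT round trips

    def mget(self, task_ids: List[str]) -> Dict[str, str]:
        """Fetch statuses for many task ids with a single SELECT ... IN (...) query."""
        if not task_ids:
            return {}
        placeholders = ", ".join("?" for _ in task_ids)
        rows = self.conn.execute(
            f"SELECT task_id, status FROM celery_taskmeta WHERE task_id IN ({placeholders})",
            task_ids,
        ).fetchall()
        self.queries += 1
        found = dict(rows)  # rows are (task_id, status) pairs
        # Ids with no stored row are reported as PENDING, as Celery does.
        return {tid: found.get(tid, "PENDING") for tid in task_ids}


def get_many_states(
    store: DatabaseStoreSketch, task_ids: Iterable[str], chunk_size: int = 500
) -> Dict[str, str]:
    """Look up states in chunks so each IN (...) list stays bounded in size."""
    ids = list(task_ids)
    states: Dict[str, str] = {}
    for start in range(0, len(ids), chunk_size):
        states.update(store.mget(ids[start:start + chunk_size]))
    return states


# Usage: 100 stored tasks plus one unknown id.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE celery_taskmeta (task_id TEXT PRIMARY KEY, status TEXT)")
conn.executemany(
    "INSERT INTO celery_taskmeta (task_id, status) VALUES (?, 'SUCCESS')",
    [(f"task-{i}",) for i in range(100)],
)
conn.commit()

store = DatabaseStoreSketch(conn)
states = get_many_states(store, [f"task-{i}" for i in range(100)] + ["unknown"])
print(store.queries)  # 1 round trip instead of 101
```

Chunking is a design choice: databases cap the number of bound parameters or
the statement size, so very large task lists are split into a few bounded
queries rather than one unbounded IN clause.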



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
