Re: [PR] Remove state sync during celery task processing [airflow]

via GitHub Wed, 04 Sep 2024 14:42:08 -0700


Kytha commented on PR #41870:
URL: https://github.com/apache/airflow/pull/41870#issuecomment-2330190525


   > There is also a possible pool configuration, so the celery backend can use 
pooling. As you mentioned, NullPool is used only when the process is not forked. 
But... what does it mean that the process is not forked, and is that the case 
for task submission and checking results?
   
   Yeah, unfortunately celery will override any pooling config provided here 
if the process is not forked, meaning the Python process must have been 
through an os.fork() call before get_engine is called 
([this](https://github.com/celery/celery/blob/main/celery/backends/database/session.py#L25)
 is the celery callback that executes once the process is forked). The 
scheduler is not a forked process. For workers, [this would be the 
case](https://github.com/apache/airflow/blob/28e7213a9fab7d34e9f13b0f50bcc9cf8b80158c/airflow/providers/celery/executors/celery_executor_utils.py#L135)
 (with the default airflow config). So in the scheduler, celery is 
[creating a new engine every time get_engine is 
called](https://github.com/celery/celery/blob/main/celery/backends/database/session.py#L53).
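
   To make the difference concrete, here is a rough, stdlib-only model of that 
behaviour. It is a sketch, not celery's actual code: `FakeEngine` and 
`SketchSessionManager` are hypothetical names, and the string "NullPool" just 
stands in for the SQLAlchemy pool class.

```python
from dataclasses import dataclass, field


@dataclass
class FakeEngine:
    """Hypothetical stand-in for a SQLAlchemy engine; records its options."""
    dburi: str
    options: dict = field(default_factory=dict)


class SketchSessionManager:
    """Simplified model of the forked / not-forked branching described above."""

    def __init__(self):
        self._engines = {}
        self.forked = False

    def _after_fork(self):
        # In the real library this is triggered by an after-fork hook;
        # here it is called by hand to simulate a forked worker process.
        self.forked = True

    def get_engine(self, dburi, **kwargs):
        if self.forked:
            # Forked process: build the engine once per URI and reuse it,
            # keeping whatever pool settings were passed in.
            if dburi not in self._engines:
                self._engines[dburi] = FakeEngine(dburi, kwargs)
            return self._engines[dburi]
        # Not forked: drop every pool* option and build a fresh,
        # unpooled engine on each call.
        kwargs = {k: v for k, v in kwargs.items() if not k.startswith("pool")}
        return FakeEngine(dburi, {"poolclass": "NullPool", **kwargs})


manager = SketchSessionManager()

# Scheduler-like case: no fork has happened.
a = manager.get_engine("sqlite://", pool_size=5, echo=False)
b = manager.get_engine("sqlite://", pool_size=5, echo=False)
print(a is b)      # False: a new engine on every call
print(a.options)   # {'poolclass': 'NullPool', 'echo': False}

# Worker-like case: simulate the after-fork callback firing.
manager._after_fork()
c = manager.get_engine("sqlite://", pool_size=5, echo=False)
d = manager.get_engine("sqlite://", pool_size=5, echo=False)
print(c is d)      # True: the engine is cached and pool_size is kept
```

   In this model the scheduler path never reaches the cache, which is why any 
pool settings passed there have no effect.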


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
