RNHTTR commented on issue #14163:
URL: https://github.com/apache/airflow/issues/14163#issuecomment-794914976


   I think it _probably_ would be sufficient to just cast the map object to a 
list like above, but I'm not sure how that would affect the performance of 
`_get_many_from_kv_backend` or `_get_many_from_db_backend` which are called in 
lieu of `_get_many_using_multiprocessing` if app.backend is an instance of 
DatabaseBackend or BaseKeyValueStoreBackend, respectively (see 
[get_many](https://github.com/apache/airflow/blob/90ab60bba877c65cb93871b97db13a179820d28b/airflow/executors/celery_executor.py#L546-L556)).
 In these methods, the map is converted to a set via 
[_tasks_list_to_task_ids](https://github.com/apache/airflow/blob/90ab60bba877c65cb93871b97db13a179820d28b/airflow/executors/celery_executor.py#L529-L530),
 so it doesn't really make sense to convert the map to a list only for it to 
get converted to a set immediately after.
   
   So, I think it's best to convert the map to a list or set in the method 
[_get_many_using_multiprocessing](https://github.com/apache/airflow/blob/90ab60bba877c65cb93871b97db13a179820d28b/airflow/executors/celery_executor.py#L594-L595)
 before it attempts to get the length of the map, and thus would instead get 
the length of a list or set.
   
   I think this bug snuck by the [test 
case](https://github.com/apache/airflow/blob/master/tests/executors/test_celery_executor.py#L353-L377)
 because the test assumes app.backend is set to `DatabaseBackend`, so 
`_get_many_using_multiprocessing` is never called in the test case. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to