PaulW edited a comment on issue #4636: [AIRFLOW-3737] Kubernetes executor 
cannot handle long dag/task names
URL: https://github.com/apache/airflow/pull/4636#issuecomment-466396697
 
 
   The reason for hashing rather than slugifying is that Kubernetes cannot 
query pods by annotation, so the lookup has to rely on labels.  The 
`clear_not_launched_queued_tasks` function in `kubernetes_executor.py` sends 
such a query to Kubernetes:
   
   
https://github.com/apache/airflow/blob/9a2d998f57b48bcfe07f16a0563293a13141b60e/airflow/contrib/executors/kubernetes_executor.py#L563
   
   This returns a result set, which previously contained the pods matching the 
`dag_id` and `task_id` of the task running in Airflow.  However, if (as is the 
case here) one of these strings is longer than 63 characters, no pods are 
returned, because that limit means a pod carrying such a label could never 
have been created in the first place.
   
   Simply truncating the values would cause problems for subdags and for 
dags/tasks with long names: several pods could match the same truncated name 
(especially during subdag execution), which would require additional calls to 
Kubernetes to work through the list of matching pods.
   
   Truncating and slugifying the names does not avoid this either: multiple 
pods can still be returned.
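
The collision described above can be illustrated with a small sketch. The dag names and the `truncate_label` helper are hypothetical and only serve the example; the 63-character limit is the one Kubernetes places on label values.

```python
# Kubernetes label values may be at most 63 characters long.
MAX_LABEL_LEN = 63


def truncate_label(value: str) -> str:
    """Hypothetical helper: cut a value down to the label length limit."""
    return value[:MAX_LABEL_LEN]


# A made-up parent dag id that is already longer than 63 characters (68 here).
parent_dag = "nightly_reporting_" + "x" * 50

# Two different subdags of that parent; subdag ids extend the parent id.
subdag_a = parent_dag + ".extract"
subdag_b = parent_dag + ".load"

# After truncation both subdags end up with the same label value, so a
# label query for one of them would also match pods of the other.
label_a = truncate_label(subdag_a)
label_b = truncate_label(subdag_b)
collision = label_a == label_b
```

Because the distinguishing suffix lies beyond the 63rd character, truncation discards exactly the part that tells the two subdags apart.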
   
   Hashing the full `dag_id` and `task_id` for use as labels, and storing the 
unhashed values as annotations, lets the query to Kubernetes return only the 
one pod that belongs to the dag/task at hand.
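
A minimal sketch of this idea follows, assuming SHA-1 as the hash. The comment does not say which hash the PR uses, and the helper names `hash_label`, `build_pod_metadata` and `build_label_selector` are hypothetical, not Airflow APIs. The selector string uses the standard equality-based `key=value,key=value` form.

```python
import hashlib

# Kubernetes label values may be at most 63 characters long.
MAX_LABEL_LEN = 63


def hash_label(value: str) -> str:
    """Hypothetical helper: map any id to a fixed-length, label-safe value.

    A SHA-1 hex digest is 40 hexadecimal characters, which fits the limit.
    SHA-1 is an assumption made for this sketch.
    """
    return hashlib.sha1(value.encode("utf-8")).hexdigest()


def build_pod_metadata(dag_id: str, task_id: str) -> dict:
    """Hashed ids go into labels (queryable); full ids go into annotations."""
    return {
        "labels": {
            "dag_id": hash_label(dag_id),
            "task_id": hash_label(task_id),
        },
        "annotations": {
            "dag_id": dag_id,
            "task_id": task_id,
        },
    }


def build_label_selector(dag_id: str, task_id: str) -> str:
    """Equality-based label selector matching the hashed labels."""
    return "dag_id={},task_id={}".format(hash_label(dag_id), hash_label(task_id))


# Made-up ids: two subdags that share a prefix longer than 63 characters.
parent_dag = "nightly_reporting_" + "x" * 50
meta_a = build_pod_metadata(parent_dag + ".extract", "run_query")
meta_b = build_pod_metadata(parent_dag + ".load", "run_query")
selector_a = build_label_selector(parent_dag + ".extract", "run_query")
```

Unlike truncation, the hash depends on the whole string, so the two subdags get different `dag_id` labels, while the readable ids remain available in the annotations.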

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
