[GitHub] PaulW commented on issue #4636: [AIRFLOW-3737] Kubernetes executor cannot handle long dag/task names

GitBox Fri, 22 Feb 2019 05:32:31 -0800

PaulW commented on issue #4636: [AIRFLOW-3737] Kubernetes executor cannot 
handle long dag/task names
URL: https://github.com/apache/airflow/pull/4636#issuecomment-466396697
 
 
   The reason for hashing rather than slugifying is that Kubernetes cannot 
query pods by annotation, so the lookup has to rely on labels.  The 
`clear_not_launched_queued_tasks` function in `kubernetes_executor.py` sends 
a query to Kubernetes:
   
   
https://github.com/apache/airflow/blob/9a2d998f57b48bcfe07f16a0563293a13141b60e/airflow/contrib/executors/kubernetes_executor.py#L563
   
   This returns a result set, which previously contained the pods whose 
`dag_id` and `task_id` labels matched the task running in Airflow.  However, if 
(as is the case here) one of these values is longer than 63 characters, no pods 
are returned, so anything that is running is unknown to Airflow.
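
   To make the 63-character limit concrete, here is a minimal sketch of the 
label value rules (at most 63 characters, empty or starting and ending with an 
alphanumeric, with `-`, `_` and `.` allowed in between).  The helper name 
`is_valid_label_value` is made up for illustration and is not part of the 
executor.

```python
import re

# Kubernetes label value rules: at most 63 characters; either empty, or
# starting and ending with an alphanumeric, with '-', '_' and '.' in between.
MAX_LABEL_LEN = 63
LABEL_VALUE_RE = re.compile(r"(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?")


def is_valid_label_value(value):
    """Hypothetical helper: True if `value` could be used as a label value."""
    if len(value) > MAX_LABEL_LEN:
        return False
    return LABEL_VALUE_RE.fullmatch(value) is not None


if __name__ == "__main__":
    short_id = "example_dag"
    long_id = "example_dag." + "very_long_subdag_name_" * 3
    print(short_id, is_valid_label_value(short_id))
    print(len(long_id), is_valid_label_value(long_id))
```

   A `dag_id` or `task_id` that fails this check cannot be used as a label 
value as-is, which is why the lookup above comes back empty.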
   
   Simply truncating the values would cause problems for subdags or long-named 
dags/tasks, because several pods could match the same truncated name, which in 
turn leads to further issues.
   
   If you truncate and slugify the names, you can still hit this condition.
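
   A small sketch of that collision, assuming two subdags that share a long 
prefix.  The dag names and the helpers `truncate_label` and `slugify_label` are 
invented for the example; they are not the executor's own functions.

```python
import re

MAX_LABEL_LEN = 63


def truncate_label(value):
    """Hypothetical: keep only the first 63 characters."""
    return value[:MAX_LABEL_LEN]


def slugify_label(value):
    """Hypothetical: lowercase, replace non-alphanumerics with '-', truncate."""
    slug = re.sub(r"[^a-z0-9]+", "-", value.lower()).strip("-")
    return slug[:MAX_LABEL_LEN]


# Two distinct subdags whose ids only differ after the 63rd character.
prefix = "parent_dag_with_a_very_long_descriptive_name.subdag_section_for_region_"
dag_a = prefix + "eu_west"
dag_b = prefix + "us_east"

if __name__ == "__main__":
    # Both comparisons print True: the ids are different, the labels are not.
    print(truncate_label(dag_a) == truncate_label(dag_b))
    print(slugify_label(dag_a) == slugify_label(dag_b))
```

   A label query for either subdag would then match the pods of both.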
   
   Hashing the full `dag_id` and `task_id` for use as labels, and storing the 
original values as annotations that are read back once a hash matches and a 
single pod is returned, overcomes this entirely.
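
   Roughly, the idea looks like the sketch below.  It is an illustration under 
assumptions, not the code in this PR: the choice of MD5, the label/annotation 
key names and the helpers `hash_label`, `build_pod_metadata` and 
`build_label_selector` are all hypothetical.

```python
import hashlib


def hash_label(value):
    """Hypothetical: fixed-length label value derived from the full id.

    An MD5 hex digest is 32 alphanumeric characters, so it always fits
    within the 63-character label limit, however long the id is.
    """
    return hashlib.md5(value.encode("utf-8")).hexdigest()


def build_pod_metadata(dag_id, task_id):
    """Hashes go into labels (queryable); originals go into annotations."""
    return {
        "labels": {
            "dag_id": hash_label(dag_id),
            "task_id": hash_label(task_id),
        },
        "annotations": {
            "dag_id": dag_id,
            "task_id": task_id,
        },
    }


def build_label_selector(dag_id, task_id):
    """Selector string that would be used to look the pod up by its hashes."""
    return "dag_id={},task_id={}".format(hash_label(dag_id), hash_label(task_id))


if __name__ == "__main__":
    long_dag_id = "parent_dag." + "very_long_subdag_name_" * 4
    metadata = build_pod_metadata(long_dag_id, "some_task")
    print(build_label_selector(long_dag_id, "some_task"))
    # The original id is recovered from the annotation, not from the label.
    print(metadata["annotations"]["dag_id"] == long_dag_id)
```

   The label only has to find the pod; the annotation carries the real 
`dag_id` and `task_id` back to Airflow, so nothing is lost to the length limit.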


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
