ashb commented on a change in pull request #6377: [AIRFLOW-5589] monitor pods by labels instead of names
URL: https://github.com/apache/airflow/pull/6377#discussion_r338030668

##########
File path: airflow/contrib/operators/kubernetes_pod_operator.py
##########
@@ -112,55 +113,60 @@ class KubernetesPodOperator(BaseOperator):  # pylint: disable=too-many-instance-
     """
     template_fields = ('cmds', 'arguments', 'env_vars', 'config_file')

+    @staticmethod
+    def create_labels_for_pod(context):
+        """
+        Generate labels for the pod s.t. we can track it in case of Operator crash
+
+        :param context:
+        :return:
+        """
+        labels = {
+            'dag_id': context['dag'].dag_id,
+            'task_id': context['task'].task_id,
+            'exec_date': context['ts'],
+            'try_number': context['ti'].try_number,

Review comment:
The counterargument to this: if we _don't_ include try_number here, then the operator could "reattach" to a pod still running from a previous attempt when Airflow thinks it has retried. That seems like odd behaviour.

These are all edge cases anyway; it's unlikely that we would end up in this situation.

What we could do is search by dag_id/task_id/exec_date (i.e. don't search by try_number), and then, if we find a running pod with something other than the expected try_number, kill it. I'm a little bit worried we might somehow kill the wrong pod there, but it _should_ be safe?
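
To make the last suggestion concrete, here is a rough sketch in plain Python. It is not the operator's code: `build_label_selector`, `split_pods_by_try_number`, the dict-shaped pods and the sample label values are all made up for illustration, and the actual calls that list or delete pods in the cluster are deliberately left out.

```python
def build_label_selector(labels):
    """Render a dict as an equality-based label selector string (key=value,...)."""
    return ",".join("{}={}".format(k, v) for k, v in sorted(labels.items()))


def split_pods_by_try_number(pods, expected_try_number):
    """Partition running pods into (reattach candidates, stale pods to kill).

    Each pod is a plain dict with 'name', 'phase' and 'labels' keys. This is
    a hypothetical stand-in for whatever a real Kubernetes client returns.
    """
    expected = str(expected_try_number)  # label values are strings
    reattach, stale = [], []
    for pod in pods:
        if pod["phase"] != "Running":
            continue  # only running pods matter for reattach/kill
        if pod["labels"].get("try_number") == expected:
            reattach.append(pod["name"])
        else:
            stale.append(pod["name"])
    return reattach, stale


# Search WITHOUT try_number, as proposed in the comment (sample values are invented).
search_labels = {
    "dag_id": "example_dag",
    "task_id": "example_task",
    "exec_date": "2019-10-23T000000",
}
selector = build_label_selector(search_labels)

# Pretend these came back from a label-selector query against the cluster.
pods = [
    {"name": "pod-try-1", "phase": "Running", "labels": dict(search_labels, try_number="1")},
    {"name": "pod-try-2", "phase": "Running", "labels": dict(search_labels, try_number="2")},
    {"name": "pod-done", "phase": "Succeeded", "labels": dict(search_labels, try_number="1")},
]
reattach, stale = split_pods_by_try_number(pods, expected_try_number=2)
print(selector)
print("reattach:", reattach, "kill:", stale)
```

Keeping the partition step separate from any delete call would let us log (or dry-run) the `stale` list before killing anything, which is where my "wrong pod" worry comes in.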