danccooper commented on a change in pull request #6377: [AIRFLOW-5589] monitor
pods by labels instead of names
URL: https://github.com/apache/airflow/pull/6377#discussion_r338494904
##########
File path: airflow/contrib/operators/kubernetes_pod_operator.py
##########
@@ -112,55 +113,60 @@ class KubernetesPodOperator(BaseOperator): # pylint:
disable=too-many-instance-
"""
template_fields = ('cmds', 'arguments', 'env_vars', 'config_file')
+ @staticmethod
+ def create_labels_for_pod(context):
+ """
+ Generate labels for the pod s.t. we can track it in case of Operator
crash
+
+ :param context:
+ :return:
+ """
+ labels = {
+ 'dag_id': context['dag'].dag_id,
+ 'task_id': context['task'].task_id,
+ 'exec_date': context['ts'],
+ 'try_number': context['ti'].try_number,
Review comment:
I should've realised that. We do indeed monitor a previous try number on
purpose in my original changes.
With the try number changes the main duplicate pod issue will be fixed still
as we force kill it before starting a new one. But I took the view that,
although perhaps confusing in terms of try number, it is far more efficient to
continue monitoring the previous try in the case of worker failure. AFAIK
worker failure/restart is the only situation in which pods remain running and
are not killed. If you fail a task via the UI it triggers a pod kill for
example.
Unless you have another solution that can check & re-use the previous try
number then I think we're better off removing the try_number changes?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services