danccooper commented on a change in pull request #6377: [AIRFLOW-5589] monitor 
pods by labels instead of names
URL: https://github.com/apache/airflow/pull/6377#discussion_r338494904
 
 

 ##########
 File path: airflow/contrib/operators/kubernetes_pod_operator.py
 ##########
 @@ -112,55 +113,60 @@ class KubernetesPodOperator(BaseOperator):  # pylint: 
disable=too-many-instance-
     """
     template_fields = ('cmds', 'arguments', 'env_vars', 'config_file')
 
+    @staticmethod
+    def create_labels_for_pod(context):
+        """
+        Generate labels for the pod s.t. we can track it in case of Operator 
crash
+
+        :param context:
+        :return:
+        """
+        labels = {
+            'dag_id': context['dag'].dag_id,
+            'task_id': context['task'].task_id,
+            'exec_date': context['ts'],
+            'try_number': context['ti'].try_number,
 
 Review comment:
   I should've realised that.  We do indeed monitor a previous try number on 
purpose in my original changes.
   
   With the try number changes the main duplicate pod issue will be fixed still 
as we force kill it before starting a new one.  But I took the view that, 
although perhaps confusing in terms of try number, it is far more efficient to 
continue monitoring the previous try in the case of worker failure.  AFAIK 
worker failure/restart is the only situation in which pods remain running and 
are not killed.  If you fail a task via the UI it triggers a pod kill for 
example.
   
   Unless you have another solution that can check & re-use the previous try 
number then I think we're better off removing the try_number changes?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

Reply via email to