mtilda opened a new pull request, #25882:
URL: https://github.com/apache/airflow/pull/25882

   ## Changes
   
   This PR adds a condition to the `find_pod` method to verify that the pod's name matches the name defined by the user. This ensures that two pods with different names but otherwise identical context do not collide.
   
   ## Issue
   
   Two (or more) pods created with `KubernetesPodOperator` from identical context (`namespace`, `dag_id`, `task_id`, `run_id`, and `map_index`) across different Airflow environments (e.g. staging + production) collide, causing race conditions. This happens because the `find_pod` method in `KubernetesPodOperator` sees these pods as identical.
   
   At first, we thought this was a name collision, so we prepended the string `stg-` or `prd-` (based on an Airflow Variable) to the pod name. This did not fix our problem; however, it made the problem easier to debug.
   
   As you can see in the logs below, instead of creating the pod `prd-example-9ba5a14b73ab41988c1c57afe5ec81f4`, KPO found the "matching" pod `stg-example-9ba5a14b73ab41988c1c57afe5ec81f4`.
   
   ```
   ...
   [2022-08-10, 07:56:49 EDT] {kubernetes_pod.py:221} INFO - Creating pod 
prd-example-9ba5a14b73ab41988c1c57afe5ec81f4 with labels: {'dag_id': 
'example_k8s_pod_operator', 'task_id': 'run_k8s_pod', 'run_id': 
'scheduled__2022-08-09T1145000000-c7f22427c', 'kubernetes_pod_operator': 
'True', 'try_number': '4'}
   [2022-08-10, 07:56:49 EDT] {kubernetes_pod.py:366} INFO - Found matching pod 
stg-example-9ba5a14b73ab41988c1c57afe5ec81f4 with labels {'airflow_version': 
'2.3.2-astro.2', 'dag_id': 'example_k8s_pod_operator', 
'kubernetes_pod_operator': 'True', 'run_id': 
'scheduled__2022-08-09T1145000000-c7f22427c', 'task_id': 'run_k8s_pod', 
'try_number': '4'}
   [2022-08-10, 07:56:49 EDT] {kubernetes_pod.py:367} INFO - `try_number` of 
task_instance: 4
   [2022-08-10, 07:56:49 EDT] {kubernetes_pod.py:368} INFO - `try_number` of 
pod: 4
   ...
   ```
   
   We would like the `find_pod` method to fail to find a pod in this situation, because the pod's name does not match what the user defined.
   
   ## Test
   
   Run two Airflow projects with identical DAGs (same `dag_id` and schedule args) containing an identical task (same `task_id`), like below:
   
   ```py
   from datetime import datetime, timedelta

   from airflow import DAG
   from airflow.providers.cncf.kubernetes.operators.kubernetes_pod import (
       KubernetesPodOperator,
   )

   with DAG(
       "example_k8s_pod_operator",
       description="Runs scripts in Kubernetes Pods",
       default_args={
           "retries": 1,
           "retry_delay": timedelta(seconds=10),
           "trigger_rule": "all_done",
       },
       start_date=datetime(2022, 1, 1),
       schedule_interval=None,
   ) as dag:

       KubernetesPodOperator(
           task_id="run_k8s_pod",
           arguments=["Hello world!"],
           cmds=["echo"],
           get_logs=True,
           image="ubuntu",
           # in_cluster=False,  # We use a different cluster, but you should not need to
           is_delete_operator_pod=True,
           kubernetes_conn_id="GKE_CONFIG",
           namespace="default",
           name="example",  # Change this manually or with an Airflow Variable
       )
   ```
   
   Change the `name` argument so it is unique across your Airflow instances.
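   For reference, this is roughly how we derived a per-environment name. The environment-variable name `AIRFLOW_ENV_PREFIX` is a hypothetical stand-in; in our setup we read the prefix from an Airflow Variable the same way.

   ```python
   import os

   # Hypothetical: pick a per-environment prefix (e.g. "stg" or "prd") from
   # the environment, falling back to "stg" when it is not set.
   env_prefix = os.environ.get("AIRFLOW_ENV_PREFIX", "stg")

   # The resulting base name passed as the operator's `name` argument.
   pod_name = f"{env_prefix}-example"
   ```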
   
   Trigger these DAGs at approximately the same time, then observe the logs for the race-condition behavior described above.
   
   ---
   
   ## Astronomer Support
   
   This relates to [Astronomer Support ticket 
#11647](https://support.astronomer.io/hc/en-us/requests/11647).
   
   ## Notes
   
   I am a new contributor, so feel free to educate me on contribution best 
practices.
   

