frank-ellis opened a new issue, #28751:
URL: https://github.com/apache/airflow/issues/28751

   ### Apache Airflow version
   
   Other Airflow 2 version (please specify below)
   
   ### What happened
   
   With Airflow 2.3 and 2.4 there appears to be a bug in the KubernetesExecutor when it is used in conjunction with the Google Airflow providers.
   
   The bug presents itself with nearly any Google provider operator. The pod lifecycle proceeds normally until the executor in the pod starts cleaning up after a successful run: Airflow itself still sees the task marked as a success, but while the process is finishing up after reporting its status, it crashes and silently puts the pod into a `Failed` state:
   ```
   Traceback (most recent call last):
     File "/home/airflow/.local/bin/airflow", line 8, in <module>
       sys.exit(main())
     File "/home/airflow/.local/lib/python3.9/site-packages/airflow/__main__.py", line 39, in main
       args.func(args)
     File "/home/airflow/.local/lib/python3.9/site-packages/airflow/cli/cli_parser.py", line 52, in command
       return func(*args, **kwargs)
     File "/home/airflow/.local/lib/python3.9/site-packages/airflow/utils/cli.py", line 103, in wrapper
       return f(*args, **kwargs)
     File "/home/airflow/.local/lib/python3.9/site-packages/airflow/cli/commands/task_command.py", line 382, in task_run
       _run_task_by_selected_method(args, dag, ti)
     File "/home/airflow/.local/lib/python3.9/site-packages/airflow/cli/commands/task_command.py", line 189, in _run_task_by_selected_method
       _run_task_by_local_task_job(args, ti)
     File "/home/airflow/.local/lib/python3.9/site-packages/airflow/cli/commands/task_command.py", line 247, in _run_task_by_local_task_job
       run_job.run()
     File "/home/airflow/.local/lib/python3.9/site-packages/airflow/jobs/base_job.py", line 247, in run
       self._execute()
     File "/home/airflow/.local/lib/python3.9/site-packages/airflow/jobs/local_task_job.py", line 137, in _execute
       self.handle_task_exit(return_code)
     File "/home/airflow/.local/lib/python3.9/site-packages/airflow/jobs/local_task_job.py", line 168, in handle_task_exit
       self._run_mini_scheduler_on_child_tasks()
     File "/home/airflow/.local/lib/python3.9/site-packages/airflow/utils/session.py", line 75, in wrapper
       return func(*args, session=session, **kwargs)
     File "/home/airflow/.local/lib/python3.9/site-packages/airflow/jobs/local_task_job.py", line 253, in _run_mini_scheduler_on_child_tasks
       partial_dag = task.dag.partial_subset(
     File "/home/airflow/.local/lib/python3.9/site-packages/airflow/models/dag.py", line 2188, in partial_subset
       dag.task_dict = {
     File "/home/airflow/.local/lib/python3.9/site-packages/airflow/models/dag.py", line 2189, in <dictcomp>
       t.task_id: _deepcopy_task(t)
     File "/home/airflow/.local/lib/python3.9/site-packages/airflow/models/dag.py", line 2186, in _deepcopy_task
       return copy.deepcopy(t, memo)
     File "/usr/local/lib/python3.9/copy.py", line 153, in deepcopy
       y = copier(memo)
     File "/home/airflow/.local/lib/python3.9/site-packages/airflow/models/baseoperator.py", line 1163, in __deepcopy__
       setattr(result, k, copy.deepcopy(v, memo))
     File "/usr/local/lib/python3.9/copy.py", line 172, in deepcopy
       y = _reconstruct(x, memo, *rv)
     File "/usr/local/lib/python3.9/copy.py", line 264, in _reconstruct
       y = func(*args)
     File "/usr/local/lib/python3.9/enum.py", line 384, in __call__
       return cls.__new__(cls, value)
     File "/usr/local/lib/python3.9/enum.py", line 702, in __new__
       raise ve_exc
   ValueError: <object object at 0x7f570181a3c0> is not a valid _MethodDefault
   ```
   
   Based on a quick look, the cause appears to be a default argument that the Google operators use, which happens to be an Enum member and fails during the deepcopy performed at the end of the task.
   
   Example of affected provider code: https://github.com/apache/airflow/blob/403ed7163f3431deb7fc21108e1743385e139907/airflow/providers/google/cloud/hooks/dataproc.py#L753
   Reference to the Google Python API core module defining the Enum that causes the problem: https://github.com/googleapis/python-api-core/blob/main/google/api_core/gapic_v1/method.py#L31
   
   ### What you think should happen instead
   
   Kubernetes pods should succeed, be marked as `Completed`, and then be 
gracefully terminated.
   
   ### How to reproduce
   
   Use any `apache-airflow-providers-google` >= 7.0.0, which pulls in `google-api-core` >= 2.2.2. Run a DAG with a task that uses any Google operator taking `_MethodDefault` as a default argument.
   
   ### Operating System
   
   Debian GNU/Linux 11 (bullseye)
   
   ### Versions of Apache Airflow Providers
   
   apache-airflow-providers-amazon==6.0.0
   apache-airflow-providers-apache-hive==5.0.0
   apache-airflow-providers-celery==3.0.0
   apache-airflow-providers-cncf-kubernetes==4.4.0
   apache-airflow-providers-common-sql==1.3.1
   apache-airflow-providers-docker==3.2.0
   apache-airflow-providers-elasticsearch==4.2.1
   apache-airflow-providers-ftp==3.1.0
   apache-airflow-providers-google==8.4.0
   apache-airflow-providers-grpc==3.0.0
   apache-airflow-providers-hashicorp==3.1.0
   apache-airflow-providers-http==4.0.0
   apache-airflow-providers-imap==3.0.0
   apache-airflow-providers-microsoft-azure==4.3.0
   apache-airflow-providers-mysql==3.2.1
   apache-airflow-providers-odbc==3.1.2
   apache-airflow-providers-postgres==5.2.2
   apache-airflow-providers-presto==4.2.0
   apache-airflow-providers-redis==3.0.0
   apache-airflow-providers-sendgrid==3.0.0
   apache-airflow-providers-sftp==4.1.0
   apache-airflow-providers-slack==6.0.0
   apache-airflow-providers-sqlite==3.2.1
   apache-airflow-providers-ssh==3.2.0
   
   ### Deployment
   
   Other 3rd-party Helm chart
   
   ### Deployment details
   
   _No response_
   
   ### Anything else
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   

