frank-ellis opened a new issue, #28751:
URL: https://github.com/apache/airflow/issues/28751
### Apache Airflow version
Other Airflow 2 version (please specify below)
### What happened
With Airflow 2.3 and 2.4 there appears to be a bug in the KubernetesExecutor
when used in conjunction with the Google Airflow providers.
The bug presents itself with nearly any Google provider operator. During the
pod lifecycle, all is well until the executor in the pod starts to clean up
after a successful run. Airflow itself still sees the task marked as a
success, but in Kubernetes, while the task is finishing up after reporting
its status, it crashes and silently puts the pod into a Failed state:
```
Traceback (most recent call last):
  File "/home/airflow/.local/bin/airflow", line 8, in <module>
    sys.exit(main())
  File "/home/airflow/.local/lib/python3.9/site-packages/airflow/__main__.py", line 39, in main
    args.func(args)
  File "/home/airflow/.local/lib/python3.9/site-packages/airflow/cli/cli_parser.py", line 52, in command
    return func(*args, **kwargs)
  File "/home/airflow/.local/lib/python3.9/site-packages/airflow/utils/cli.py", line 103, in wrapper
    return f(*args, **kwargs)
  File "/home/airflow/.local/lib/python3.9/site-packages/airflow/cli/commands/task_command.py", line 382, in task_run
    _run_task_by_selected_method(args, dag, ti)
  File "/home/airflow/.local/lib/python3.9/site-packages/airflow/cli/commands/task_command.py", line 189, in _run_task_by_selected_method
    _run_task_by_local_task_job(args, ti)
  File "/home/airflow/.local/lib/python3.9/site-packages/airflow/cli/commands/task_command.py", line 247, in _run_task_by_local_task_job
    run_job.run()
  File "/home/airflow/.local/lib/python3.9/site-packages/airflow/jobs/base_job.py", line 247, in run
    self._execute()
  File "/home/airflow/.local/lib/python3.9/site-packages/airflow/jobs/local_task_job.py", line 137, in _execute
    self.handle_task_exit(return_code)
  File "/home/airflow/.local/lib/python3.9/site-packages/airflow/jobs/local_task_job.py", line 168, in handle_task_exit
    self._run_mini_scheduler_on_child_tasks()
  File "/home/airflow/.local/lib/python3.9/site-packages/airflow/utils/session.py", line 75, in wrapper
    return func(*args, session=session, **kwargs)
  File "/home/airflow/.local/lib/python3.9/site-packages/airflow/jobs/local_task_job.py", line 253, in _run_mini_scheduler_on_child_tasks
    partial_dag = task.dag.partial_subset(
  File "/home/airflow/.local/lib/python3.9/site-packages/airflow/models/dag.py", line 2188, in partial_subset
    dag.task_dict = {
  File "/home/airflow/.local/lib/python3.9/site-packages/airflow/models/dag.py", line 2189, in <dictcomp>
    t.task_id: _deepcopy_task(t)
  File "/home/airflow/.local/lib/python3.9/site-packages/airflow/models/dag.py", line 2186, in _deepcopy_task
    return copy.deepcopy(t, memo)
  File "/usr/local/lib/python3.9/copy.py", line 153, in deepcopy
    y = copier(memo)
  File "/home/airflow/.local/lib/python3.9/site-packages/airflow/models/baseoperator.py", line 1163, in __deepcopy__
    setattr(result, k, copy.deepcopy(v, memo))
  File "/usr/local/lib/python3.9/copy.py", line 172, in deepcopy
    y = _reconstruct(x, memo, *rv)
  File "/usr/local/lib/python3.9/copy.py", line 264, in _reconstruct
    y = func(*args)
  File "/usr/local/lib/python3.9/enum.py", line 384, in __call__
    return cls.__new__(cls, value)
  File "/usr/local/lib/python3.9/enum.py", line 702, in __new__
    raise ve_exc
ValueError: <object object at 0x7f570181a3c0> is not a valid _MethodDefault
```
Based on a quick look, it appears to be related to the default argument that
Google uses in its operators, which happens to be an Enum and fails during a
deepcopy at the end of the task.
Example operator that is affected:
https://github.com/apache/airflow/blob/403ed7163f3431deb7fc21108e1743385e139907/airflow/providers/google/cloud/hooks/dataproc.py#L753
Reference to the Google Python API core which has the Enum causing the
problem:
https://github.com/googleapis/python-api-core/blob/main/google/api_core/gapic_v1/method.py#L31
### What you think should happen instead
Kubernetes pods should succeed, be marked as `Completed`, and then be
gracefully terminated.
### How to reproduce
Use any `apache-airflow-providers-google` >= 7.0.0, which includes
`google-api-core` >= 2.2.2. Run a DAG with a task that uses any of the Google
operators that have `_MethodDefault` as a default argument.
### Operating System
Debian GNU/Linux 11 (bullseye)
### Versions of Apache Airflow Providers
apache-airflow-providers-amazon==6.0.0
apache-airflow-providers-apache-hive==5.0.0
apache-airflow-providers-celery==3.0.0
apache-airflow-providers-cncf-kubernetes==4.4.0
apache-airflow-providers-common-sql==1.3.1
apache-airflow-providers-docker==3.2.0
apache-airflow-providers-elasticsearch==4.2.1
apache-airflow-providers-ftp==3.1.0
apache-airflow-providers-google==8.4.0
apache-airflow-providers-grpc==3.0.0
apache-airflow-providers-hashicorp==3.1.0
apache-airflow-providers-http==4.0.0
apache-airflow-providers-imap==3.0.0
apache-airflow-providers-microsoft-azure==4.3.0
apache-airflow-providers-mysql==3.2.1
apache-airflow-providers-odbc==3.1.2
apache-airflow-providers-postgres==5.2.2
apache-airflow-providers-presto==4.2.0
apache-airflow-providers-redis==3.0.0
apache-airflow-providers-sendgrid==3.0.0
apache-airflow-providers-sftp==4.1.0
apache-airflow-providers-slack==6.0.0
apache-airflow-providers-sqlite==3.2.1
apache-airflow-providers-ssh==3.2.0
### Deployment
Other 3rd-party Helm chart
### Deployment details
_No response_
### Anything else
_No response_
### Are you willing to submit PR?
- [X] Yes I am willing to submit a PR!
### Code of Conduct
- [X] I agree to follow this project's [Code of
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)