ft2898 opened a new issue, #51640:
URL: https://github.com/apache/airflow/issues/51640

   ### Apache Airflow version
   
   Other Airflow 2 version (please specify below)
   
   ### If "Other Airflow 2 version" selected, which one?
   
   2.10.5
   
   ### What happened?
   
   I'm encountering an issue with Apache Airflow 2.10.5 where the scheduler 
crashes with a TypeError while processing a detected zombie task. The 
scheduler logs and stack trace show that the error is caused by a None value 
for end_date in the next_retry_datetime method of TaskInstance.
   
   
   
   Here are the relevant logs and traceback:
   
```
[2025-06-12T01:43:19.317+0800] {scheduler_job_runner.py:2110} ERROR - Detected zombie job:
{'full_filepath': '/data/airflow/dags/analysis_hourly.py',
 'processor_subdir': '/data/airflow/dags',
 'msg': "{'DAG Id': 'analysis_hourly', 'Task Id': 'analysis_hourly.tableau-235976', 'Run Id': 'scheduled__2025-06-11T16:00:00+00:00', 'Hostname': 'centos-hadoop3dn-480896.intsig.internal', 'External Executor Id': '52b249a2-d690-4713-a3b3-1b2e8d72305b'}",
 'simple_task_instance': SimpleTaskInstance(dag_id='analysis_hourly', task_id='analysis_hourly.tableau-235976', run_id='scheduled__2025-06-11T16:00:00+00:00', map_index=-1, start_date=datetime.datetime(2025, 6, 11, 17, 34, 33, 41895, tzinfo=Timezone('UTC')), end_date=None, try_number=1, state='running', executor=None, executor_config={}, run_as_user=None, pool='default_pool', priority_weight=2, queue='worker_03', key=TaskInstanceKey(dag_id='analysis_hourly', task_id='analysis_hourly.tableau-235976', run_id='scheduled__2025-06-11T16:00:00+00:00', try_number=1, map_index=-1)),
 'task_callback_type': None}
```
   
   Full traceback:
   
```
  File "/data/miniconda/envs/py311/lib/python3.11/site-packages/airflow/models/taskinstance.py", line 2601, in are_dependencies_met
    for dep_status in self.get_failed_dep_statuses(dep_context=dep_context, session=session):
  File "/data/miniconda/envs/py311/lib/python3.11/site-packages/airflow/models/taskinstance.py", line 2625, in get_failed_dep_statuses
    for dep_status in dep.get_dep_statuses(self, session, dep_context):
  File "/data/miniconda/envs/py311/lib/python3.11/site-packages/airflow/ti_deps/deps/base_ti_dep.py", line 115, in get_dep_statuses
    yield from self._get_dep_statuses(ti, session, cxt)
  File "/data/miniconda/envs/py311/lib/python3.11/site-packages/airflow/ti_deps/deps/not_in_retry_period_dep.py", line 48, in _get_dep_statuses
    next_task_retry_date = ti.next_retry_datetime()
                           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/miniconda/envs/py311/lib/python3.11/site-packages/airflow/models/taskinstance.py", line 2685, in next_retry_datetime
    return self.end_date + delay
           ~~~~~~~~~~~~~~^~~~~~~
TypeError: unsupported operand type(s) for +: 'NoneType' and 'datetime.timedelta'
```
   
   From the logs, it seems that end_date in the TaskInstance object is None, 
causing the crash when a zombie task is being processed.
   
   Steps to Reproduce:
   
   1. Schedule a DAG with tasks that may fail and be retried.
   2. Observe logs where the scheduler detects zombie tasks (ERROR - Detected 
zombie job).
   3. The scheduler crashes with the traceback above.
   
   Expected Behavior: The scheduler should handle zombie tasks gracefully 
without crashing.
   
   Actual Behavior: The scheduler crashes due to a TypeError in 
TaskInstance.next_retry_datetime when end_date is None.
   
   Environment:
   
   1. Airflow version: 2.10.5
   2. Python version: 3.11
   3. Database backend: MySQL 8.0
   4. Executor: Celery-based worker
   5. OS: CentOS7.9
   6. DAG configuration: Includes retries and uses default_pool.
   
   Additional Context:
   
   - I suspect that the issue happens when TaskInstance.end_date is None during 
processing of zombie tasks.
   - This only seems to occur under specific conditions, such as failed or 
zombie tasks.
   - Link to relevant documentation mentioning zombie tasks 
[here](https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/tasks.html#zombie-undead-tasks).
   - This issue did not happen in earlier versions (e.g., 1.10.x).
   
   Let me know if additional logs, configurations, or DAG definitions are 
needed to investigate further.
   
   ### What you think should happen instead?
   
   _No response_
   
   ### How to reproduce
   
   Unfortunately, I have not been able to clearly identify the exact steps to 
reproduce the issue. The problem appears sporadically in my environment under 
the following conditions:
   
   1. A DAG is scheduled with tasks that include retries in their configuration.
   2. A task runs and encounters some form of failure, potentially causing 
zombie tasks to appear.
   3. The scheduler detects zombie tasks (ERROR - Detected zombie job) and 
subsequently crashes due to a TypeError.
   
   ### Operating System
   
   NAME="CentOS Linux" VERSION="7 (Core)" ID="centos" ID_LIKE="rhel fedora" 
VERSION_ID="7" PRETTY_NAME="CentOS Linux 7 (Core)" ANSI_COLOR="0;31" 
CPE_NAME="cpe:/o:centos:centos:7" HOME_URL="https://www.centos.org/" 
BUG_REPORT_URL="https://bugs.centos.org/" CENTOS_MANTISBT_PROJECT="CentOS-7" 
CENTOS_MANTISBT_PROJECT_VERSION="7" REDHAT_SUPPORT_PRODUCT="centos" 
REDHAT_SUPPORT_PRODUCT_VERSION="7"
   
   ### Versions of Apache Airflow Providers
   
   _No response_
   
   ### Deployment
   
   Virtualenv installation
   
   ### Deployment details
   
   _No response_
   
   ### Anything else?
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   

