wolfier opened a new issue #15600:
URL: https://github.com/apache/airflow/issues/15600
**Apache Airflow version**: 2.0.0
**Executor**: KubernetesExecutor
**What happened**:
One of my tasks failed, and the exception captured by Sentry did not carry
the task metadata as tags. There were also cases where task instances did
not go through their normal task execution flow. One such situation is the
Kubernetes worker pod receiving an OOMKill.
**What you expected to happen**:
I expect all exceptions in Airflow to be captured by Sentry, and every
captured exception to carry the corresponding tags representing task
metadata.
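To illustrate the kind of tags meant here, the following is a minimal sketch only. `FakeTaskInstance`, `TAG_NAMES`, and `task_tags` are hypothetical names made up for this example and are not Airflow's or Sentry's API; the chosen attributes are an assumption about what "task metadata" should cover.

```python
from collections import namedtuple
from datetime import datetime

# Hypothetical stand-in for an Airflow task instance (not Airflow's class).
FakeTaskInstance = namedtuple(
    "FakeTaskInstance", ["dag_id", "task_id", "execution_date", "operator"]
)

# Assumed set of attributes that should appear as tags on a captured event.
TAG_NAMES = ("dag_id", "task_id", "execution_date", "operator")


def task_tags(ti):
    """Return a dict of string tags describing the given task instance."""
    return {name: str(getattr(ti, name)) for name in TAG_NAMES}


ti = FakeTaskInstance(
    dag_id="trigger_dag",
    task_id="failing",
    execution_date=datetime(2021, 4, 5),
    operator="PythonOperator",
)
tags = task_tags(ti)
print(tags)
```

With the Sentry SDK, each key/value pair could then be attached to the active scope via `scope.set_tag(key, value)` before the exception is captured.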
**How to reproduce it**:
I did not receive an exception when a `PythonOperator` task ran into an OOM
issue.
```python
from datetime import datetime

from airflow.models import DAG
from airflow.operators.python_operator import PythonOperator

dag = DAG(
    dag_id="trigger_dag",
    start_date=datetime(2021, 4, 5),
    catchup=True,
    schedule_interval='@once',
)


def run_this_func(**context):
    # Imported lazily so the modules are only needed on the worker.
    import io
    import requests

    # Download a large file and hold it in memory to exceed the pod's limit.
    url = 'http://212.183.159.230/512MB.zip'
    r = requests.get(url)
    with io.BytesIO(r.content) as f:
        f.getvalue()


with dag:
    fail = PythonOperator(
        task_id='failing',
        python_callable=run_this_func,
        # Cap the worker pod at 384Mi so the download triggers an OOMKill.
        executor_config={
            "KubernetesExecutor": {
                "request_cpu": "0.5",
                "limit_cpu": "0.5",
                "request_memory": "384Mi",
                "limit_memory": "384Mi"
            }
        }
    )
```
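As a rough check of why this task should be OOMKilled, the arithmetic can be sketched as follows. This assumes the file behind `512MB.zip` is really about 512 MiB, which is inferred only from its name, and it ignores the memory used by the Python interpreter and Airflow themselves.

```python
MIB = 1024 * 1024

# Memory limit from the executor_config above ("384Mi").
limit_bytes = 384 * MIB

# Assumption: the downloaded payload is roughly 512 MiB, judging by its name.
payload_bytes = 512 * MIB

# r.content keeps the whole payload in memory, so the payload alone
# already exceeds the pod's limit by this many bytes.
overshoot_bytes = payload_bytes - limit_bytes

print(f"limit:     {limit_bytes} bytes")
print(f"payload:   {payload_bytes} bytes")
print(f"overshoot: {overshoot_bytes} bytes")
```

Under these assumptions the response body by itself is 128 MiB over the limit, so the container is expected to be killed before the callable can raise a Python exception.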
The LocalTaskJob reap is not easy to reproduce, but the logs below show the
task being reaped. Sentry was able to capture the exception, but it did not
have the task metadata as tags.
```
[2021-04-23 02:26:26,397] {pod_launcher.py:307} ERROR - Event with job id airflow-test-pod.a86a144aa1324b81a2daaf8875eaef07 Failed
[2021-04-23 02:26:27,000] {taskinstance.py:1457} ERROR - Task failed with exception
[2021-04-23 02:26:27,108] {taskinstance.py:1507} INFO - Marking task as FAILED. dag_id=dag, task_id=task-two, execution_date=20210405T000000, start_date=20210423T022607, end_date=20210423T022627
[2021-04-23 02:26:28,997] {local_task_job.py:184} WARNING - State of this instance has been externally set to failed. Terminating instance.
[2021-04-23 02:26:29,093] {process_utils.py:100} INFO - Sending Signals.SIGTERM to GPID 44
[2021-04-23 02:26:29,094] {taskinstance.py:1240} ERROR - Received SIGTERM. Terminating subprocesses.
[2021-04-23 02:26:31,257] {process_utils.py:66} INFO - Process psutil.Process(pid=44, status='terminated', exitcode=1, started='02:26:07') (44) terminated with exit code 1
```