wolfier opened a new issue #15600:
URL: https://github.com/apache/airflow/issues/15600


   **Apache Airflow version**: 2.0.0
   
   **Executor**: KubernetesExecutor
   
   **What happened**:
   
   One of my tasks did not have the task metadata attached as tags when its 
exception was captured by Sentry. There were also cases where task instances 
did not go through their normal execution flow. One such case is the 
Kubernetes worker pod receiving an OOMKill. 
   
   **What you expected to happen**:
   
   I expect all exceptions in Airflow to be captured by Sentry, and every 
captured exception to carry tags representing the corresponding task 
metadata.
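
To make "tags representing task metadata" concrete, here is a minimal sketch 
of the expected behavior. The field list `TAG_FIELDS` and the helpers 
`task_tags` and `apply_tags` are hypothetical names used for illustration, not 
Airflow's actual integration code. `apply_tags` only assumes an object with a 
`set_tag(key, value)` method, which a `sentry_sdk` scope provides.

```python
from types import SimpleNamespace

# Hypothetical list of metadata fields to expose as tags (an assumption,
# not taken from Airflow's source).
TAG_FIELDS = ("dag_id", "task_id", "execution_date", "try_number")


def task_tags(task_instance):
    """Collect the metadata fields present on a task-instance-like object."""
    return {
        name: str(getattr(task_instance, name))
        for name in TAG_FIELDS
        if hasattr(task_instance, name)
    }


def apply_tags(scope, task_instance):
    """Copy the tags onto any object exposing set_tag(key, value)."""
    for key, value in task_tags(task_instance).items():
        scope.set_tag(key, value)


# Stand-in for a real task instance, using the values from this report.
ti = SimpleNamespace(
    dag_id="trigger_dag",
    task_id="failing",
    execution_date="2021-04-05T00:00:00",
    try_number=1,
)
print(task_tags(ti))
```

Every captured event, including one raised outside the normal execution flow, 
would ideally carry these key/value pairs.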
   
   **How to reproduce it**:
   
   I did not receive an exception when a `PythonOperator` task ran into an OOM 
issue.
   
   ```python
   from datetime import datetime
   
   from airflow.models import DAG
   from airflow.operators.python import PythonOperator
   
   dag = DAG(
       dag_id="trigger_dag",
       start_date=datetime(2021, 4, 5),
       catchup=True,
       schedule_interval='@once',
   )
   
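   def run_this_func(**context):
       # Load a 512 MB download fully into memory so the task exceeds the
       # pod's 384Mi memory limit and the worker pod gets OOMKilled.
       import requests, io
   
       url = 'http://212.183.159.230/512MB.zip'
       r = requests.get(url)
   
       with io.BytesIO(r.content) as f:
           f.getvalue()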
   
   
   with dag:
       fail = PythonOperator(
           task_id='failing',
           python_callable=run_this_func,
           executor_config={
               "KubernetesExecutor": {
                   "request_cpu": "0.5",
                   "limit_cpu": "0.5",
                   "request_memory": "384Mi",
                   "limit_memory": "384Mi"
               }
           }
       )
   ```
   
   The LocalTaskJob reap is not easy to reproduce, but the logs below show a 
task being reaped. Sentry captured the exception, but it did not have the task 
metadata as tags.
   
   ```
   [2021-04-23 02:26:26,397] {pod_launcher.py:307} ERROR - Event with job id 
airflow-test-pod.a86a144aa1324b81a2daaf8875eaef07 Failed
   [2021-04-23 02:26:27,000] {taskinstance.py:1457} ERROR - Task failed with 
exception
   [2021-04-23 02:26:27,108] {taskinstance.py:1507} INFO - Marking task as 
FAILED. dag_id=dag, task_id=task-two, execution_date=20210405T000000, 
start_date=20210423T022607, end_date=20210423T022627
   [2021-04-23 02:26:28,997] {local_task_job.py:184} WARNING - State of this 
instance has been externally set to failed. Terminating instance.
   [2021-04-23 02:26:29,093] {process_utils.py:100} INFO - Sending 
Signals.SIGTERM to GPID 44
   [2021-04-23 02:26:29,094] {taskinstance.py:1240} ERROR - Received SIGTERM. 
Terminating subprocesses.
   [2021-04-23 02:26:31,257] {process_utils.py:66} INFO - Process 
psutil.Process(pid=44, status='terminated', exitcode=1, started='02:26:07') 
(44) terminated with exit code 1
   ```
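
The "Marking task as FAILED" line above still contains the metadata that was 
missing from the Sentry event. As a rough illustration only, the sketch below 
recovers those key/value pairs from the unwrapped log line with a regular 
expression. `parse_task_metadata` is a hypothetical helper, and it assumes the 
`key=value, ...` layout shown in this particular log.

```python
import re

# The log line from above, re-joined onto a single line.
LINE = (
    "[2021-04-23 02:26:27,108] {taskinstance.py:1507} INFO - Marking task as "
    "FAILED. dag_id=dag, task_id=task-two, execution_date=20210405T000000, "
    "start_date=20210423T022607, end_date=20210423T022627"
)


def parse_task_metadata(line):
    """Return every key=value pair in the line as a dict of strings."""
    # A value runs until the next comma or whitespace character.
    return dict(re.findall(r"(\w+)=([^,\s]+)", line))


metadata = parse_task_metadata(LINE)
print(metadata["dag_id"], metadata["task_id"], metadata["execution_date"])
```

These are the same fields one would expect to see as tags on the captured 
exception.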


