mhaalme opened a new issue #14950:
URL: https://github.com/apache/airflow/issues/14950


   **Apache Airflow version**: 2.0.1
   
   **Environment**:
   
   - **OS**: Oracle Linux Server 7.9
   
   **What happened**:
   
   One of our new BashOperator tasks failed and no useful information could be 
found about the cause of failure in the task logs. This seemed odd, so I 
temporarily turned off JSON formatting, and the non-JSON logs contained a lot 
more information. With JSON logging turned on, only the first line was logged 
and the entire traceback was discarded.
   
   I've replaced the values of dag_id, task_id and the bash_command path in the 
following logs with similar placeholders. Their actual values are not relevant 
in reproducing the error.
   
   ERROR level logs if json_format=True:
   ```
   {"asctime": "2021-03-19 15:33:05,620", "filename": "taskinstance.py", 
"lineno": 1455, "levelname": "ERROR", "message": "sh /path/to/my/script.sh", 
"created": 1616160785.620597, "dag_id": "my_dag", "task_id": "my_task", 
"execution_date": "2021_03_19T13_33_04_356460", "try_number": "1", "log_id": 
"my_dag-my_task-2021-03-19T13:33:04.356460+00:00-1", "offset": 
1616160785567723008}
   ```
   
   ERROR level logs if json_format=False:
   ```
   [2021-03-19 15:37:19,468] {{taskinstance.py:1455}} ERROR - sh 
/path/to/my/script.sh
   Traceback (most recent call last):
     File 
"/app01/venv/airflowvenv/lib64/python3.6/site-packages/airflow/models/taskinstance.py",
 line 1112, in _run_raw_task
       self._prepare_and_execute_task_with_callbacks(context, task)
     File 
"/app01/venv/airflowvenv/lib64/python3.6/site-packages/airflow/models/taskinstance.py",
 line 1249, in _prepare_and_execute_task_with_callbacks
       self.render_templates(context=context)
     File 
"/app01/venv/airflowvenv/lib64/python3.6/site-packages/airflow/models/taskinstance.py",
 line 1750, in render_templates
       self.task.render_template_fields(context)
     File 
"/app01/venv/airflowvenv/lib64/python3.6/site-packages/airflow/models/baseoperator.py",
 line 859, in render_template_fields
       self._do_render_template_fields(self, self.template_fields, context, 
jinja_env, set())
     File 
"/app01/venv/airflowvenv/lib64/python3.6/site-packages/airflow/models/baseoperator.py",
 line 872, in _do_render_template_fields
       rendered_content = self.render_template(content, context, jinja_env, 
seen_oids)
     File 
"/app01/venv/airflowvenv/lib64/python3.6/site-packages/airflow/models/baseoperator.py",
 line 907, in render_template
       return jinja_env.get_template(content).render(**context)
     File 
"/app01/venv/airflowvenv/lib64/python3.6/site-packages/jinja2/environment.py", 
line 883, in get_template
       return self._load_template(name, self.make_globals(globals))
     File 
"/app01/venv/airflowvenv/lib64/python3.6/site-packages/jinja2/environment.py", 
line 857, in _load_template
       template = self.loader.load(self, name, globals)
     File 
"/app01/venv/airflowvenv/lib64/python3.6/site-packages/jinja2/loaders.py", line 
115, in load
       source, filename, uptodate = self.get_source(environment, name)
     File 
"/app01/venv/airflowvenv/lib64/python3.6/site-packages/jinja2/loaders.py", line 
197, in get_source
       raise TemplateNotFound(template)
   jinja2.exceptions.TemplateNotFound: sh /path/to/my/script.sh
   ```
   
   
   
   The actual error itself is irrelevant. I understand that it happened because 
the command was interpreted as a template and I've fixed the command. The 
problem is that the error cause was not logged properly with json logging 
turned on, which made it a lot more difficult to identify the cause of the 
error.
   
   **What you expected to happen**:
   
   Log messages should contain the same information regardless of log format. 
The entire traceback should be included in json logs, preferably within a 
single json record with escaped linebreaks.
   
   **How to reproduce it**:
   1. Set json_format=True in airflow.cfg
   2. Create a dag with one BashOperator task. Set bash_command to something 
that cannot be successfully run, e.g the wrong path to a template.
   3. Run the task and check task logs
   4. Repeat with json_format=False and compare
   
   **Anything else we need to know**:
   I don't know if this problem is specific to BashOperator or happens with 
other operators as well.
   
   The same behaviour can be observed if another kind of invalid command is 
supplied, e.g a bash_command to execute a file that does not exist. Please see 
the ERROR level log line in the following examples.
   
   json_format=False
   ```
   [2021-03-23 09:57:29,977] {{taskinstance.py:851}} INFO - Dependencies all 
met for <TaskInstance: my_dag.my_task 2021-03-23T07:56:28.535261+00:00 [queued]>
   [2021-03-23 09:57:29,997] {{taskinstance.py:851}} INFO - Dependencies all 
met for <TaskInstance: my_dag.my_task 2021-03-23T07:56:28.535261+00:00 [queued]>
   [2021-03-23 09:57:29,997] {{taskinstance.py:1042}} INFO -
   
--------------------------------------------------------------------------------
   [2021-03-23 09:57:29,997] {{taskinstance.py:1043}} INFO - Starting attempt 1 
of 1
   [2021-03-23 09:57:29,997] {{taskinstance.py:1044}} INFO -
   
--------------------------------------------------------------------------------
   [2021-03-23 09:57:30,012] {{taskinstance.py:1063}} INFO - Executing 
<Task(BashOperator): my_task> on 2021-03-23T07:56:28.535261+00:00
   [2021-03-23 09:57:30,015] {{standard_task_runner.py:52}} INFO - Started 
process 3461 to run task
   [2021-03-23 09:57:30,021] {{standard_task_runner.py:76}} INFO - Running: 
['airflow', 'tasks', 'run', 'my_dag', 'my_task', 
'2021-03-23T07:56:28.535261+00:00', '--job-id', '169438', '--pool', 
'default_pool', '--raw', '--subdir', 'DAGS_FOLDER/dags/my_dag.py', 
'--cfg-path', '/tmp/tmpcgczuh72', '--error-file', '/tmp/tmpj7gc2ajl']
   [2021-03-23 09:57:30,023] {{standard_task_runner.py:77}} INFO - Job 169438: 
Subtask my_task
   [2021-03-23 09:57:30,092] {{logging_mixin.py:104}} INFO - Running 
<TaskInstance: my_dag.my_task 2021-03-23T07:56:28.535261+00:00 [running]> on 
host airflow.host
   [2021-03-23 09:57:30,183] {{taskinstance.py:1257}} INFO - Exporting the 
following env vars:
   AIRFLOW_CTX_DAG_OWNER=Teamplatform
   AIRFLOW_CTX_DAG_ID=my_dag
   AIRFLOW_CTX_TASK_ID=my_task
   AIRFLOW_CTX_EXECUTION_DATE=2021-03-23T07:56:28.535261+00:00
   AIRFLOW_CTX_DAG_RUN_ID=scheduled__2021-03-23T07:56:28.535261+00:00
   [2021-03-23 09:57:30,185] {{bash.py:135}} INFO - Tmp dir root location:
    /tmp
   [2021-03-23 09:57:30,185] {{bash.py:158}} INFO - Running command: 
/wrong/path/to/zabbix_sender -z zabbix.host -s airflow.host -k 
airflow.scheduler.delay -o 61
   [2021-03-23 09:57:30,192] {{bash.py:169}} INFO - Output:
   [2021-03-23 09:57:30,193] {{bash.py:173}} INFO - bash: 
/wrong/path/to/zabbix_sender: No such file or directory
   [2021-03-23 09:57:30,194] {{bash.py:177}} INFO - Command exited with return 
code 127
   [2021-03-23 09:57:30,209] {{taskinstance.py:1455}} ERROR - Bash command 
failed. The command returned a non-zero exit code.
   Traceback (most recent call last):
     File 
"/app01/venv/airflowvenv/lib64/python3.6/site-packages/airflow/models/taskinstance.py",
 line 1112, in _run_raw_task
       self._prepare_and_execute_task_with_callbacks(context, task)
     File 
"/app01/venv/airflowvenv/lib64/python3.6/site-packages/airflow/models/taskinstance.py",
 line 1285, in _prepare_and_execute_task_with_callbacks
       result = self._execute_task(context, task_copy)
     File 
"/app01/venv/airflowvenv/lib64/python3.6/site-packages/airflow/models/taskinstance.py",
 line 1315, in _execute_task
       result = task_copy.execute(context=context)
     File 
"/app01/venv/airflowvenv/lib64/python3.6/site-packages/airflow/operators/bash.py",
 line 180, in execute
       raise AirflowException('Bash command failed. The command returned a 
non-zero exit code.')
   airflow.exceptions.AirflowException: Bash command failed. The command 
returned a non-zero exit code.
   [2021-03-23 09:57:30,212] {{taskinstance.py:1503}} INFO - Marking task as 
FAILED. dag_id=my_dag, task_id=my_task, execution_date=20210323T075628, 
start_date=20210323T075729, end_date=20210323T075730
   [2021-03-23 09:57:30,275] {{local_task_job.py:146}} INFO - Task exited with 
return code 1
   end_of_log
   ```
   
   json_format=True
   ```
   {"asctime": "2021-03-23 09:54:29,572", "filename": "taskinstance.py", 
"lineno": 851, "levelname": "INFO", "message": "Dependencies all met for 
<TaskInstance: my_dag.my_task 2021-03-23T07:53:28.535261+00:00 [queued]>", 
"created": 1616486069.5725589, "dag_id": "my_dag", "task_id": "my_task", 
"execution_date": "2021_03_23T07_53_28_535261", "try_number": "1", "log_id": 
"my_dag-my_task-2021-03-23T07:53:28.535261+00:00-1", "offset": 
1616486069531250688}
   {"asctime": "2021-03-23 09:54:29,597", "filename": "taskinstance.py", 
"lineno": 851, "levelname": "INFO", "message": "Dependencies all met for 
<TaskInstance: my_dag.my_task 2021-03-23T07:53:28.535261+00:00 [queued]>", 
"created": 1616486069.5971286, "dag_id": "my_dag", "task_id": "my_task", 
"execution_date": "2021_03_23T07_53_28_535261", "try_number": "1", "log_id": 
"my_dag-my_task-2021-03-23T07:53:28.535261+00:00-1", "offset": 
1616486069531250688}
   {"asctime": "2021-03-23 09:54:29,597", "filename": "taskinstance.py", 
"lineno": 1042, "levelname": "INFO", "message": 
"\n--------------------------------------------------------------------------------",
 "created": 1616486069.5976064, "dag_id": "my_dag", "task_id": "my_task", 
"execution_date": "2021_03_23T07_53_28_535261", "try_number": "1", "log_id": 
"my_dag-my_task-2021-03-23T07:53:28.535261+00:00-1", "offset": 
1616486069531250688}
   {"asctime": "2021-03-23 09:54:29,597", "filename": "taskinstance.py", 
"lineno": 1043, "levelname": "INFO", "message": "Starting attempt 1 of 1", 
"created": 1616486069.5978124, "dag_id": "my_dag", "task_id": "my_task", 
"execution_date": "2021_03_23T07_53_28_535261", "try_number": "1", "log_id": 
"my_dag-my_task-2021-03-23T07:53:28.535261+00:00-1", "offset": 
1616486069531250688}
   {"asctime": "2021-03-23 09:54:29,597", "filename": "taskinstance.py", 
"lineno": 1044, "levelname": "INFO", "message": 
"\n--------------------------------------------------------------------------------",
 "created": 1616486069.5979714, "dag_id": "my_dag", "task_id": "my_task", 
"execution_date": "2021_03_23T07_53_28_535261", "try_number": "1", "log_id": 
"my_dag-my_task-2021-03-23T07:53:28.535261+00:00-1", "offset": 
1616486069531250688}
   {"asctime": "2021-03-23 09:54:29,611", "filename": "taskinstance.py", 
"lineno": 1063, "levelname": "INFO", "message": "Executing <Task(BashOperator): 
my_task> on 2021-03-23T07:53:28.535261+00:00", "created": 1616486069.611475, 
"dag_id": "my_dag", "task_id": "my_task", "execution_date": 
"2021_03_23T07_53_28_535261", "try_number": "1", "log_id": 
"my_dag-my_task-2021-03-23T07:53:28.535261+00:00-1", "offset": 
1616486069531250688}
   {"asctime": "2021-03-23 09:54:29,616", "filename": 
"standard_task_runner.py", "lineno": 52, "levelname": "INFO", "message": 
"Started process 2610 to run task", "created": 1616486069.6161618, "dag_id": 
"my_dag", "task_id": "my_task", "execution_date": "2021_03_23T07_53_28_535261", 
"try_number": "1", "log_id": 
"my_dag-my_task-2021-03-23T07:53:28.535261+00:00-1", "offset": 
1616486069531250688}
   {"asctime": "2021-03-23 09:54:29,622", "filename": 
"standard_task_runner.py", "lineno": 76, "levelname": "INFO", "message": 
"Running: ['airflow', 'tasks', 'run', 'my_dag', 'my_task', 
'2021-03-23T07:53:28.535261+00:00', '--job-id', '169433', '--pool', 
'default_pool', '--raw', '--subdir', 'DAGS_FOLDER/dags/my_dag.py', 
'--cfg-path', '/tmp/tmp09133rp4', '--error-file', '/tmp/tmpyr5c5hkv']", 
"created": 1616486069.622188, "dag_id": "my_dag", "task_id": "my_task", 
"execution_date": "2021_03_23T07_53_28_535261", "try_number": "1", "log_id": 
"my_dag-my_task-2021-03-23T07:53:28.535261+00:00-1", "offset": 
1616486069531250688}
   {"asctime": "2021-03-23 09:54:29,623", "filename": 
"standard_task_runner.py", "lineno": 77, "levelname": "INFO", "message": "Job 
169433: Subtask my_task", "created": 1616486069.6236992, "dag_id": "my_dag", 
"task_id": "my_task", "execution_date": "2021_03_23T07_53_28_535261", 
"try_number": "1", "log_id": 
"my_dag-my_task-2021-03-23T07:53:28.535261+00:00-1", "offset": 
1616486069531250688}
   {"asctime": "2021-03-23 09:54:29,769", "filename": "logging_mixin.py", 
"lineno": 104, "levelname": "INFO", "message": "Running <TaskInstance: 
my_dag.my_task 2021-03-23T07:53:28.535261+00:00 [running]> on host 
airflow.host", "created": 1616486069.7698724, "dag_id": "my_dag", "task_id": 
"my_task", "execution_date": "2021_03_23T07_53_28_535261", "try_number": "1", 
"log_id": "my_dag-my_task-2021-03-23T07:53:28.535261+00:00-1", "offset": 
1616486069723135744}
   {"asctime": "2021-03-23 09:54:29,869", "filename": "taskinstance.py", 
"lineno": 1257, "levelname": "INFO", "message": "Exporting the following env 
vars:\nAIRFLOW_CTX_DAG_OWNER=Teamplatform\nAIRFLOW_CTX_DAG_ID=my_dag\nAIRFLOW_CTX_TASK_ID=my_task\nAIRFLOW_CTX_EXECUTION_DATE=2021-03-23T07:53:28.535261+00:00\nAIRFLOW_CTX_DAG_RUN_ID=scheduled__2021-03-23T07:53:28.535261+00:00",
 "created": 1616486069.8690863, "dag_id": "my_dag", "task_id": "my_task", 
"execution_date": "2021_03_23T07_53_28_535261", "try_number": "1", "log_id": 
"my_dag-my_task-2021-03-23T07:53:28.535261+00:00-1", "offset": 
1616486069723135744}
   {"asctime": "2021-03-23 09:54:29,870", "filename": "bash.py", "lineno": 135, 
"levelname": "INFO", "message": "Tmp dir root location: \n /tmp", "created": 
1616486069.8703568, "dag_id": "my_dag", "task_id": "my_task", "execution_date": 
"2021_03_23T07_53_28_535261", "try_number": "1", "log_id": 
"my_dag-my_task-2021-03-23T07:53:28.535261+00:00-1", "offset": 
1616486069723135744}
   {"asctime": "2021-03-23 09:54:29,871", "filename": "bash.py", "lineno": 158, 
"levelname": "INFO", "message": "Running command: /wrong/path/to/zabbix_sender 
-z zabbix.host -s airflow.host -k airflow.scheduler.delay -o 61", "created": 
1616486069.871164, "dag_id": "my_dag", "task_id": "my_task", "execution_date": 
"2021_03_23T07_53_28_535261", "try_number": "1", "log_id": 
"my_dag-my_task-2021-03-23T07:53:28.535261+00:00-1", "offset": 
1616486069723135744}
   {"asctime": "2021-03-23 09:54:29,880", "filename": "bash.py", "lineno": 169, 
"levelname": "INFO", "message": "Output:", "created": 1616486069.8801942, 
"dag_id": "my_dag", "task_id": "my_task", "execution_date": 
"2021_03_23T07_53_28_535261", "try_number": "1", "log_id": 
"my_dag-my_task-2021-03-23T07:53:28.535261+00:00-1", "offset": 
1616486069723135744}
   {"asctime": "2021-03-23 09:54:29,882", "filename": "bash.py", "lineno": 173, 
"levelname": "INFO", "message": "bash: /wrong/path/to/zabbix_sender: No such 
file or directory", "created": 1616486069.8820558, "dag_id": "my_dag", 
"task_id": "my_task", "execution_date": "2021_03_23T07_53_28_535261", 
"try_number": "1", "log_id": 
"my_dag-my_task-2021-03-23T07:53:28.535261+00:00-1", "offset": 
1616486069723135744}
   {"asctime": "2021-03-23 09:54:29,882", "filename": "bash.py", "lineno": 177, 
"levelname": "INFO", "message": "Command exited with return code 127", 
"created": 1616486069.882363, "dag_id": "my_dag", "task_id": "my_task", 
"execution_date": "2021_03_23T07_53_28_535261", "try_number": "1", "log_id": 
"my_dag-my_task-2021-03-23T07:53:28.535261+00:00-1", "offset": 
1616486069723135744}
   {"asctime": "2021-03-23 09:54:29,899", "filename": "taskinstance.py", 
"lineno": 1455, "levelname": "ERROR", "message": "Bash command failed. The 
command returned a non-zero exit code.", "created": 1616486069.8990078, 
"dag_id": "my_dag", "task_id": "my_task", "execution_date": 
"2021_03_23T07_53_28_535261", "try_number": "1", "log_id": 
"my_dag-my_task-2021-03-23T07:53:28.535261+00:00-1", "offset": 
1616486069723135744}
   {"asctime": "2021-03-23 09:54:29,901", "filename": "taskinstance.py", 
"lineno": 1503, "levelname": "INFO", "message": "Marking task as FAILED. 
dag_id=my_dag, task_id=my_task, execution_date=20210323T075328, 
start_date=20210323T075429, end_date=20210323T075429", "created": 
1616486069.9011128, "dag_id": "my_dag", "task_id": "my_task", "execution_date": 
"2021_03_23T07_53_28_535261", "try_number": "1", "log_id": 
"my_dag-my_task-2021-03-23T07:53:28.535261+00:00-1", "offset": 
1616486069723135744}
   {"asctime": "2021-03-23 09:54:29,961", "filename": "local_task_job.py", 
"lineno": 146, "levelname": "INFO", "message": "Task exited with return code 
1", "created": 1616486069.9613628, "dag_id": "my_dag", "task_id": "my_task", 
"execution_date": "2021_03_23T07_53_28_535261", "try_number": "1", "log_id": 
"my_dag-my_task-2021-03-23T07:53:28.535261+00:00-1", "offset": 
1616486069531250688}
   end_of_log
   ```
   
   In that case, the bash command output is logged anyway in INFO level and so 
it's not that big of an issue compared to the case I've described above. Either 
way it would be good to have this information in json logs as well.
   
   Our logging configuration:
   
   ```
   [logging]
   base_log_folder = <path to log folder>
   remote_logging = True
   remote_log_conn_id =
   google_key_path =
   remote_base_log_folder =
   encrypt_s3_logs = False
   logging_level = INFO
   fab_logging_level = WARN
   logging_config_class =
   colored_console_log = True
   colored_log_format = [%(blue)s%(asctime)s%(reset)s] 
{{%(blue)s%(filename)s:%(reset)s%(lineno)d}} 
%(log_color)s%(levelname)s%(reset)s - %(log_color)s%(message)s%(reset)s
   colored_formatter_class = 
airflow.utils.log.colored_log.CustomTTYColoredFormatter
   log_format = [%(asctime)s] {{%(filename)s:%(lineno)d}} %(levelname)s - 
%(message)s
   simple_log_format = %(asctime)s %(levelname)s - %(message)s
   task_log_prefix_template =
   log_filename_template = run-{{ ti.dag_id }}.log
   log_processor_filename_template = {{ filename }}.log
   dag_processor_manager_log_location = <path to log file>
   task_log_reader = task
   extra_loggers =
   
   [elasticsearch]
   host = <elasticsearch connection string>
   log_id_template = 
{{ti.dag_id}}-{{ti.task_id}}-{{execution_date}}-{{try_number}}
   end_of_log_mark = end_of_log
   frontend =
   write_stdout = False
   json_format = True
   json_fields = asctime, filename, lineno, levelname, message, created
   
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to