hterik opened a new issue #19983:
URL: https://github.com/apache/airflow/issues/19983
### Description
Given a simple DAG like this:
```py
from airflow import DAG
from airflow.utils.dates import days_ago
from airflow.operators.bash import BashOperator

with DAG(
    "instantfail",
    description="instantfail",
    tags=["example"],
    catchup=False,
    schedule_interval=None,
    start_date=days_ago(2),
) as dag:
    BashOperator(bash_command="echo hello && false", task_id="b")
```
The error log from the failing bash command, `false`, becomes incredibly large and full of information that is irrelevant to the writer of the DAG, and, as if that were not enough, the error is printed twice:
```
[2021-12-02, 17:07:01 UTC] {subprocess.py:85} INFO - Output:
[2021-12-02, 17:07:01 UTC] {subprocess.py:89} INFO - hello
[2021-12-02, 17:07:01 UTC] {subprocess.py:93} INFO - Command exited with return code 1
[2021-12-02, 17:07:01 UTC] {taskinstance.py:1703} ERROR - Task failed with exception
Traceback (most recent call last):
  File "/homedir/.local/share/virtualenvs/airflow-uWHa5xnK/lib/python3.9/site-packages/airflow/models/taskinstance.py", line 1332, in _run_raw_task
    self._execute_task_with_callbacks(context)
  File "/homedir/.local/share/virtualenvs/airflow-uWHa5xnK/lib/python3.9/site-packages/airflow/models/taskinstance.py", line 1458, in _execute_task_with_callbacks
    result = self._execute_task(context, self.task)
  File "/homedir/.local/share/virtualenvs/airflow-uWHa5xnK/lib/python3.9/site-packages/airflow/models/taskinstance.py", line 1514, in _execute_task
    result = execute_callable(context=context)
  File "/homedir/.local/share/virtualenvs/airflow-uWHa5xnK/lib/python3.9/site-packages/airflow/operators/bash.py", line 187, in execute
    raise AirflowException(
airflow.exceptions.AirflowException: Bash command failed. The command returned a non-zero exit code 1.
[2021-12-02, 17:07:01 UTC] {taskinstance.py:1270} INFO - Marking task as FAILED. dag_id=instantfail, task_id=b, execution_date=20211202T160659, start_date=20211202T160701, end_date=20211202T160701
[2021-12-02, 17:07:01 UTC] {standard_task_runner.py:88} ERROR - Failed to execute job 292 for task b
Traceback (most recent call last):
  File "/homedir/.local/share/virtualenvs/airflow-uWHa5xnK/lib/python3.9/site-packages/airflow/task/task_runner/standard_task_runner.py", line 85, in _start_by_fork
    args.func(args, dag=self.dag)
  File "/homedir/.local/share/virtualenvs/airflow-uWHa5xnK/lib/python3.9/site-packages/airflow/cli/cli_parser.py", line 48, in command
    return func(*args, **kwargs)
  File "/homedir/.local/share/virtualenvs/airflow-uWHa5xnK/lib/python3.9/site-packages/airflow/utils/cli.py", line 92, in wrapper
    return f(*args, **kwargs)
  File "/homedir/.local/share/virtualenvs/airflow-uWHa5xnK/lib/python3.9/site-packages/airflow/cli/commands/task_command.py", line 292, in task_run
    _run_task_by_selected_method(args, dag, ti)
  File "/homedir/.local/share/virtualenvs/airflow-uWHa5xnK/lib/python3.9/site-packages/airflow/cli/commands/task_command.py", line 107, in _run_task_by_selected_method
    _run_raw_task(args, ti)
  File "/homedir/.local/share/virtualenvs/airflow-uWHa5xnK/lib/python3.9/site-packages/airflow/cli/commands/task_command.py", line 180, in _run_raw_task
    ti._run_raw_task(
  File "/homedir/.local/share/virtualenvs/airflow-uWHa5xnK/lib/python3.9/site-packages/airflow/utils/session.py", line 70, in wrapper
    return func(*args, session=session, **kwargs)
  File "/homedir/.local/share/virtualenvs/airflow-uWHa5xnK/lib/python3.9/site-packages/airflow/models/taskinstance.py", line 1332, in _run_raw_task
    self._execute_task_with_callbacks(context)
  File "/homedir/.local/share/virtualenvs/airflow-uWHa5xnK/lib/python3.9/site-packages/airflow/models/taskinstance.py", line 1458, in _execute_task_with_callbacks
    result = self._execute_task(context, self.task)
  File "/homedir/.local/share/virtualenvs/airflow-uWHa5xnK/lib/python3.9/site-packages/airflow/models/taskinstance.py", line 1514, in _execute_task
    result = execute_callable(context=context)
  File "/homedir/.local/share/virtualenvs/airflow-uWHa5xnK/lib/python3.9/site-packages/airflow/operators/bash.py", line 187, in execute
    raise AirflowException(
airflow.exceptions.AirflowException: Bash command failed. The command returned a non-zero exit code 1.
[2021-12-02, 17:07:01 UTC] {local_task_job.py:154} INFO - Task exited with return code 1
[2021-12-02, 17:07:01 UTC] {local_task_job.py:264} INFO - 0 downstream tasks scheduled from follow-on schedule check
```
Often operators are more advanced, especially PythonOperator and taskflow, which frequently produce equally deep stack traces of their own. That is usually the part you are actually interested in, but together it accumulates into several pages of error.
I'm usually the guy who thinks a stack trace is the best error message, and I like the context it provides, but here I think it goes a bit too far.
1. Why is the error printed twice?
2. The writer of a DAG should be able to assume that the execution path up to `execute_callable()` was sane. If I'm debugging a DAG or an Operator, I don't want to debug Airflow task-runner internals. I don't know how achievable this is, but it should at least be the goal.
The expected log output should contain only:
```
File "airflow/operators/bash.py", line 187, in execute
raise AirflowException(
airflow.exceptions.AirflowException: Bash command failed. The command
returned a non-zero exit code 1.
```
Could `TaskInstance._execute_task` be made to catch the operator's exception and re-raise it as something else, so that `standard_task_runner` can ignore it at the end?
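For what it's worth, here is a minimal, self-contained sketch of that idea. The `TaskExecutionError` marker type and both function names are hypothetical stand-ins, not actual Airflow code:
```py
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("sketch")


class TaskExecutionError(Exception):
    """Marks a failure whose traceback was already logged once."""


def _execute_task(execute_callable):
    # Stands in for TaskInstance._execute_task.
    try:
        return execute_callable()
    except Exception as exc:
        # Log the operator's traceback once, where it is relevant...
        log.exception("Task failed with exception")
        # ...then re-raise a marker type for the outer layers.
        raise TaskExecutionError(str(exc)) from exc


def run_task(execute_callable):
    # Stands in for standard_task_runner._start_by_fork.
    try:
        _execute_task(execute_callable)
        return 0
    except TaskExecutionError:
        # Already logged above: report only the exit code,
        # not a second copy of the full traceback.
        log.error("Failed to execute task")
        return 1


def failing_task():
    raise RuntimeError("Bash command failed. The command returned a non-zero exit code 1.")


print("exit code:", run_task(failing_task))
```
Running this prints the operator's traceback exactly once, followed by a single one-line failure message from the runner layer.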
Or is this something that should simply be solved by logging configuration?
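If the logging route were taken instead, one possible sketch uses a stdlib `logging.Filter` that drops the attached traceback from the duplicate record. The logger name below is an assumption about where the second traceback is emitted, and Airflow's custom logging setup may require wiring this into its logging config rather than attaching it directly:
```py
import logging


class DropTraceback(logging.Filter):
    """Strip the attached traceback from log records so that only the
    message line is printed. Attached to the runner's logger, this would
    keep the "Failed to execute job ..." line but not its traceback."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.exc_info = None
        record.exc_text = None  # clear any cached, pre-formatted traceback
        return True


# Assumed logger name, derived from the module path seen in the log above;
# it would need to be verified against the actual logger hierarchy.
logging.getLogger("airflow.task.task_runner.standard_task_runner").addFilter(DropTraceback())
```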
### Use case/motivation
_No response_
### Related issues
_No response_
### Are you willing to submit a PR?
- [ ] Yes I am willing to submit a PR!
### Code of Conduct
- [X] I agree to follow this project's [Code of
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)