dstandish opened a new pull request, #38989:
URL: https://github.com/apache/airflow/pull/38989
For security reasons, we don't present the user with tracebacks when there's
a webserver error. If we similarly don't want to provide tracebacks in task
execution logs, we could provide a UUID that an admin can use to find the error
in the server logs.
I'm not 100% sure that we need to hide this from the user in this context,
because the DAG writer uses the task API differently from how
they use the webserver.
But... my guess is the same logic would apply. WDYT?
With this PR, here's what the task logs look like:
```
[2024-04-13, 16:53:53 UTC] {standard_task_runner.py:112} ERROR - Failed to
execute job 29 for task d_1_source (Got 500:INTERNAL SERVER ERROR when sending
the internal api request: Error executing method
'airflow.models.taskinstance.TaskInstance.save_to_db';
error_id=88463b9d-4280-47b4-94a4-94836ce1da2d; 153)
Traceback (most recent call last):
File
"/home/airflow/.local/lib/python3.11/site-packages/airflow/task/task_runner/standard_task_runner.py",
line 105, in _start_by_fork
ret = args.func(args, dag=self.dag)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File
"/home/airflow/.local/lib/python3.11/site-packages/airflow/cli/cli_config.py",
line 49, in command
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File
"/home/airflow/.local/lib/python3.11/site-packages/airflow/utils/cli.py", line
115, in wrapper
return f(*args, **kwargs)
^^^^^^^^^^^^^^^^^^
File
"/home/airflow/.local/lib/python3.11/site-packages/airflow/cli/commands/task_command.py",
line 476, in task_run
task_return_code = _run_task_by_selected_method(args, _dag, ti)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File
"/home/airflow/.local/lib/python3.11/site-packages/airflow/cli/commands/task_command.py",
line 253, in _run_task_by_selected_method
return _run_raw_task(args, ti)
^^^^^^^^^^^^^^^^^^^^^^^
File
"/home/airflow/.local/lib/python3.11/site-packages/airflow/cli/commands/task_command.py",
line 335, in _run_raw_task
return ti._run_raw_task(
^^^^^^^^^^^^^^^^^
File
"/home/airflow/.local/lib/python3.11/site-packages/airflow/serialization/pydantic/taskinstance.py",
line 138, in _run_raw_task
_run_raw_task_internal(
File
"/home/airflow/.local/lib/python3.11/site-packages/airflow/models/taskinstance.py",
line 252, in _run_raw_task_internal
TaskInstance.save_to_db(ti=ti, session=session)
File
"/home/airflow/.local/lib/python3.11/site-packages/airflow/api_internal/internal_api_call.py",
line 141, in wrapper
result = make_jsonrpc_request(method_name, args_dict)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File
"/home/airflow/.local/lib/python3.11/site-packages/tenacity/__init__.py", line
289, in wrapped_f
return self(f, *args, **kw)
^^^^^^^^^^^^^^^^^^^^
File
"/home/airflow/.local/lib/python3.11/site-packages/tenacity/__init__.py", line
379, in __call__
do = self.iter(retry_state=retry_state)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File
"/home/airflow/.local/lib/python3.11/site-packages/tenacity/__init__.py", line
314, in iter
return fut.result()
^^^^^^^^^^^^
File "/usr/local/lib/python3.11/concurrent/futures/_base.py", line 449, in
result
return self.__get_result()
^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/concurrent/futures/_base.py", line 401, in
__get_result
raise self._exception
File
"/home/airflow/.local/lib/python3.11/site-packages/tenacity/__init__.py", line
382, in __call__
result = fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File
"/home/airflow/.local/lib/python3.11/site-packages/airflow/api_internal/internal_api_call.py",
line 118, in make_jsonrpc_request
raise AirflowException(
airflow.exceptions.AirflowException: Got 500:INTERNAL SERVER ERROR when
sending the internal api request: Error executing method
'airflow.models.taskinstance.TaskInstance.save_to_db';
error_id=88463b9d-4280-47b4-94a4-94836ce1da2d
```
Notice that the client-side traceback is shown but the server-side one is not.
That was already the case, but now I've added an `error_id` UUID that can be
used to trace the error in the server logs.
And here's what you see in the server logs:
```
[2024-04-13T16:53:53.857+0000] {rpc_api_endpoint.py:153} ERROR - Error
executing method 'airflow.models.taskinstance.TaskInstance.save_to_db';
error_id=88463b9d-4280-47b4-94a4-94836ce1da2d.
Traceback (most recent call last):
File
"/home/airflow/.local/lib/python3.11/site-packages/airflow/api_internal/endpoints/rpc_api_endpoint.py",
line 147, in internal_airflow_api
output = handler(**params, session=session)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File
"/home/airflow/.local/lib/python3.11/site-packages/airflow/api_internal/internal_api_call.py",
line 128, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File
"/home/airflow/.local/lib/python3.11/site-packages/airflow/utils/session.py",
line 81, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File
"/home/airflow/.local/lib/python3.11/site-packages/airflow/models/taskinstance.py",
line 3251, in save_to_db
ti = _coalesce_to_orm_ti(ti=ti, session=session)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File
"/home/airflow/.local/lib/python3.11/site-packages/airflow/models/taskinstance.py",
line 1509, in _coalesce_to_orm_ti
raise NotImplementedError
NotImplementedError
```
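To make the idea concrete, here is a minimal sketch of the pattern, not the actual code in this PR. The names `call_with_error_id` and `failing_handler` are hypothetical; the only thing taken from the logs above is the message format. The server generates a UUID, logs the full traceback alongside it, and returns only the message containing the `error_id` to the client.

```python
import logging
import uuid

log = logging.getLogger("rpc_sketch")  # hypothetical logger name


def call_with_error_id(method_name, handler, params):
    """Run handler(**params); on failure, log the traceback server-side
    under a fresh error_id and return only that id to the caller."""
    try:
        return 200, handler(**params)
    except Exception:
        error_id = str(uuid.uuid4())
        message = f"Error executing method '{method_name}'; error_id={error_id}"
        # log.exception attaches the current traceback to the server log record.
        log.exception(message)
        # The response body carries the error_id but never the traceback.
        return 500, message


def failing_handler(**kwargs):
    # Stand-in for a server-side method that fails.
    raise NotImplementedError


status, body = call_with_error_id("example.method", failing_handler, {})
```

An admin who is given the `error_id` from the task log can then search the server logs for the same value to find the matching traceback.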
Now, as an Airflow developer, it would be more convenient if we just
returned the traceback in the 500 response. If there's actually no security
concern here, then that would be the way to go.