potiuk opened a new issue, #35254:
URL: https://github.com/apache/airflow/issues/35254
### Body
The test
`airfow/tests/jobs/test_triggerrer_job.py::test_trigger_logging_sensitive_info`
has a weird and race'y behaviour - that got exposed when implementing #83221.
As a result it's been quarantined until we diagnose/fix it.
It's very easy to reproduce the racy behaviour, but the root cause is not
yet certain:
1) Enter Breeze (might be any Python version and DB but it has been
confirmed with Python 3.8 and Sqlite, Postgres)
2) Run `pytest
tests/jobs/test_triggerer_job.py::test_trigger_logging_sensitive_info`
3) The test fails because the logs that the test gets are empty
4) Run it again
`tests/jobs/test_triggerer_job.py::test_trigger_logging_sensitive_info`
5) it succeds (and continues doing so until you restart breeze or delete
/root/airflow/.airflow_db_initialised
6) When you delete the `/root/airflow/.airflow_db_initialised` the test
fails again
The presence of /root/airflow/.airflow_db_initialised means that airflow DB
has been "reset" at least once by the tests. We have pytest fixture that checks
if the file is created at least once and in case it has not been created it
will run `initial_db_init` while setting up the tests. In `tests/conftest.py`.
This avoids the problem that if semeone never initialized the DB, they will get
strange DB errors (missing columns/indexes) and it is very confusing for first
time users or when you delete local test DB.
This is done in this code:
```python
@pytest.fixture(autouse=True, scope="session")
def initialize_airflow_tests(request):
...
,,,
elif not os.path.exists(lock_file):
print(
"Initializing the DB - first time after entering the
container.\n"
"You can force re-initialization the database by adding
--with-db-init switch to run-tests."
)
initial_db_init()
# Create pid file
with open(lock_file, "w+"):
pass
else:
```
In some cases / some machines just commenting out `db.resetdb()` that is run
inside `initial_db_init` cause the test to suceed even the first time, but this
behaviour is inconsistent - sometims it does not help, which suggest that this
is some kind of "startup" race of triggerer log handler - where simply adding
more intialization/CPU/disk usage at the startup of tests triggers the handler
to either miss or loose the connection.
Also there is mysterious stacktrace printed (but it is printed in both cases
- when test works and does not work, that sugggests that this is the case and
that it is connected with some race in the in-memory handler for logs, either
wiht not catching or dropping logs bacause of some race at startup. I tried to
debug it but did not have much luck so far - except being able to narrow it
down and produce a very esily reproducible scenario.
```python
tests/jobs/test_triggerer_job.py::test_trigger_logging_sensitive_info
/usr/local/lib/python3.8/site-packages/_pytest/threadexception.py:73:
PytestUnhandledThreadExceptionWarning: Exception in thread Thread-3
Traceback (most recent call last):
File "/usr/local/lib/python3.8/threading.py", line 932, in
_bootstrap_inner
self.run()
File "/usr/local/lib/python3.8/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/usr/local/lib/python3.8/logging/handlers.py", line 1487, in
_monitor
self.handle(record)
File "/usr/local/lib/python3.8/logging/handlers.py", line 1468, in handle
handler.handle(record)
File "/opt/airflow/airflow/utils/log/trigger_handler.py", line 104, in
handle
self.emit(record)
File "/opt/airflow/airflow/utils/log/trigger_handler.py", line 93, in
emit
h = self._get_or_create_handler(record.trigger_id,
record.task_instance)
File "/opt/airflow/airflow/utils/log/trigger_handler.py", line 89, in
_get_or_create_handler
self.handlers[trigger_id] = self._make_handler(ti)
File "/opt/airflow/airflow/utils/log/trigger_handler.py", line 84, in
_make_handler
h.set_context(ti=ti)
File "/opt/airflow/airflow/utils/log/file_task_handler.py", line 185, in
set_context
local_loc = self._init_file(ti)
File "/opt/airflow/airflow/utils/log/file_task_handler.py", line 478, in
_init_file
full_path = self.add_triggerer_suffix(full_path=full_path,
job_id=ti.triggerer_job.id)
AttributeError: 'NoneType' object has no attribute 'id'
```
cc: @dstandish @hussein-awala
Also see https://github.com/apache/airflow/pull/35160#discussion_r1375463230
### Committer
- [X] I acknowledge that I am a maintainer/committer of the Apache Airflow
project.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]