Yesterday, I reported some log messages in T318479 <https://phabricator.wikimedia.org/T318479>:

logs/django/django.log.2022-09-27:2022-09-28 16:43:39,873 [76e999afc82c10fb99b6c9bf76448d1a] INFO tools_app.middleware: IndexView()
logs/django/django.log.2022-09-27:2022-09-28 16:59:18,903 [76e999afc82c10fb99b6c9bf76448d1a] ERROR tools_app.redis: Redis ConnectionError: Error while reading from tools-redis.svc.eqiad.wmflabs:6379 : (110, 'Connection timed out')
logs/django/django.log.2022-09-27:2022-09-28 16:59:19,196 [76e999afc82c10fb99b6c9bf76448d1a] INFO tools_app.middleware: request took 0:15:39.323408

When I went to look at this again today, the messages were gone. After a bit of head-scratching, I discovered they were now in a .nfs file:

> (venv) spi-tools-dev [django] grep 76e999afc82c10fb99b6c9bf76448d1a .nfs0000000005f910c800000388
> 2022-09-28 16:43:39,873 [76e999afc82c10fb99b6c9bf76448d1a] INFO tools_app.middleware: IndexView()
> 2022-09-28 16:59:18,903 [76e999afc82c10fb99b6c9bf76448d1a] ERROR tools_app.redis: Redis ConnectionError: Error while reading from tools-redis.svc.eqiad.wmflabs:6379 : (110, 'Connection timed out')
> 2022-09-28 16:59:19,196 [76e999afc82c10fb99b6c9bf76448d1a] INFO tools_app.middleware: request took 0:15:39.323408

These log files are created by Python's TimedRotatingFileHandler. So it looks like something was holding the file open at the time it was rotated. In theory, I should be able to find what process has them open using lsof, but that doesn't work when I run it on tools-sgebastion-11:

> lsof .nfs0000000005f910c800000388
> lsof: WARNING: can't stat() tracefs file system /sys/kernel/debug/tracing
> Output information may be incomplete.

and if I shell into the krb instance, I just get:

> bash: lsof: command not found

So how do I figure out what's going on?
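
For reference, here is a minimal sketch of the kind of check I would expect lsof to be doing, for use where lsof is not installed. It assumes a Linux host with /proc mounted. The name find_holders is hypothetical and not part of any existing tool. It only sees processes running on the host where it is executed, and only those whose /proc/<pid>/fd directory I am allowed to read, so it would not find a holder running on a different NFS client.

```python
import os


def find_holders(target):
    """Return sorted PIDs of local processes that have `target` open.

    Hypothetical helper: compares the (device, inode) pair of `target`
    against every descriptor listed under /proc/<pid>/fd.
    """
    target_stat = os.stat(target)
    key = (target_stat.st_dev, target_stat.st_ino)
    holders = set()
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # not a process directory
        fd_dir = os.path.join("/proc", entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or we lack permission
        for fd in fds:
            try:
                # stat() follows the /proc link to the open file itself
                st = os.stat(os.path.join(fd_dir, fd))
            except OSError:
                continue  # descriptor closed while we were scanning
            if (st.st_dev, st.st_ino) == key:
                holders.add(int(entry))
    return sorted(holders)


if __name__ == "__main__":
    import sys

    for pid in find_holders(sys.argv[1]):
        print(pid)
```

Saved under any file name, it takes the .nfs file as its only argument and prints one PID per line. An empty result would suggest the holder is either another user's process or on another host.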
