On Mon, 13 Mar 2023 at 08:03, rmluglist2--- via Hampshire <hampshire@mailman.lug.org.uk> wrote:
> Hi all
>
> I have an Ubuntu box which is on 24/7/365. It has ufw running allowing
> nothing from outside my lan.
>
> A couple of times recently, I’ve come in to find the machine locked up
> with a lot of disk access (it can be ping’d but I can’t ssh into it and it
> doesn’t respond to mouse or keyboard on the console – only power cycling
> brings it back). As I say, this has now happened twice in the last 3-4
> nights.
>

I have seen this behaviour sometimes. By default, heavy disk access on Linux can starve interactive sessions, so the console and ssh stop responding even though the machine still answers pings.

High disk access can be caused by a number of things:
1) Some app actually needs the disk.
2) Faults on the disk, causing many retries.
3) Swap file access.

After a reboot, you can look for faults on the disk with "smartctl -a /dev/sda" and see if there are any log messages about failed sectors, reallocated sector counts increasing, etc.

If an app needs the disk, it is probably something kicked off by cron. You can force these apps to use a lower I/O priority with "ionice"; for example, "ionice -c 3 <command>" asks for the idle I/O class. Search for ionice to find suitable ways to run it. But I think a good diagnostic step is to disable cron altogether for, say, a week and see if the problem disappears. Then at least you will know whether cron and the apps it runs are the problem.

Another possible cause is an app making the machine run low on memory. That results in unpredictable behaviour when memory allocation fails, and a lot of programs don't behave well when that happens. It might also cause excessive swap file access.

These are all problems that are difficult to diagnose while they are happening, so the trick is to set up monitoring to watch for each of the cases. E.g. take metrics of free RAM, and when the fault happens you can look at the metrics graph to see if that is the problem. Likewise, take metrics of the disk access on a per-app basis.

Normally the lock-up will not be immediate; the machine will get slow first and then eventually lock up.
So at least some metrics are written before the lock-up.

Kind Regards

James
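
As a rough illustration of the free-RAM monitoring suggested above, here is a minimal Python sketch that appends memory and swap figures from /proc/meminfo to a log file at a fixed interval. The function names, the log format and the 60-second interval are all assumptions made for the example, not part of any existing tool. Each line is fsync'd so that the last samples before a lock-up have a better chance of reaching the disk.

```python
import os
import time


def parse_meminfo(text):
    """Return {field: value} for lines such as 'MemAvailable:  123456 kB'."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def format_sample(values, now):
    """Build one log line: timestamp, available RAM and swap in use (kB)."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    available = values.get("MemAvailable", 0)
    swap_used = values.get("SwapTotal", 0) - values.get("SwapFree", 0)
    return f"{stamp} MemAvailable_kB={available} SwapUsed_kB={swap_used}"


def log_samples(log_path, count=None, interval=60, source="/proc/meminfo"):
    """Append a sample every `interval` seconds; count=None means run forever."""
    taken = 0
    while count is None or taken < count:
        with open(source) as f:
            values = parse_meminfo(f.read())
        with open(log_path, "a") as log:
            log.write(format_sample(values, time.time()) + "\n")
            log.flush()
            os.fsync(log.fileno())  # push the line to disk before sleeping
        taken += 1
        if count is None or taken < count:
            time.sleep(interval)
```

Calling log_samples("/var/log/memwatch.log") (a hypothetical path) from a small script started at boot would leave a trail to read after the next power cycle. If MemAvailable falls and SwapUsed climbs in the minutes before the log stops, memory pressure is the likely culprit. Per-process disk counters are available in /proc/<pid>/io (read_bytes and write_bytes), though reading them for other users' processes needs root.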
--
Please post to: Hampshire@mailman.lug.org.uk
Web Interface: https://mailman.lug.org.uk/mailman/listinfo/hampshire
LUG URL: http://www.hantslug.org.uk
--------------------------------------------------------------