We do not know if it's related, but this same OSS is in a very bad
state, with very high load average (200), very high I/O wait time, and
taking many seconds to respond to each read request, making the array
more or less unusable. That's the problem we are trying to fix.

This sounds like a storage system failure. I/Os queuing up far enough to drive the load to 200 usually means something is broken at a lower level of the stack. Not always, though: sometimes you have users who like to write several million (or billion) small (< 100 byte) files.

What does dmesg report? Put the output in a pastebin or gist and post the link to the list.
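If the dmesg output is long, a quick filter can help spot the interesting lines before you post it. This is only a rough sketch: the keyword list is my own guess at what storage trouble tends to look like, not an exhaustive or authoritative set, and the helper names (suspicious_lines, read_dmesg) are made up for illustration. Please still post the full output, since the context around these lines often matters.

```python
import re
import subprocess

# Illustrative keyword list (an assumption, not exhaustive): phrases that
# often accompany block-device or RAID trouble in kernel logs.
SUSPECT = re.compile(
    r"i/o error|offline|degraded|medium error|blocked for more than",
    re.IGNORECASE,
)


def suspicious_lines(log_text):
    """Return the lines of log_text that match any SUSPECT keyword."""
    return [line for line in log_text.splitlines() if SUSPECT.search(line)]


def read_dmesg():
    """Return the kernel ring buffer as text.

    May need root on systems that restrict access to the kernel log.
    """
    result = subprocess.run(
        ["dmesg"], capture_output=True, text=True, check=False
    )
    return result.stdout
```

On the OSS you would run something like `print("\n".join(suspicious_lines(read_dmesg())))`.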

Things that come to mind are

a) An offlined RAID (most likely): This would explain the high load, along with all sorts of strange messages about block devices and file systems in the logs.

b) A user DoS against the storage: usually someone writing many tiny files.

There are other possibilities, but these seem more likely.
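To get a feel for whether (b) applies, you could count tiny files under a suspect user's directory. The sketch below is a minimal example under some assumptions: the function name small_file_stats is hypothetical, and the 100-byte default simply mirrors the figure mentioned above. Be careful where you point it: walking a large tree issues a stat per file, which adds load of its own on a file system that is already struggling, so try it on a small subtree first.

```python
import os
import stat


def small_file_stats(root, threshold=100):
    """Return (total, small): the number of regular files under root and
    how many of them are smaller than threshold bytes."""
    total = 0
    small = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.lstat(path)  # lstat: do not follow symlinks
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if not stat.S_ISREG(info.st_mode):
                continue  # ignore symlinks, sockets, devices, etc.
            total += 1
            if info.st_size < threshold:
                small += 1
    return total, small
```

A result where nearly all of several million files fall under the threshold would point toward the small-file case rather than a hardware fault.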

