Re: [lustre-discuss] "Not on preferred path" error

Bob Ball Tue, 20 Sep 2016 10:29:05 -0700

Stabbing in the dark, but this sounds like a multipath problem. Perhaps you have 2 or more paths to the storage, and one or more of them is down for some reason, perhaps the hardware itself, perhaps a cable is pulled. You could look for LEDs in a bad state.

I always find it instructive to reboot such a system and watch what comes up on the console during the startup.


bob

On 9/20/2016 12:29 PM, Joe Landman wrote:

On 09/20/2016 12:21 PM, Lewis Hyatt wrote:
> We do not know if it's related, but this same OSS is in a very bad
> state, with very high load average (200), very high I/O wait time, and
> taking many seconds to respond to each read request, making the array
> more or less unusable. That's the problem we are trying to fix.
This sounds like a storage system failure. Queuing up of IOs to drive the load to 200 usually means something is broken elsewhere in the stack at a lower level. Not always ... sometimes you have users who like to write several million/billion small ( < 100 byte ) files.
What does dmesg report? Try to do a pastebin/gist of it, and point it to the list.
Things that come to mind are
a) offlined RAID (most likely): This would explain the user load, and all sorts of strange messages about block devices and file systems in the logs.
b) A user DoS against the storage: usually someone writing many tiny files.
There are other possibilities, but these seem more likely.
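
As a rough aid for the dmesg check suggested above, here is a minimal Python sketch that pulls out kernel log lines that might point at a path or RAID problem. The keyword list and the function name `suspicious_lines` are assumptions made for illustration, not anything taken from Lustre or multipath tooling; adjust the keywords to whatever your controller and drivers actually log.

```python
import re

# Assumed keywords, chosen for illustration only; tune them for your hardware.
KEYWORDS = ("i/o error", "multipath", "offline", "failed", "reset", "timeout")


def suspicious_lines(log_text, keywords=KEYWORDS):
    """Return the lines of log_text that contain any keyword (case-insensitive)."""
    pattern = re.compile("|".join(re.escape(k) for k in keywords), re.IGNORECASE)
    return [line for line in log_text.splitlines() if pattern.search(line)]


# Example use: save the kernel log to a file first (dmesg > dmesg.txt), then
#   with open("dmesg.txt") as f:
#       for line in suspicious_lines(f.read()):
#           print(line)
```

This only narrows down what to read first. The full dmesg output is still what should go to the list.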


_______________________________________________
lustre-discuss mailing list
[email protected]
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
