Thanks so much for the information; we will look into this ASAP.
Forgive my ignorance, but does multipath here refer to some Lustre-specific or InfiniBand-related process? I am not familiar with it in this context. Thanks again.


-lewis


On 9/20/16 2:24 PM, Ben Evans wrote:
Lewis,

Yes, "Not on preferred path" is something that bubbles up through the TS
GUI from multipath.

A simple thing you can check is running 'multipath -ll' on the OSS (and its
peer) in question and seeing if it reports that one or more paths are down.
If it's just on one OSS, try running 'multipath -r'.  If the path doesn't come
back and look OK, then it's most likely a cable issue, and you can try
re-seating the cable to see if it helps.  It's been a long time since I
diagnosed this, and I can't remember the details of how to associate cables
with paths, but there should be indicator lights on the back of
everything, and the path that is down should be red.
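
To illustrate the check described above, here is a minimal Python sketch that
scans 'multipath -ll' output for paths reported as failed or faulty. The
sample output, the device names, and the helper find_down_paths are made up
for illustration, and the exact output format varies between multipath-tools
versions, so treat the pattern as an assumption to adapt rather than a
definitive parser.

```python
import re

# Assumed shape of a path line in `multipath -ll` output, e.g.
#   | `- 3:0:0:1 sdc 8:32 active ready running
# Older versions print states in brackets, e.g. [failed][faulty].
PATH_LINE = re.compile(
    r"(?P<hctl>\d+:\d+:\d+:\d+)\s+(?P<dev>\S+)\s+(?P<majmin>\d+:\d+)\s+(?P<state>.+)$"
)


def find_down_paths(multipath_ll_output):
    """Return (device, state) pairs for paths that look failed or faulty."""
    down = []
    for line in multipath_ll_output.splitlines():
        match = PATH_LINE.search(line)
        if not match:
            continue  # map header, size, or path-group line
        state = match.group("state").strip()
        if "failed" in state or "faulty" in state:
            down.append((match.group("dev"), state))
    return down


# Hypothetical sample output; real WWIDs, vendors, and devices will differ.
SAMPLE = """\
mpatha (360000000000000001) dm-0 VENDOR,PRODUCT
size=10T features='0' hwhandler='0' wp=rw
|-+- policy='round-robin 0' prio=50 status=active
| `- 3:0:0:1 sdc 8:32 active ready running
`-+- policy='round-robin 0' prio=10 status=enabled
  `- 4:0:0:1 sdd 8:48 failed faulty running
"""

if __name__ == "__main__":
    for dev, state in find_down_paths(SAMPLE):
        print(dev, state)
```

In practice the text would come from running 'multipath -ll' as root on the
OSS and its peer; an empty result on one node and a non-empty one on the other
would point at the node whose cabling is worth checking first.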

The high load is probably associated with the cable issue, since you're
putting more strain on one path.

-Ben Evans

On 9/20/16, 12:21 PM, "lustre-discuss on behalf of Lewis Hyatt"
<lustre-discuss-boun...@lists.lustre.org on behalf of lhy...@gmail.com>
wrote:

Hello-

I am having an issue with a Lustre 1.8 array that I have little hope
of figuring out on my own, so I thought I would try here to see if
anyone might know what this warning/error means. Our array was built
by Terascala, which no longer exists, so we have no support for it and
little documentation (and not much in-house knowledge). I see this
complaint "Not on preferred path" on the GUI that we have, which I
assume was something custom made by Terascala, and I am not sure even
what path it is referring to; we use InfiniBand for all connections
and it could relate to this, but I am not sure. We see this error on 3 of
the 12 OSTs. More specifically, we have 2 OSSs, each handling 6 OSTs,
and all 3 of the "Not on preferred path" OSTs are on the same OSS.

We do not know if it's related, but this same OSS is in a very bad
state, with very high load average (200), very high I/O wait time, and
taking many seconds to respond to each read request, making the array
more or less unusable. That's the problem we are trying to fix.

I realize there's not much hope for anyone to help us with that given
how little information I am able to provide. But I was hoping someone
out there might know what this "Not on preferred path" error means, and
if it matters for anything or not, so we have somewhere to start.
Thanks very much!

I could provide screen shots of the management GUI we have, if it
would be informative.

-Lewis
_______________________________________________
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org

