It appears that there is only one SAS path to the back-end storage, which 
explains why some of the LUNs show as being on a non-preferred path.

Typically we recommend having two SAS connections from each OSS to the 
storage: one to the upper controller and one to the lower controller, with 
the LUNs distributed between the two controllers. In the event of a SAS 
connection failure, all LUNs fail over to one controller, and the LUNs that 
used to go through the other controller are reported as not being on their 
preferred path. Because this failover happens at the multipath layer, it is 
transparent to Lustre; the file system continues to run, as you observed.
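
One quick way to confirm this is to count the path lines under each map in 
the `multipath -ll` output. This is a hedged sketch, not Terascala's tooling: 
the awk script and the sample here-doc (two maps excerpted from the listing 
quoted below) are illustrative; on a live OSS you would pipe `multipath -ll` 
into the awk script instead. On a healthy dual-path setup each map should 
show two sdX devices; one per map means a single SAS path.

```shell
# Sketch: count sdX path lines per multipath map.
# Replace the here-doc with:  multipath -ll | awk '...'
awk '
  /^[a-z0-9]+ \(/  { map = $1 }   # map header line, e.g. "map03 (3600...)"
  / sd[a-z]+ /     { n[map]++ }   # a path line naming a device sdX
  END { for (m in n) print m, n[m] " path(s)" }
' <<'EOF'
map03 (360080e50002ee5100000023f50092c6c) dm-13 LSI,VirtualDisk
\_ round-robin 0 [prio=1][active]
  \_ 3:0:1:3 sdk 8:160 [active][ready]
map02 (360080e50002ee4100000024250092c11) dm-12 LSI,VirtualDisk
\_ round-robin 0 [prio=1][active]
  \_ 3:0:1:2 sdj 8:144 [active][ready]
EOF
```

With both SAS cables connected and the maps reloaded, each map should report 
2 path(s) instead of 1.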

Best Regards,
Zhiqi

-----Original Message-----
From: lustre-discuss [mailto:lustre-discuss-boun...@lists.lustre.org] On Behalf 
Of Lewis Hyatt
Sent: Tuesday, September 20, 2016 12:53 PM
To: Ben Evans <bev...@cray.com>; lustre-discuss@lists.lustre.org
Subject: Re: [lustre-discuss] "Not on preferred path" error

I see, thanks. This is what we see from running the multipath commands... I 
don't see anything that means anything to me, but FWIW it looks the same as 
on our other OSS that is working OK.

$ multipath -ll
map03 (360080e50002ee5100000023f50092c6c) dm-13 LSI,VirtualDisk 
[size=15T][features=0][hwhandler=0][rw]
\_ round-robin 0 [prio=1][active]
  \_ 3:0:1:3 sdk 8:160 [active][ready]
map02 (360080e50002ee4100000024250092c11) dm-12 LSI,VirtualDisk 
[size=15T][features=0][hwhandler=0][rw]
\_ round-robin 0 [prio=1][active]
  \_ 3:0:1:2 sdj 8:144 [active][ready]
map01 (360080e50002ee5100000023b50092c4c) dm-11 LSI,VirtualDisk 
[size=15T][features=0][hwhandler=0][rw]
\_ round-robin 0 [prio=1][enabled]
  \_ 3:0:1:1 sdi 8:128 [active][ready]
map00 (360080e50002ee4100000023e50092bf2) dm-10 LSI,VirtualDisk 
[size=15T][features=0][hwhandler=0][rw]
\_ round-robin 0 [prio=1][enabled]
  \_ 3:0:1:0 sdh 8:112 [active][ready]
map09 (360080e50002ee4dc000002f250092c62) dm-7 LSI,VirtualDisk 
[size=15T][features=0][hwhandler=0][rw]
\_ round-robin 0 [prio=1][enabled]
  \_ 3:0:0:3 sde 8:64  [active][ready]
map11 (360080e50002ee4dc000002f650092c84) dm-9 LSI,VirtualDisk 
[size=15T][features=0][hwhandler=0][rw]
\_ round-robin 0 [prio=1][active]
  \_ 3:0:0:5 sdg 8:96  [active][ready]
map08 (360080e50002ec890000002e550092a07) dm-6 LSI,VirtualDisk 
[size=15T][features=0][hwhandler=0][rw]
\_ round-robin 0 [prio=1][enabled]
  \_ 3:0:0:2 sdd 8:48  [active][ready]
map10 (360080e50002ec890000002e950092a27) dm-8 LSI,VirtualDisk 
[size=15T][features=0][hwhandler=0][rw]
\_ round-robin 0 [prio=1][active]
  \_ 3:0:0:4 sdf 8:80  [active][ready]
map07 (360080e50002ee4dc000002ee50092c44) dm-5 LSI,VirtualDisk 
[size=15T][features=0][hwhandler=0][rw]
\_ round-robin 0 [prio=1][active]
  \_ 3:0:0:1 sdc 8:32  [active][ready]
map06 (360080e50002ec890000002e1500929e9) dm-4 LSI,VirtualDisk 
[size=15T][features=0][hwhandler=0][rw]
\_ round-robin 0 [prio=1][active]
  \_ 3:0:0:0 sdb 8:16  [active][ready]
map05 (360080e50002ee5100000024350092c8c) dm-15 LSI,VirtualDisk 
[size=15T][features=0][hwhandler=0][rw]
\_ round-robin 0 [prio=1][enabled]
  \_ 3:0:1:5 sdm 8:192 [active][ready]
map04 (360080e50002ee4100000024650092c31) dm-14 LSI,VirtualDisk 
[size=15T][features=0][hwhandler=0][rw]
\_ round-robin 0 [prio=1][enabled]
  \_ 3:0:1:4 sdl 8:176 [active][ready]

===========

$ multipath -r
reload: map06 (360080e50002ec890000002e1500929e9)  LSI,VirtualDisk 
[size=15T][features=0][hwhandler=0][n/a]
\_ round-robin 0 [prio=1][undef]
  \_ 3:0:0:0 sdb 8:16  [active][ready]
reload: map07 (360080e50002ee4dc000002ee50092c44)  LSI,VirtualDisk 
[size=15T][features=0][hwhandler=0][n/a]
\_ round-robin 0 [prio=1][undef]
  \_ 3:0:0:1 sdc 8:32  [active][ready]
reload: map08 (360080e50002ec890000002e550092a07)  LSI,VirtualDisk 
[size=15T][features=0][hwhandler=0][n/a]
\_ round-robin 0 [prio=1][undef]
  \_ 3:0:0:2 sdd 8:48  [active][ready]
reload: map09 (360080e50002ee4dc000002f250092c62)  LSI,VirtualDisk 
[size=15T][features=0][hwhandler=0][n/a]
\_ round-robin 0 [prio=1][undef]
  \_ 3:0:0:3 sde 8:64  [active][ready]
reload: map10 (360080e50002ec890000002e950092a27)  LSI,VirtualDisk 
[size=15T][features=0][hwhandler=0][n/a]
\_ round-robin 0 [prio=1][undef]
  \_ 3:0:0:4 sdf 8:80  [active][ready]
reload: map11 (360080e50002ee4dc000002f650092c84)  LSI,VirtualDisk 
[size=15T][features=0][hwhandler=0][n/a]
\_ round-robin 0 [prio=1][undef]
  \_ 3:0:0:5 sdg 8:96  [active][ready]
reload: map00 (360080e50002ee4100000023e50092bf2)  LSI,VirtualDisk 
[size=15T][features=0][hwhandler=0][n/a]
\_ round-robin 0 [prio=1][undef]
  \_ 3:0:1:0 sdh 8:112 [active][ready]
reload: map01 (360080e50002ee5100000023b50092c4c)  LSI,VirtualDisk 
[size=15T][features=0][hwhandler=0][n/a]
\_ round-robin 0 [prio=1][undef]
  \_ 3:0:1:1 sdi 8:128 [active][ready]
reload: map02 (360080e50002ee4100000024250092c11)  LSI,VirtualDisk 
[size=15T][features=0][hwhandler=0][n/a]
\_ round-robin 0 [prio=1][undef]
  \_ 3:0:1:2 sdj 8:144 [active][ready]
reload: map03 (360080e50002ee5100000023f50092c6c)  LSI,VirtualDisk 
[size=15T][features=0][hwhandler=0][n/a]
\_ round-robin 0 [prio=1][undef]
  \_ 3:0:1:3 sdk 8:160 [active][ready]
reload: map04 (360080e50002ee4100000024650092c31)  LSI,VirtualDisk 
[size=15T][features=0][hwhandler=0][n/a]
\_ round-robin 0 [prio=1][undef]
  \_ 3:0:1:4 sdl 8:176 [active][ready]
reload: map05 (360080e50002ee5100000024350092c8c)  LSI,VirtualDisk 
[size=15T][features=0][hwhandler=0][n/a]
\_ round-robin 0 [prio=1][undef]
  \_ 3:0:1:5 sdm 8:192 [active][ready]

Thanks again for the assistance all, I really appreciate it.

-lewis


On 9/20/16 2:48 PM, Ben Evans wrote:
> multipath is a Linux utility which handles communications from the 
> server to the disk array.  It is independent of Lustre and Infiniband.  
> For OSSes, each OSS had two connections to each storage array it 
> communicated with; usually there was a pair of arrays per OSS pair 
> (except in a rare handful of our systems, which had one).
>
> -Ben Evans
>
> On 9/20/16, 2:33 PM, "lustre-discuss on behalf of Lewis Hyatt"
> <lustre-discuss-boun...@lists.lustre.org on behalf of 
> lhy...@gmail.com>
> wrote:
>
>> Thanks so much for the information, we will look into this asap.
>> Forgive my ignorance, but is multipath here referring to some 
>> lustre-specific or infiniband-related process? Not familiar with it 
>> in this context.
>> Thanks again.
>>
>> -lewis
>>
>>
>> On 9/20/16 2:24 PM, Ben Evans wrote:
>>> Lewis,
>>>
>>> Yes, "Not on preferred path" is something that bubbles up through 
>>> the TS gui from multipath.
>>>
>>> A simple thing you can check is running multipath -ll on the OSS in 
>>> question (and its peer) and seeing if it reports that one or more 
>>> paths are down.
>>> If it's just on one OSS, try running 'multipath -r'.  If it doesn't 
>>> come back looking OK, then it's most likely a cable issue, and you 
>>> can try re-seating the cable to see if that helps.  It's been a long 
>>> time since I diagnosed this, though, and I can't remember the details 
>>> of how to associate cables with paths; there should be indicator 
>>> lights on the back of everything, and the path that is down should 
>>> show red.
>>>
>>> The high load is probably associated with the cable issue, since 
>>> you're putting more strain on one path.
>>>
>>> -Ben Evans
>>>
>>> On 9/20/16, 12:21 PM, "lustre-discuss on behalf of Lewis Hyatt"
>>> <lustre-discuss-boun...@lists.lustre.org on behalf of 
>>> lhy...@gmail.com>
>>> wrote:
>>>
>>>> Hello-
>>>>
>>>> I am having an issue with a Lustre 1.8 array that I have little 
>>>> hope of figuring out on my own, so I thought I would try here to 
>>>> see if anyone might know what this warning/error means. Our array 
>>>> was built by Terascala, which no longer exists, so we have no 
>>>> support for it and little documentation (and not much in-house 
>>>> knowledge). I see this complaint "Not on preferred path" on the GUI 
>>>> that we have, which I assume was something custom made by 
>>>> Terascala, and I am not even sure what path it is referring to; we 
>>>> use infiniband for all connections, so it could relate to that, but 
>>>> we are not sure. We see this error on 3 of the 12 OSTs. More 
>>>> specifically, we have 2 OSSs, each handling 6 OSTs, and all 3 of the 
>>>> "not on optimal path" OSTs are on the same OSS.
>>>>
>>>> We do not know if it's related, but this same OSS is in a very bad 
>>>> state, with very high load average (200), very high I/O wait time, 
>>>> and taking many seconds to respond to each read request, making the 
>>>> array more or less unusable. That's the problem we are trying to fix.
>>>>
>>>> I realize there's not much hope for anyone to help us with that 
>>>> given how little information I am able to provide. But I was hoping 
>>>> someone out there might know what this "not on optimal path" error 
>>>> means, and if it matters for anything or not, so we have somewhere to 
>>>> start.
>>>> Thanks very much!
>>>>
>>>> I could provide screen shots of the management GUI we have, if it 
>>>> would be informative.
>>>>
>>>> -Lewis
>>>> _______________________________________________
>>>> lustre-discuss mailing list
>>>> lustre-discuss@lists.lustre.org
>>>> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>>>
>> _______________________________________________
>> lustre-discuss mailing list
>> lustre-discuss@lists.lustre.org
>> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>
_______________________________________________
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
