Hi Gabriele,

This is one of many crash tests I'm running to simulate an interruption of the
IB channel.
I wanted to see whether it is possible to switch quickly from IB to Ethernet; I
do not want the two in load balancing.
Under normal conditions, of course, I use only IB.

Bye

Stefano Elmopi



> 
> Hi Stefano,
> why do you want to use a slow Ethernet connection in load balancing with
> high-performance InfiniBand?
> 
> bye
> 
> On 07/12/2010 05:05 PM, Kevin Van Maren wrote:
>> Stefano Elmopi wrote:
>> 
>>> Hi,
>>> 
>>> I have a Lustre file system consisting of one MGS/MDS and two OSSs,
>>> interconnected with InfiniBand.
>>> The Lustre version is 1.8.3 and the servers run CentOS 5.4.
>>> I used the following commands to format and mount the targets:
>>> 
>>> MGS/MDS:
>>> mkfs.lustre --mgs /dev/mpath/mpath1
>>> mount -t lustre /dev/mpath/mpath1 /MGS
>>> mkfs.lustre --mdt --fsname=lustre01 \
>>>   --mgsnode=172.16.100....@tcp0,192.168.15...@o2ib0 \
>>>   --mgsnode=172.16.100....@tcp0,192.168.150...@o2ib0 \
>>>   --failnode=172.16.100....@tcp0,192.168.150...@o2ib0 /dev/mpath/mpath2
>>> mount -t lustre /dev/mpath/mpath2 /MDS_1/
>>> 
>>> OSS_1
>>> mkfs.lustre --ost --fsname=lustre01 \
>>>   --failnode=172.16.100....@tcp0,192.168.150...@o2ib0 \
>>>   --mgsnode=172.16.100....@tcp0,192.168.15...@o2ib0 \
>>>   --mgsnode=172.16.100....@tcp0,192.168.150...@o2ib0 /dev/mpath/mpath1
>>> mount -t lustre /dev/mpath/mpath1 /LUSTRE_1
>>> 
>>> OSS_2
>>> mkfs.lustre --ost --fsname=lustre01 \
>>>   --failnode=172.16.100....@tcp0,192.168.150...@o2ib0 \
>>>   --mgsnode=172.16.100....@tcp0,192.168.15...@o2ib0 \
>>>   --mgsnode=172.16.100....@tcp0,192.168.150...@o2ib0 /dev/mpath/mpath2
>>> mount -t lustre /dev/mpath/mpath2 /LUSTRE_1
>>> 
>>> There are also two clients mounted, one over Ethernet and one over IB.
>>> I disconnected the IB cable on OSS_2 to simulate a failure of its IB
>>> card.
>>> I then modified modprobe.conf to start LNET with only the Ethernet
>>> card and mounted the Lustre filesystem. The operation appears to
>>> succeed: the Ethernet client can see the entire filesystem.
>>> 
>> You modified OSS_2 to be Ethernet only, right?  (As opposed to the client)
>> 
>> 
>>> The problem appears when I force a write to OSS_2: the write
>>> fails and the operation does not complete.
>>> 
>> Yes, because the MDS is using InfiniBand, and is trying to access the
>> OST over IB.  Since the OST has an IB NID, the MDS is trying to use that
>> NID to talk to it: you would have to disable IB on the MDS node as well.
>> 
>> 
>>> Log on MGS/MDS:
>>> 
>>> Jul 12 15:04:59 mdt01prdpom kernel: LustreError:
>>> 4238:0:(events.c:66:request_out_callback()) @@@ type 4, status -113
>>>  r...@ffff81013ea52000 x1340531260082684/t0
>>> o8->[email protected]@tcp:28/4 lens 368/584 e 0 to
>>> 1 dl 1278939908 ref 2 fl Rpc:N/0/0 rc 0/0
>>> Jul 12 15:04:59 mdt01prdpom kernel: LustreError:
>>> 4238:0:(events.c:66:request_out_callback()) Skipped 16 previous
>>> similar messages
>>> Jul 12 15:06:07 mdt01prdpom kernel: LustreError:
>>> 4237:0:(lov_request.c:690:lov_update_create_set()) error creating fid
>>> 0x10f8004 sub-object on OST idx 1/1: rc = -11
>>> Jul 12 15:06:07 mdt01prdpom kernel: LustreError:
>>> 4237:0:(lov_request.c:690:lov_update_create_set()) Skipped 1 previous
>>> similar message
>>> Jul 12 15:06:07 mdt01prdpom kernel: LustreError:
>>> 4408:0:(mds_open.c:441:mds_create_objects()) error creating objects
>>> for inode 17793028: rc = -5
>>> Jul 12 15:06:07 mdt01prdpom kernel: LustreError:
>>> 4408:0:(mds_open.c:826:mds_finish_open()) mds_create_objects: rc = -5
>>> 
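The negative numbers in the log excerpt above (status -113, rc = -11, rc = -5) are Linux kernel errno values with the sign flipped. As a reading aid, here is a minimal Python sketch that names them. The table holds only the three values seen in this log, using the standard Linux errno numbering; the helper name `describe_rc` is hypothetical and not part of Lustre.

```python
# Linux errno numbers for the three return codes in the quoted MDS log.
# Hardcoded so the result does not depend on the OS running this script.
LINUX_ERRNO = {
    5: ("EIO", "Input/output error"),
    11: ("EAGAIN", "Resource temporarily unavailable"),
    113: ("EHOSTUNREACH", "No route to host"),
}


def describe_rc(rc: int) -> str:
    """Turn a negative kernel return code into 'rc: NAME (message)'."""
    name, message = LINUX_ERRNO.get(abs(rc), ("UNKNOWN", "not in this table"))
    return f"{rc}: {name} ({message})"


if __name__ == "__main__":
    # Return codes copied from the log lines above, in order of appearance.
    for rc in (-113, -11, -5):
        print(describe_rc(rc))
```

Read this way, status -113 on the outgoing request means the MDS found no route to the target NID, and the later -11 and -5 appear to be the object-creation failures that result from it.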
>>> 
>>> My question is:
>>> 
>>> Can OSS_2 be mounted so that it provides service over the
>>> Ethernet card?
>>> If so, what should I do?
>>> 
>>> 
>>> Thanks
>>> 
>>> 
>>> 
>>> Ing. Stefano Elmopi
>>> Gruppo Darco - Resp. ICT Sistemi
>>> Via Ostiense 131/L Corpo B, 00154 Roma
>>> 
>> Remember that any node accessing another node will always use the
>> "best" NID they have in common, even if it doesn't work (Lustre assumes
>> all networks on a server will always work -- the resource will be failed
>> over to a healthy server).
>> 
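The "best common NID" rule described above can be illustrated with a small Python model. This is only a sketch of the behaviour Kevin describes, not LNET's actual selection code. The preference order (o2ib0 before tcp0), the function name `best_common_nid`, and the example addresses are all assumptions made for illustration; the real NIDs in this thread are truncated and are not reproduced here.

```python
from typing import Iterable, Optional, Sequence

# Assumed preference order: earlier entries count as "better" networks.
NETWORK_PREFERENCE = ("o2ib0", "tcp0")


def best_common_nid(
    local_nets: Iterable[str],
    peer_nids: Sequence[str],
    preference: Sequence[str] = NETWORK_PREFERENCE,
) -> Optional[str]:
    """Pick the peer NID on the most-preferred network both nodes are on.

    There is deliberately no health check: a NID on a dead link is still
    chosen if its network ranks highest, which is the point being made.
    """
    local = set(local_nets)
    for net in preference:
        if net not in local:
            continue
        for nid in peer_nids:
            # A NID has the form "address@network".
            if nid.split("@", 1)[1] == net:
                return nid
    return None


if __name__ == "__main__":
    # Hypothetical NIDs the OST was registered with (documentation ranges).
    oss2_nids = ["192.0.2.2@tcp0", "198.51.100.2@o2ib0"]
    # MDS still has IB configured: it keeps choosing the IB NID.
    print(best_common_nid({"o2ib0", "tcp0"}, oss2_nids))
    # MDS with IB disabled: it falls back to the Ethernet NID.
    print(best_common_nid({"tcp0"}, oss2_nids))
```

In this model the choice changes only when the local node itself stops listing o2ib0, which matches the earlier remark that IB would have to be disabled on the MDS as well.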
>> If you really want to try this, see an example here:
>> https://bugzilla.lustre.org/show_bug.cgi?id=19854 for a hack in
>> specifying the NIDs as belonging to different servers.  Note that the
>> servers only track a single NID, so they will not be able to do
>> callbacks if the network path to the client goes down (i.e., they will
>> evict the clients, although the clients can reconnect over the other
>> network).
>> 
>> The "better" approach is generally to use bonding to provide multiple
>> physical links that look like a single network to Lustre.  Ethernet
>> bonding works without additional patches.  See also bugs 20153 and 20288
>> for more patches for Lustre with ib-bonding.  There is an additional
>> non-landed patch in private bug 22065.
>> 
>> Kevin
>> 
>> 
>> _______________________________________________
>> Lustre-discuss mailing list
>> [email protected]
>> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>> 
>> 
> 
> 
> -- 
> _Gabriele Paciucci_ http://www.linkedin.com/in/paciucci
> 
> 
> 
