Hi Gabriele,

This is one of many crash tests I'm doing, simulating an interruption of
the IB channel. I wanted to see whether it was possible to switch quickly
from IB to Ethernet; I do not want the two in load balancing. Of course,
under normal conditions I use only IB.

Bye
Stefano Elmopi

>
> Hi Stefano,
> why do you want to use a slow Ethernet connection in load balancing with
> high-performance InfiniBand?
>
> bye
>
> On 07/12/2010 05:05 PM, Kevin Van Maren wrote:
>> Stefano Elmopi wrote:
>>
>>> Hi,
>>>
>>> I have a Lustre file system consisting of an MGS/MDS and two OSSs,
>>> interconnected with InfiniBand.
>>> The Lustre version is 1.8.3, the OS on the servers is CentOS 5.4, and I
>>> used the following commands to format them:
>>>
>>> MGS/MDS:
>>> mkfs.lustre --mgs /dev/mpath/mpath1
>>> mount -t lustre /dev/mpath/mpath1 /MGS
>>> mkfs.lustre --mdt --fsname=lustre01 \
>>>     --mgsnode=172.16.100....@tcp0,192.168.15...@o2ib0 \
>>>     --mgsnode=172.16.100....@tcp0,192.168.150...@o2ib0 \
>>>     --failnode=172.16.100....@tcp0,192.168.150...@o2ib0 /dev/mpath/mpath2
>>> mount -t lustre /dev/mpath/mpath2 /MDS_1/
>>>
>>> OSS_1:
>>> mkfs.lustre --ost --fsname=lustre01 \
>>>     --failnode=172.16.100....@tcp0,192.168.150...@o2ib0 \
>>>     --mgsnode=172.16.100....@tcp0,192.168.15...@o2ib0 \
>>>     --mgsnode=172.16.100....@tcp0,192.168.150...@o2ib0 /dev/mpath/mpath1
>>> mount -t lustre /dev/mpath/mpath1 /LUSTRE_1
>>>
>>> OSS_2:
>>> mkfs.lustre --ost --fsname=lustre01 \
>>>     --failnode=172.16.100....@tcp0,192.168.150...@o2ib0 \
>>>     --mgsnode=172.16.100....@tcp0,192.168.15...@o2ib0 \
>>>     --mgsnode=172.16.100....@tcp0,192.168.150...@o2ib0 /dev/mpath/mpath2
>>> mount -t lustre /dev/mpath/mpath2 /LUSTRE_1
>>>
>>> Then there are two clients mounted, one on Ethernet and one on IB.
>>> I disconnected the IB cable to simulate the failure of the IB card on
>>> OSS_2.
>>> I modified modprobe.conf to start LNET with only the Ethernet card and
>>> then mounted the Lustre filesystem. The operation seems to succeed: the
>>> Ethernet client can see the entire filesystem.
>>>
>> You modified OSS_2 to be Ethernet only, right? (As opposed to the client.)
>>
>>
>>> The problem comes when I try to force a write on OSS_2: the write
>>> crashes and the operation fails.
>>>
>> Yes, because the MDS is using InfiniBand, and is trying to access the
>> OST over IB. Since the OST has an IB NID, the MDS is trying to use that
>> NID to talk to it: you would have to disable IB on the MDS node as well.
>>
>>
>>> Log on MGS/MDS:
>>>
>>> Jul 12 15:04:59 mdt01prdpom kernel: LustreError:
>>> 4238:0:(events.c:66:request_out_callback()) @@@ type 4, status -113
>>> r...@ffff81013ea52000 x1340531260082684/t0
>>> o8->[email protected]@tcp:28/4 lens 368/584 e 0 to
>>> 1 dl 1278939908 ref 2 fl Rpc:N/0/0 rc 0/0
>>> Jul 12 15:04:59 mdt01prdpom kernel: LustreError:
>>> 4238:0:(events.c:66:request_out_callback()) Skipped 16 previous
>>> similar messages
>>> Jul 12 15:06:07 mdt01prdpom kernel: LustreError:
>>> 4237:0:(lov_request.c:690:lov_update_create_set()) error creating fid
>>> 0x10f8004 sub-object on OST idx 1/1: rc = -11
>>> Jul 12 15:06:07 mdt01prdpom kernel: LustreError:
>>> 4237:0:(lov_request.c:690:lov_update_create_set()) Skipped 1 previous
>>> similar message
>>> Jul 12 15:06:07 mdt01prdpom kernel: LustreError:
>>> 4408:0:(mds_open.c:441:mds_create_objects()) error creating objects
>>> for inode 17793028: rc = -5
>>> Jul 12 15:06:07 mdt01prdpom kernel: LustreError:
>>> 4408:0:(mds_open.c:826:mds_finish_open()) mds_create_objects: rc = -5
>>>
>>>
>>> My question is:
>>>
>>> Can the OSS_2 server be mounted so that it provides service over the
>>> Ethernet card?
>>> If so, what should I do?
>>>
>>>
>>> Thanks
>>>
>>>
>>>
>>> Ing. Stefano Elmopi
>>> Gruppo Darco - Resp. ICT Sistemi
>>> Via Ostiense 131/L Corpo B, 00154 Roma
>>>
>> Remember that whenever one node accesses another, it will always use the
>> "best" NID the two have in common, even if that NID doesn't work (Lustre
>> assumes all networks on a server will always work; if they don't, the
>> resource is failed over to a healthy server).
>>
>> If you really want to try this, see
>> https://bugzilla.lustre.org/show_bug.cgi?id=19854 for a hack that
>> specifies the NIDs as belonging to different servers. Note that the
>> servers only track a single NID for each client, so they will not be
>> able to do callbacks if the network path to the client goes down (i.e.,
>> they will evict the clients, although the clients can reconnect over
>> the other network).
>>
>> The "better" approach is generally to use bonding to provide multiple
>> physical links that look like a single network to Lustre. Ethernet
>> bonding works without additional patches. See also bugs 20153 and 20288
>> for more patches for Lustre with ib-bonding. There is an additional,
>> not-yet-landed patch in private bug 22065.
>>
>> Kevin
>>
>>
>> _______________________________________________
>> Lustre-discuss mailing list
>> [email protected]
>> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>>
>>
>
>
> --
> _Gabriele Paciucci_ http://www.linkedin.com/in/paciucci
>
> Pursuant to legislative Decree n. 196/03 you are hereby informed that this
> email contains confidential information intended only for use of addressee.
> If you are not the addressee and have received this email by mistake, please
> send this email to the sender. You may not copy or disseminate this message
> to anyone. Thank You.
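The thread discusses two LNET changes: Stefano made OSS_2 Ethernet-only, and Kevin points out that the MDS needs the same change. Both are made with the lnet module's networks option in /etc/modprobe.conf. The fragment below is a minimal, untested sketch of that setting. The interface names ib0 and eth0 are assumptions, because the thread never names the interfaces.

```
# /etc/modprobe.conf (sketch; interface names ib0 and eth0 are assumed)
#
# Normal operation: both networks configured, o2ib0 listed first.
# options lnet networks=o2ib0(ib0),tcp0(eth0)
#
# Ethernet-only test: set this on OSS_2 *and* on the MGS/MDS, so that tcp0
# is the only network the MDS shares with the OST.
options lnet networks=tcp0(eth0)
```

LNET reads this option only when the module is loaded. The targets therefore have to be unmounted and the Lustre modules unloaded (for example with lustre_rmmod) before remounting. Afterwards, lctl list_nids shows which NIDs the node actually has. Without an LNET router, the IB-only client in this setup would also lose its path to an MDS that is on tcp0 only.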

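Kevin's preferred approach is bonding. On CentOS 5 the Ethernet side might be sketched as below. Everything in it is an assumption rather than something taken from the thread:

- the slave names eth0 and eth1
- active-backup mode and the miimon interval
- the angle-bracket placeholders, which stand in for addresses the thread truncates

Active-backup keeps the second link idle until the first fails, which matches Stefano's wish to avoid load balancing. Note that this bonds Ethernet ports only; it does not make IB fail over to Ethernet (for IB, see the ib-bonding bugs Kevin cites).

```
# /etc/modprobe.conf (sketch)
alias bond0 bonding
options lnet networks=tcp0(bond0)

# /etc/sysconfig/network-scripts/ifcfg-bond0
# <address> and <netmask> are placeholders, not values from the thread.
DEVICE=bond0
IPADDR=<address>
NETMASK=<netmask>
BONDING_OPTS="mode=active-backup miimon=100"
BOOTPROTO=none
ONBOOT=yes

# /etc/sysconfig/network-scripts/ifcfg-eth0
# ifcfg-eth1 is identical apart from DEVICE=eth1.
DEVICE=eth0
MASTER=bond0
SLAVE=yes
BOOTPROTO=none
ONBOOT=yes
```

With this layout LNET sees a single tcp0 NID on bond0. A cable or port failure on one slave is then handled by the bonding driver underneath Lustre rather than by LNET, which is the "single network" property Kevin describes.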