Colin Faber wrote:
>
> On 12/13/2010 11:54 AM, Gary Molenkamp wrote:
>> I'm attempting to deploy a new lustre filesystem using lustre 1.8.5, but
>> this is my first stab at incorporating an IB network. I've deployed
>> several over tcp using 1.8.4 without issue, so I'm not sure if there is
>> an IB configuration or a 1.8.5 issue here. Any assistance would be
>> appreciated.
>>
>> This new cluster has two parallel networks:
>>   gige: 10.27.5.0/23
>>   ib  : 10.27.8.0/23
>>
>> On the lfs servers and clients, lnet is configured as:
>>   options lnet networks=o2ib0(ib0),tcp0(ib0)
>                                      ^^^^^
> Why are you assigning two different network types to the same physical
> device?
My assumption was that this indicated to lnet when IPoIB was to be used vs
native IB, but by your question, I assume that is not the case. :)

I retested with just

  options lnet networks=o2ib0(ib0)

and the resulting error conditions below still hold true.

>> The IB network is routable to 10/8 and clients mount other lustre
>> filesystems using 1.8.4 over tcp.
>>
>> On the MDS (with an intended failover to a secondary) the MGS/MDT
>> filesystem is created with:
>>
>>   mkfs.lustre --fsname lfs --mdt --mgs \
>>     --mkfsoptions='-i 1024 -I 512' \
>>     --failnode=10.27.9....@o2ib0 --failnode=10.27.9....@o2ib0 \
>>     --mountfsoptions=iopen_nopriv,user_xattr,errors=remount-ro,acl \
>>     /dev/sda
>>
>> However, this mount then fails with:
>>
>>   mount.lustre: mount /dev/sda at /data/mds failed: Cannot assign
>>   requested address
>>
>> An lctl shows the proper NIDs:
>>   10.27.9....@o2ib
>>   10.27.9....@tcp
>>
>> Dmesg shows a parsing error with the o2ib0 NID:
>>
>>   LustreError: 159-d: Can't parse NID 'failover.node=10.27.9....@o2ib0'
>>   Lustre: Denying initial registration attempt from nid 10.27.9....@o2ib,
>>   specified as failover
>>   LustreError: 9571:0:(obd_mount.c:1097:server_start_targets()) Required
>>   registration failed for lfs-MDT0000: -99
>>
>> Am I specifying the failover incorrectly? What should it be when using
>> o2ib as the primary interconnect? If I remove the failover parameters
>> using tunefs.lustre, the mount succeeds, but clients cannot connect to
>> the MDT.
>>

-- 
Gary Molenkamp                  SHARCNET
Systems Administrator           University of Western Ontario
Compute/Calcul Canada           http://www.computecanada.org
[email protected]           http://www.sharcnet.ca
(519) 661-2111 x88429           (519) 661-4000
_______________________________________________
Lustre-discuss mailing list
[email protected]
http://lists.lustre.org/mailman/listinfo/lustre-discuss
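For readers less familiar with LNET addressing, the NIDs in this thread have the shape `<address>@<network>`, where the network is an LND type (`tcp`, `o2ib`) plus an optional network number that defaults to 0. That is why the `o2ib0` used in the configuration and the `@o2ib` printed by lctl refer to the same network. The sketch below is a deliberately simplified, hypothetical checker (`parse_nid` and `same_nid` are illustrative names, not Lustre's own parser) that only knows IPv4 addresses and the two LND types mentioned here. It also shows that a string still carrying a `failover.node=` prefix, like the one in the dmesg output above, is not itself a well-formed NID; it does not attempt to explain why Lustre 1.8.5 handed that string to its parser. The addresses are hypothetical documentation-range values, since the real NIDs are elided in the thread.

```python
import ipaddress
import re

# Simplified NID pattern: "<address>@<LND name><optional network number>".
# Only the two LND types used in this thread are recognised; real LNET knows more.
_NID_RE = re.compile(r"^(?P<addr>[^@]+)@(?P<lnd>tcp|o2ib)(?P<num>\d*)$")


def parse_nid(text):
    """Return (address, lnd, network_number) or raise ValueError.

    A missing network number is treated as 0, so "o2ib" and "o2ib0"
    name the same network.
    """
    match = _NID_RE.match(text)
    if match is None:
        raise ValueError(f"Can't parse NID {text!r}")
    try:
        # Rejects anything before '@' that is not a plain IPv4 address,
        # e.g. a leftover "failover.node=" prefix.
        addr = ipaddress.IPv4Address(match.group("addr"))
    except ValueError:
        raise ValueError(f"Can't parse NID {text!r}") from None
    num = int(match.group("num")) if match.group("num") else 0
    return str(addr), match.group("lnd"), num


def same_nid(a, b):
    """True when two NID strings name the same address on the same network."""
    return parse_nid(a) == parse_nid(b)


if __name__ == "__main__":
    # Hypothetical addresses (192.0.2.0/24 is reserved for documentation).
    print(parse_nid("192.0.2.10@o2ib0"))
    print(same_nid("192.0.2.10@o2ib0", "192.0.2.10@o2ib"))
    try:
        parse_nid("failover.node=192.0.2.10@o2ib0")
    except ValueError as err:
        print(err)
```

Treating a missing network number as 0 is what lets the checker equate `@o2ib` and `@o2ib0`; a `tcp` and an `o2ib` NID on the same address compare as different, matching the two separate entries lctl listed.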
