Hi Philipp,

I don't do this often, so I'm hazy on the details, but did you set NIDs or networks when you ran mkfs.lustre? If so, you may have to add the new ones with tunefs.lustre when you add more networks.
-Laura

________________________________________
From: lustre-discuss <[email protected]> on behalf of Philipp Grau <[email protected]>
Sent: Wednesday, 29 November 2023 06:37
To: [email protected]
Subject: [lustre-discuss] Lustre mds/ods Server with IB/omnipath and Ethernet clients (dual homed?)

Hello,

I have some questions regarding the network connection setup for ethernet-based clients.

We have a working Lustre installation with two MDS servers and seven ODS systems connected to our cluster via omnipath/ib. This part is working fine.

Now we want to add some clients that have only an ethernet connection to the Lustre servers (using the ethernet cards in the servers).

Our MDS and ODS servers have the following lnet setup:

net:
    - net type: lo
      local NI(s):
        - nid: 0@lo
          status: up
    - net type: o2ib
      local NI(s):
        - nid: 10.149.0.XXX@o2ib # IP of the local ib interface
          status: up
          interfaces:
              0: ib0
    - net type: tcp
      local NI(s):
        - nid: xxx.xxx.5.XXX@tcp # IP of the local ethernet interface
          status: up
          interfaces:
              0: eno1

Our test ethernet node:

lnetctl net show
net:
    - net type: lo
      local NI(s):
        - nid: 0@lo
          status: up
    - net type: tcp
      local NI(s):
        - nid: xxx.xxx.4.XXX@tcp # same subnet as above, it is a /23
          status: up
          interfaces:
              0: enp225s0f0

So far so good.
I'm able to lnetctl ping in both directions:

Ping the client:

lnetctl ping xxx.xxx.4.xxx@tcp
ping:
    - primary nid: xxx.xxx.4.xxx@tcp
      Multi-Rail: True
      peer ni:
        - nid: xxx.xxx.4.xxx@tcp

Ping the server:

lnetctl ping xxx.xxx.5.xxx@tcp
ping:
    - primary nid: xxx.xxx.5.xxx@tcp
      Multi-Rail: True
      peer ni:
        - nid: 10.149.0.183@o2ib
        - nid: xxx.xxx.5.xxx@tcp

But the mount fails. Output from dmesg (are there other sources of debug information?):

LustreError: 25758:0:(ldlm_lib.c:494:client_obd_setup()) can't add initial connection
LustreError: 25758:0:(obd_config.c:559:class_setup()) setup scratch-MDT0000-mdc-ffff8b63003d4000 failed (-2)
LustreError: 25758:0:(obd_config.c:1835:class_config_llog_handler()) MGCxxx.xxx.5.xxx@tcp: cfg command failed: rc = -2
Lustre: cmd=cf003 0:scratch-MDT0000-mdc 1:scratch-MDT0000_UUID 2:10.149.0.183@o2ib
LustreError: 15c-8: MGC160.45.5.246@tcp: The configuration from log 'scratch-client' failed (-2). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
LustreError: 25734:0:(obd_config.c:610:class_cleanup()) Device 3 not setup
Lustre: Unmounted scratch-client
LustreError: 25734:0:(obd_mount.c:1604:lustre_fill_super()) Unable to mount (-2)

Does someone have some ideas or reference documentation on this topic?
Do I need some "lnetctl route" stuff?
Do I need some "lnetctl peer add ..." to make the Lustre servers and clients known to each other?

Any hints are welcome!

Kind regards,

Philipp

--
Philipp Grau            | Freie Universitaet Berlin
[email protected]       | FU-IT - Infrastruktur
Tel: +49 (30) 838 56583 | Fabeckstr. 32
Fax: +49 (30) 838 56721 | 14195 Berlin
_______________________________________________
lustre-discuss mailing list
[email protected]
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
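
One possible reading of the dmesg output above, offered as a sketch and not a confirmed diagnosis: the `cmd=cf003` line suggests the configuration log fetched from the MGS points the client at `10.149.0.183@o2ib` for scratch-MDT0000, while the test client only has `lo` and `tcp` networks. That would fit the suggestion at the top of the thread that the tcp NIDs were never registered for the targets. The small Python check below just compares the NIDs in a log line against the client's networks; the function names are hypothetical helpers, not part of any Lustre tool, and the inputs are copied from the messages quoted above.

```python
import re

# Log line copied from the dmesg output quoted in this thread.
CFG_LINE = ("Lustre: cmd=cf003 0:scratch-MDT0000-mdc "
            "1:scratch-MDT0000_UUID 2:10.149.0.183@o2ib")

# LNet networks on the test client, per its `lnetctl net show` output.
CLIENT_NETS = {"lo", "tcp"}


def nids_in(line):
    """Return every addr@net token found in a log line (hypothetical helper)."""
    return re.findall(r"[\w.]+@[a-z0-9]+", line)


def net_of(nid):
    """Return the network part of a NID, e.g. 'o2ib' for '10.149.0.183@o2ib'."""
    return nid.rsplit("@", 1)[1]


def unreachable_nids(line, local_nets):
    """List NIDs in the line whose network is not configured locally."""
    return [nid for nid in nids_in(line) if net_of(nid) not in local_nets]


if __name__ == "__main__":
    # Prints the NIDs the client has no matching local network for.
    print(unreachable_nids(CFG_LINE, CLIENT_NETS))
```

If the list is non-empty, as it is for the line above, the client is being handed a NID on a network it does not have, and "lnetctl ping" succeeding over tcp would not change that. Whether re-registering the targets with both NIDs is the right fix for this installation is an assumption to verify against the Lustre manual.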
