Hi Thomas,

Nice to see you have remained active in the Lustre community. To your question: I don't have an answer, but it seems like the timeout may be masking the root issue, perhaps a system or network issue. I always start with hostname resolution. :)

On Oct 24, 2017 11:08 AM, "Thomas Roth" <[email protected]> wrote:
> Sorry to have bothered you - works now.
>
> I have set /sys/fs/lustre/timeout=3000, quite brutally, to make things go
> verrry slowly, and after 25 minutes the mount was there.
>
> Which control aka timeout-parameter _should_ I have tuned instead in such
> a situation?
>
> Regards,
> Thomas
>
> On 10/24/2017 06:26 PM, Thomas Roth wrote:
>
>> Hi all,
>>
>> in a Lustre 2.10, CentOS 7.4 test system, I have a pair of MDS, format
>> command was
>>
>> > mkfs.lustre --mgs --mdt --fsname=test --index=0
>> --servicenode=10.20.1.198@o2ib5 --servicenode=10.20.1.199@o2ib5
>> --mgsnode=10.20.1.198@o2ib5 --mgsnode=10.20.1.199@o2ib5
>> /dev/drbd0
>>
>> I added some OSS and clients, everything working.
>>
>> Then I switched off 10.20.1.198 and mounted my MGS/MDT on 10.20.1.199.
>> All OSS and clients connected, everything working.
>>
>> Now I try to add a client that was never there before,
>> > mount -t lustre 10.20.1.198@o2ib5:10.20.1.199@o2ib5:/test
>> /lustre/test
>>
>> But this client only tries to connect to 10.20.1.198@o2ib5 - and fails.
>> The log says
>>
>> LNet: 47655:0:(o2iblnd_cb.c:2672:kiblnd_check_reconnect())
>> 10.20.1.198@o2ib5: reconnect (invalid service id), 12, 12, msg_size:
>> 4096, queue_depth: 8/-1, max_frags: 256/-1
>> LNet: 47655:0:(o2iblnd_cb.c:2698:kiblnd_rejected()) 10.20.1.198@o2ib5
>> rejected: no listener at 987
>> ...
>> LustreError: 48560:0:(mgc_request.c:251:do_config_log_add())
>> MGC10.20.1.198@o2ib5: failed processing log, type 1: rc = -5
>> LNet: 48427:0:(o2iblnd_cb.c:3207:kiblnd_check_conns()) Timed out tx for
>> 10.20.1.198@o2ib5: 4301501 seconds
>> Lustre: 48441:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request
>> sent has failed due to network error: [sent 1508861258/real 1508861264]
>> req@ffff88103dc78000 x1582155623825424/t0(0) o250->MGC10.20.1.198@o2ib5
>> @10.20.1.198@o2ib5:26/25 lens 520/544 e 0 to 1 dl 1508861408 ref 1 fl
>> Rpc:eXN/0/ffffffff rc 0/-1
>>
>>
>> all of which seems logical but not wanted - where is my 10.20.1.199@o2ib5
>> ?
>>
>> Of course I can 'lctl ping 10.20.1.199@o2ib5'.
>> And I have since umounted on one of the older clients, unloaded the
>> Lustre modules, and mounted again - works.
>>
>>
>> Regards,
>> Thomas
>>
>>
> --
> --------------------------------------------------------------------
> Thomas Roth
> Department: Informationstechnologie
> Location: SB3 1.250
> Phone: +49-6159-71 1453 Fax: +49-6159-71 2986
>
> GSI Helmholtzzentrum für Schwerionenforschung GmbH
> Planckstraße 1
> 64291 Darmstadt
> www.gsi.de
>
> Gesellschaft mit beschränkter Haftung
> Sitz der Gesellschaft: Darmstadt
> Handelsregister: Amtsgericht Darmstadt, HRB 1528
>
> Geschäftsführung: Ursula Weyrich
> Professor Dr. Paolo Giubellino
> Jörg Blaurock
>
> Vorsitzende des Aufsichtsrates: St Dr. Georg Schütte
> Stellvertreter: Ministerialdirigent Dr. Rolf Bernhardt
>
> _______________________________________________
> lustre-discuss mailing list
> [email protected]
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>
