Yep, looks like that's indeed the issue. Reducing peer_credits to 42 makes the problem go away.
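
For anyone finding this in the archives, the change amounts to roughly the following. This is only a sketch: the file path is an assumption (use wherever your ko2iblnd options line lives), and peer_credits is the only value changed from the options I originally posted. Whether peer_credits_hiw or concurrent_sends should also be lowered to go with it is not something this thread settles.

```
# /etc/modprobe.d/ko2iblnd.conf   (path is an assumption)
# Workaround discussed for LU-12901: peer_credits reduced from 128 to 42.
# All other values are unchanged from the originally posted options line;
# they have not been re-tuned for the smaller peer_credits.
options ko2iblnd peer_credits=42 peer_credits_hiw=64 credits=1024 concurrent_sends=256 ntx=2048 map_on_demand=32 fmr_pool_size=2048 fmr_flush_trigger=512 fmr_cache=1
```

Module options are read when ko2iblnd is loaded, so the new value only takes effect after lnet is reloaded or the node is rebooted.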
Thanks,
Kevin

On Thu, Feb 13, 2020 at 4:25 PM <lustre-discuss-requ...@lists.lustre.org> wrote:

> Today's Topics:
>
>    1. Re: Lustre 2.12.3 client can't mount filesystem (Weiss, Karsten)
>    2. Re: Lustre 2.12.3 client can't mount filesystem
>       (Kevin M. Hildebrand)
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Thu, 13 Feb 2020 08:11:08 +0000
> From: "Weiss, Karsten" <karsten.we...@atos.net>
> To: "lustre-discuss@lists.lustre.org" <lustre-discuss@lists.lustre.org>
> Subject: Re: [lustre-discuss] Lustre 2.12.3 client can't mount filesystem
>
> Hi,
>
> this is probably https://jira.whamcloud.com/browse/LU-12901 which is
> still open and was just postponed to Lustre 2.14.0.
>
> Reducing peer_credits to 42 is a workaround.
>
> Best regards,
> Karsten
>
> From: lustre-discuss <lustre-discuss-boun...@lists.lustre.org> On Behalf Of Andreas Dilger
> Sent: Wednesday, February 12, 2020 21:50
> To: Kevin M. Hildebrand <ke...@umd.edu>
> Cc: lustre-discuss@lists.lustre.org
> Subject: Re: [lustre-discuss] Lustre 2.12.3 client can't mount filesystem
>
> Can you please try 2.12.4, it was just released yesterday and has a number
> of fixes.
>
> On Feb 12, 2020, at 13:36, Kevin M. Hildebrand <ke...@umd.edu> wrote:
>
> I just updated some of my clients to RHEL 7.7, Lustre 2.12.3, MOFED 4.7.
> Server version is 2.10.8.
>
> I'm now getting errors mounting the filesystem on the client. In fact, I
> can't even do an 'lctl ping' to any of the servers without getting an I/O
> error.
>
> Debug logs show this message when I attempt an lctl ping:
> 00000800:00020000:0.0:1581538955.090767:0:20471:0:(o2iblnd.c:941:kiblnd_create_conn())
> Can't create QP: -12, send_wr: 32634, recv_wr: 254, send_sge: 2, recv_sge: 1
>
> # lctl list_nids
> 10.11.80.65@o2ib3
> # lctl ping 10.11.80.50@o2ib3
> failed to ping 10.11.80.50@o2ib3: Input/output error
>
> Interestingly, if I do an 'lctl ping' to the client _from_ the server, the
> ping succeeds, and from that point on pings from client _to_ server work
> fine until the client is rebooted or lnet is reloaded.
>
> ko2iblnd parameters match on clients and servers, namely:
> options ko2iblnd peer_credits=128 peer_credits_hiw=64 credits=1024
> concurrent_sends=256 ntx=2048 map_on_demand=32 fmr_pool_size=2048
> fmr_flush_trigger=512 fmr_cache=1
>
> Anyone have any thoughts?
>
> Thanks,
> Kevin
>
> --
> Kevin Hildebrand
> University of Maryland
> Division of IT
>
> Cheers, Andreas
> --
> Andreas Dilger
> Principal Lustre Architect
> Whamcloud
>
> ------------------------------
>
> Message: 2
> Date: Thu, 13 Feb 2020 08:24:30 -0500
> From: "Kevin M. Hildebrand" <ke...@umd.edu>
> To: Andreas Dilger <adil...@whamcloud.com>
> Cc: "lustre-discuss@lists.lustre.org" <lustre-discuss@lists.lustre.org>
> Subject: Re: [lustre-discuss] Lustre 2.12.3 client can't mount filesystem
>
> Ok, I just tried 2.12.4, and the problem still persists. The only
> difference I see now is that the error messages are appearing in syslog
> instead of needing to pull them from the debug log.
> [ 230.413761] LNetError: 1423:0:(o2iblnd.c:941:kiblnd_create_conn()) Can't
> create QP: -12, send_wr: 32634, recv_wr: 254, send_sge: 2, recv_sge: 1
>
> Thanks,
> Kevin
>
> ------------------------------
>
> End of lustre-discuss Digest, Vol 167, Issue 14
> ***********************************************
_______________________________________________
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org