It's more likely that you're running out of verbsRdmasPerNode, which is the upper limit across all connections for a given node.
Sven

On Fri, Feb 24, 2017 at 11:31 AM Aaron Knister <[email protected]> wrote:

Interesting, thanks Sven! Could the "resources" I'm running out of include NSD server queues?

On 2/23/17 12:12 PM, Sven Oehme wrote:
> All this waiter shows is that you have more in flight than the node or
> connection can currently serve. The reasons for that can be
> misconfiguration, or simply running out of resources on the node, not the
> connection. With the latest code you shouldn't see this anymore for node
> limits, as the system automatically adjusts the maximum number of RDMAs
> according to the node's capabilities.
>
> You should see messages in your mmfs log like:
>
> 2017-02-23_06:19:50.056-0800: [I] VERBS RDMA starting with verbsRdmaCm=no verbsRdmaSend=yes verbsRdmaUseMultiCqThreads=yes verbsRdmaUseCompVectors=yes
> 2017-02-23_06:19:50.078-0800: [I] VERBS RDMA library libibverbs.so (version >= 1.1) loaded and initialized.
> 2017-02-23_06:19:50.078-0800: [I] VERBS RDMA verbsRdmasPerNode increased from 3072 to 3740 because verbsRdmasPerNodeOptimize is set to yes.
> 2017-02-23_06:19:50.121-0800: [I] VERBS RDMA discover mlx5_5 port 1 transport IB link IB NUMA node 16 pkey[0] 0xFFFF gid[0] subnet 0xFEC0000000000013 id 0xE41D2D0300FDB9CD state ACTIVE
> 2017-02-23_06:19:50.137-0800: [I] VERBS RDMA discover mlx5_4 port 1 transport IB link IB NUMA node 16 pkey[0] 0xFFFF gid[0] subnet 0xFEC0000000000015 id 0xE41D2D0300FDB9CC state ACTIVE
> 2017-02-23_06:19:50.153-0800: [I] VERBS RDMA discover mlx5_3 port 1 transport IB link IB NUMA node 1 pkey[0] 0xFFFF gid[0] subnet 0xFEC0000000000013 id 0xE41D2D0300FDB751 state ACTIVE
> 2017-02-23_06:19:50.169-0800: [I] VERBS RDMA discover mlx5_2 port 1 transport IB link IB NUMA node 1 pkey[0] 0xFFFF gid[0] subnet 0xFEC0000000000015 id 0xE41D2D0300FDB750 state ACTIVE
> 2017-02-23_06:19:50.185-0800: [I] VERBS RDMA discover mlx5_1 port 1 transport IB link IB NUMA node 0 pkey[0] 0xFFFF gid[0] subnet 0xFEC0000000000013 id 0xE41D2D0300FDB78D state ACTIVE
> 2017-02-23_06:19:50.201-0800: [I] VERBS RDMA discover mlx5_0 port 1 transport IB link IB NUMA node 0 pkey[0] 0xFFFF gid[0] subnet 0xFEC0000000000015 id 0xE41D2D0300FDB78C state ACTIVE
>
> We want to eliminate all these configurable limits eventually. That takes
> time, but as you can see above, we make progress with each release :-)
>
> Sven
>
> On Thu, Feb 23, 2017 at 9:05 AM Aaron Knister <[email protected]> wrote:
>
> On a particularly heavily loaded NSD server I'm seeing a lot of these
> messages:
>
> 0x7FFFF08B63E0 ( 15539) waiting 0.004139456 seconds, NSDThread: on ThCond 0x7FFFA80772C8 (0x7FFFA80772C8) (VERBSEventWaitCondvar), reason 'waiting for conn rdmas < conn maxrdmas'
> 0x7FFFF08EED80 ( 15584) waiting 0.004075718 seconds, NSDThread: on ThCond 0x7FFF680008F8 (0x7FFF680008F8) (VERBSEventWaitCondvar), reason 'waiting for conn rdmas < conn maxrdmas'
> 0x7FFFF08FDF00 ( 15596) waiting 0.003965504 seconds, NSDThread: on ThCond 0x7FFF8C00E288 (0x7FFF8C00E288) (VERBSEventWaitCondvar), reason 'waiting for conn rdmas < conn maxrdmas'
> 0x7FFFF09185A0 ( 15617) waiting 0.003916346 seconds, NSDThread: on ThCond 0x7FFF9000CB18 (0x7FFF9000CB18) (VERBSEventWaitCondvar), reason 'waiting for conn rdmas < conn maxrdmas'
> 0x7FFFF092B380 ( 15632) waiting 0.003659610 seconds, NSDThread: on ThCond 0x1DB04B8 (0x1DB04B8) (VERBSEventWaitCondvar), reason 'waiting for conn rdmas < conn maxrdmas'
>
> I've tried tweaking verbsRdmasPerConnection, but the issue seems to
> persist. Has anyone encountered this, and if so, how did you fix it?
> -Aaron
>
> --
> Aaron Knister
> NASA Center for Climate Simulation (Code 606.2)
> Goddard Space Flight Center
> (301) 286-2776
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss

--
Aaron Knister
NASA Center for Climate Simulation (Code 606.2)
Goddard Space Flight Center
(301) 286-2776
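As an aside on the waiter output quoted in the thread: when triaging this kind of problem it helps to count how many waiters are stuck on each reason and how long the worst one has been waiting. The sketch below is not a GPFS tool and not part of the original thread; it is a hypothetical helper that parses waiter lines in the format shown above (as produced by commands like `mmdiag --waiters`) and tallies them by reason.

```python
import re
from collections import Counter

# Matches waiter lines like:
#   0x7FFFF08B63E0 ( 15539) waiting 0.004139456 seconds, NSDThread: on
#   ThCond ... (VERBSEventWaitCondvar), reason 'waiting for conn rdmas < conn maxrdmas'
WAITER_RE = re.compile(r"waiting (?P<secs>[0-9.]+) seconds.*reason '(?P<reason>[^']+)'")

def summarize_waiters(lines):
    """Return {reason: (count, max_wait_seconds)} for matching waiter lines."""
    counts = Counter()
    max_wait = {}
    for line in lines:
        m = WAITER_RE.search(line)
        if not m:
            continue  # skip lines that are not waiter entries
        reason = m.group("reason")
        secs = float(m.group("secs"))
        counts[reason] += 1
        max_wait[reason] = max(max_wait.get(reason, 0.0), secs)
    return {r: (counts[r], max_wait[r]) for r in counts}

# Two of the waiter lines from the thread, used as sample input.
sample = [
    "0x7FFFF08B63E0 ( 15539) waiting 0.004139456 seconds, NSDThread: on "
    "ThCond 0x7FFFA80772C8 (0x7FFFA80772C8) (VERBSEventWaitCondvar), reason "
    "'waiting for conn rdmas < conn maxrdmas'",
    "0x7FFFF08EED80 ( 15584) waiting 0.004075718 seconds, NSDThread: on "
    "ThCond 0x7FFF680008F8 (0x7FFF680008F8) (VERBSEventWaitCondvar), reason "
    "'waiting for conn rdmas < conn maxrdmas'",
]

print(summarize_waiters(sample))
```

A large count with only sub-millisecond maximum waits (as in the thread) suggests throughput throttling rather than a stall; sustained multi-second waits on the same reason would point more strongly at a resource limit.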
