On 08/01/2016 06:33 AM, Thomas Roth wrote:
> Hi all,
>
> is there a kind of a rule of thumb for the "min" number in
> /proc/sys/lnet/peers?

No, there is no rule of thumb. It depends on too many factors in the system. In my experience, numbers like the ones you are showing here are completely normal.

The "min" field can be useful in the context of problems that are occurring, but even then you need some way to know when the min occurred in order to correlate it with whatever other issue is happening. There is work under way on master to allow zeroing out those fields for just that purpose. If you are watching the min actively, see the numbers suddenly spike, and can correlate that with some higher-level issue, that can be useful. If you are not seeing any issues in the system, there is no need to be concerned about the numbers you posted.

> Our ko2iblnd peer_credits-Parameter is at the default value, obviously 8.
>
> When I look up /proc/sys/lnet/peers (on an OSS), I typically get
> something like
>
>
> 10.20.1.76@o2ib1   1 NA -1 8 8 8 8   -18  0
> 10.20.0.188@o2ib1  1 NA -1 8 8 8 8   -29  0
> 10.20.0.44@o2ib1   2 NA -1 8 8 8 7   -15 72
> 10.20.1.165@o2ib1  1 NA -1 8 8 8 8   -18  0
> 10.20.1.21@o2ib1   1 NA -1 8 8 8 8 -2113  0
> 10.20.0.133@o2ib1  1 NA -1 8 8 8 8   -28  0
> 10.20.1.110@o2ib1  1 NA -1 8 8 8 8   -10  0
> 10.20.0.222@o2ib1  1 NA -1 8 8 8 8   -20  0
> 10.20.0.78@o2ib1   1 NA -1 8 8 8 8   -17  0
> 10.20.1.55@o2ib1   1 NA -1 8 8 8 8    -7  0
> 10.20.0.167@o2ib1  1 NA -1 8 8 8 8   -12  0
> 10.20.1.144@o2ib1  1 NA -1 8 8 8 8    -8  0
> 10.20.1.89@o2ib1   1 NA -1 8 8 8 8   -21  0
> 10.20.1.34@o2ib1   1 NA -1 8 8 8 8   -11  0
> 10.20.0.146@o2ib1  1 NA -1 8 8 8 8   -21  0
> 10.20.0.2@o2ib1    1 NA -1 8 8 8 8  -584  0
> 10.20.1.123@o2ib1  1 NA -1 8 8 8 8   -16  0
> 10.20.0.91@o2ib1   1 NA -1 8 8 8 8   -25  0
> 10.20.1.68@o2ib1   1 NA -1 8 8 8 8     1  0
> 10.20.0.180@o2ib1  1 NA -1 8 8 8 8   -22  0
> 10.20.0.185@o2ib1  1 NA -1 8 8 8 8   -17  0
> 10.20.0.41@o2ib1   1 NA -1 8 8 8 8   -25  0
> 10.20.1.162@o2ib1  1 NA -1 8 8 8 8   -14  0
> 10.20.1.18@o2ib1   1 NA -1 8 8 8 8  -919  0
> 10.20.0.130@o2ib1  1 NA -1 8 8 8 8   -13  0
> 10.20.1.107@o2ib1  1 NA -1 8 8 8 8    -7  0
> 10.20.0.219@o2ib1  1 NA -1 8 8 8 8   -12  0
> 10.20.0.75@o2ib1   1 NA -1 8 8 8 8   -21  0
> 10.20.1.196@o2ib1  4 up -1 8 8 8 8  -419  0
>
>
> (The last line, the only peer that is "up", is an LNET-router)
>
>
> Something to worry about?

That is normal. Up/down state information is only given for peers that are routers. "NA" means "Not Applicable". That was an improvement over the past when, if I remember correctly, all non-router peers were listed as "down" regardless of their actual state.

Chris
_______________________________________________
lustre-discuss mailing list
[email protected]
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
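
Since the "min" value is only useful when you know when it occurred, one option is to snapshot the peers file periodically and record the time at which a peer's min drops. The following is a rough sketch of that idea, not anything that ships with Lustre. It assumes the column order is "nid refs state last max rtr min tx min queue" (the thread does not spell this out), and the helper names parse_peers and new_lows are made up for illustration.

```python
import time
from typing import Dict, List, Optional, Tuple

# Assumption (not stated in the thread): each peer line has ten
# whitespace-separated columns in this order:
#   nid refs state last max rtr min tx min queue
# so the second "min" (lowest tx credits seen) is field index 8.
MIN_TX_INDEX = 8


def parse_peers(text: str) -> Dict[str, int]:
    """Map each peer NID to its recorded tx-credit minimum."""
    mins: Dict[str, int] = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, malformed lines, and a header row if present.
        if len(fields) != 10 or fields[0] == "nid":
            continue
        mins[fields[0]] = int(fields[MIN_TX_INDEX])
    return mins


def new_lows(
    previous: Dict[str, int],
    current: Dict[str, int],
    now: Optional[float] = None,
) -> List[Tuple[float, str, int, int]]:
    """Return (timestamp, nid, old_min, new_min) for peers whose min dropped."""
    stamp = time.time() if now is None else now
    events = []
    for nid, low in current.items():
        old = previous.get(nid)
        # Peers seen for the first time have no baseline, so they are not reported.
        if old is not None and low < old:
            events.append((stamp, nid, old, low))
    return events
```

To use it, read /proc/sys/lnet/peers at a fixed interval, pass each snapshot through parse_peers, and feed consecutive results to new_lows. The timestamps it returns can then be lined up against whatever higher-level problem you are chasing.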
