On Tue, Dec 8, 2020 at 11:46 AM Kevin M. Hildebrand <[email protected]> wrote:
> We appear to be tripping over the same issues reported recently by
> Tung-Han Hsieh and Simon Guilbault, namely that cur_grant_bytes is being
> reduced to a very small value and causing abysmal performance.
>

I'm curious whether anyone encountering this problem sees a correlation between cur_grant_bytes and brw_size. For us, the max RPC size is 4MB, and that also seems to be the threshold for cur_grant_bytes below which performance degrades drastically. Would it be reasonable for the grant shrinker to never go below brw_size?

As for a "fix", we are monitoring for low cur_grant_bytes and draining work off the affected nodes, at least to the point where we can run set_param lru_size=clear, confirm that get_param reports lru_size=0, and then force a reconnection and renegotiation of the grant_bytes with the server using "lctl --device <num> activate".

I too would be interested to know whether there are downsides to setting grant_shrink=0 on the clients, and whether that is confirmed to actually avoid the problem.
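The monitoring step described above could be sketched roughly as follows. This is only an illustration, not the script actually in use: it parses text in the format printed by "lctl get_param osc.*.cur_grant_bytes" and reports targets whose grant has fallen below a threshold. The 4 MiB threshold is an assumption taken from the max RPC size mentioned above, and the function name low_grant_oscs is hypothetical.

```python
import re

# Assumption: 4 MiB, matching the max RPC size (brw_size) mentioned above.
GRANT_THRESHOLD_BYTES = 4 * 1024 * 1024

# Matches lines such as
# "osc.lustre10-OST0000-osc-ffff8e47dc02a800.cur_grant_bytes=802542"
LINE_RE = re.compile(r"^osc\.(?P<target>\S+)\.cur_grant_bytes=(?P<value>\d+)$")


def low_grant_oscs(text, threshold=GRANT_THRESHOLD_BYTES):
    """Return (target, bytes) pairs whose cur_grant_bytes is below threshold."""
    low = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match and int(match.group("value")) < threshold:
            low.append((match.group("target"), int(match.group("value"))))
    return low


# A few lines taken from the get_param output quoted in this thread.
sample = """\
osc.lustre10-OST0000-osc-ffff8e47dc02a800.cur_grant_bytes=802542
osc.lustre10-OST0002-osc-ffff8e47dc02a800.cur_grant_bytes=11076653
osc.lustre10-OST0004-osc-ffff8e47dc02a800.cur_grant_bytes=797559
"""

for target, value in low_grant_oscs(sample):
    print(f"{target}: cur_grant_bytes={value} is below {GRANT_THRESHOLD_BYTES}")
```

In practice the text would come from running the lctl command on the client; the sketch only covers the parsing and comparison, and leaves draining and reconnecting the node to the operator.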
Regards,
Nathan

>
> For example, OSTs 0, 1, and 4 are having poor performance on this client
> running Lustre 2.12.5:
> # lctl get_param osc.*.cur_grant_bytes
> osc.lustre10-OST0000-osc-ffff8e47dc02a800.cur_grant_bytes=802542
> osc.lustre10-OST0001-osc-ffff8e47dc02a800.cur_grant_bytes=924204
> osc.lustre10-OST0002-osc-ffff8e47dc02a800.cur_grant_bytes=11076653
> osc.lustre10-OST0003-osc-ffff8e47dc02a800.cur_grant_bytes=108098653
> osc.lustre10-OST0004-osc-ffff8e47dc02a800.cur_grant_bytes=797559
> osc.lustre10-OST0005-osc-ffff8e47dc02a800.cur_grant_bytes=4719258
> osc.lustre10-OST0006-osc-ffff8e47dc02a800.cur_grant_bytes=4898757
> osc.lustre10-OST0007-osc-ffff8e47dc02a800.cur_grant_bytes=10747719
> osc.lustre10-OST0008-osc-ffff8e47dc02a800.cur_grant_bytes=315019599
> osc.lustre10-OST0009-osc-ffff8e47dc02a800.cur_grant_bytes=597198336
> osc.lustre10-OST000a-osc-ffff8e47dc02a800.cur_grant_bytes=278803109
> osc.lustre10-OST000b-osc-ffff8e47dc02a800.cur_grant_bytes=1335800831
> osc.lustre10-OST000c-osc-ffff8e47dc02a800.cur_grant_bytes=795705344
> osc.lustre10-OST000d-osc-ffff8e47dc02a800.cur_grant_bytes=1335052800
> osc.lustre10-OST000e-osc-ffff8e47dc02a800.cur_grant_bytes=474925228
> osc.lustre10-OST000f-osc-ffff8e47dc02a800.cur_grant_bytes=1424795647
>
> From the previous discussion, the recommendation seems to have been to run
> lctl set_param -P osc.*.grant_shrink=0 on the client. Are there any
> downsides to doing this?
> Should I just blindly do this on all of my Lustre clients?
>
> A little more insight into what's going on here would be appreciated.
>
> Thanks,
> Kevin
>
> --
> Kevin Hildebrand
> University of Maryland
> Division of IT
> _______________________________________________
> lustre-discuss mailing list
> [email protected]
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
