There is QoS in Lustre: the feature is called NRS (Network Request Scheduler), and it lets you choose between different scheduling policies. Would it address the issue?
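As far as I know, one of the NRS policies described in the Lustre manual is a Token Bucket Filter (TBF), which rate-limits RPCs per class, for example per JobID or client NID. Below is a generic token bucket in Python that illustrates the idea. It is not Lustre's implementation, and `TokenBucket`, `PerJobThrottle`, `admitted` and the `rate`/`burst` values are hypothetical names chosen for illustration. Such a filter runs on the servers and limits request rates. Whether that is enough to protect a single client's 100Gb/s link in the quoted scenario is exactly the open question.

```python
class TokenBucket:
    """Generic token bucket (hypothetical sketch, not Lustre code).

    Tokens refill at `rate` per second up to `burst`; each admitted
    request consumes one token.
    """

    def __init__(self, rate: float, burst: float) -> None:
        self.rate = rate
        self.burst = burst
        self.tokens = burst  # start with a full bucket
        self.last = 0.0      # time of the previous call, in seconds

    def allow(self, now: float) -> bool:
        # Refill for the time elapsed since the last call, capped at `burst`.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # the caller would queue or delay this request


class PerJobThrottle:
    """One bucket per job ID, so a heavy job X cannot use up job Y's share."""

    def __init__(self, rate: float, burst: float) -> None:
        self.rate = rate
        self.burst = burst
        self.buckets: dict[str, TokenBucket] = {}

    def admit(self, jobid: str, now: float) -> bool:
        bucket = self.buckets.setdefault(jobid, TokenBucket(self.rate, self.burst))
        return bucket.allow(now)


def admitted(throttle: PerJobThrottle, jobid: str, n: int, now: float) -> int:
    """Count how many of `n` simultaneous requests from `jobid` are admitted at `now`."""
    return sum(throttle.admit(jobid, now) for _ in range(n))


if __name__ == "__main__":
    t = PerJobThrottle(rate=10.0, burst=5.0)
    print(admitted(t, "jobX", 20, 0.0))  # 5: only the burst is admitted, 15 are held back
    print(admitted(t, "jobY", 3, 0.0))   # 3: job Y has its own bucket
    print(admitted(t, "jobX", 20, 0.5))  # 5: 0.5 s at 10 tokens/s refills 5 tokens
```

Keying the buckets by job ID is what keeps job X's burst from consuming job Y's share; the rate and burst values here are arbitrary.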
The manual has an entry on NRS, and there were a few presentations about it at LUG/LAD. I have not used NRS myself, but I would like to learn more.

Alex

> On Jul 7, 2023, at 06:48, Anna Fuchs via lustre-discuss <[email protected]> wrote:
>
> Dear all,
>
> I have some questions regarding the following scenario:
> - A large HPC system.
> - Let's assume that Job X is running on 1 compute node and is reading a very large file with a stripecount (>>1)..-1. Alternatively, tons of files are read at once with smaller striping each, but distributed across all OSS/OSTs.
> - The compute node is connected, for example, with a 100Gb/s link, and there are 50 servers, each with a 200Gb/s link. This generates a network load of 50x200Gb/s, which is processed at 100Gb/s.
> - Job Y, which requires the same network and potentially doesn't even perform I/O, suffers a lot as a result.
>
> Does this scenario sound familiar to you?
> Is the sequence of events correct?
> What could be done in this situation?
>
> To avoid:
> a) having such single/few-nodes jobs
> b) striping large files with up to -1
> c) reading millions of files at once
> One could try, but I have concerns that the users will persist in doing it, either intentionally or accidentally, and it would only shift the problem, rather than solving it.
> One could tweak the network design, reconfigure it, separate I/O from communication, but it would hardly optimize all use cases. Virtual lanes could potentially be a solution as well. Though, that might not help if the Job Y also involves some I/O.
>
> Wouldn't it be better if Lustre somehow recognized this imbalance between incoming and outgoing network traffic and loaded the file(s)/data gradually rather than all at once, saturating or slightly overloading the consumer 100Gb/s connection rather than by a factor of 100? Does this sound reasonable, and is there already a solution for it?
> I would appreciate any opinions.
>
> Best regards
> Anna
>
> --
> Anna Fuchs
> Universität Hamburg
> https://wr.informatik.uni-hamburg.de/people/anna_fuchs
> _______________________________________________
> lustre-discuss mailing list
> [email protected]
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
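The numbers in the quoted scenario can be checked in a few lines of Python. Assuming all 50 servers send at their full 200Gb/s at the same time, they offer 10,000Gb/s to a single 100Gb/s client link, which is the factor of 100 mentioned above. `paced_rate_per_server` is a hypothetical helper that only illustrates the "load gradually" idea by computing a per-server send rate. It is not an existing Lustre mechanism.

```python
def oversubscription(servers: int, server_gbps: float, client_gbps: float) -> float:
    """Aggregate server send capacity divided by the client's receive capacity."""
    return servers * server_gbps / client_gbps


def paced_rate_per_server(servers: int, client_gbps: float, target: float = 1.0) -> float:
    """Hypothetical pacing: per-server send rate (Gb/s) so that all servers
    together load the client link at `target` times its capacity
    (1.0 = just saturated, slightly above 1.0 = slightly overloaded)."""
    return target * client_gbps / servers


if __name__ == "__main__":
    print(oversubscription(50, 200.0, 100.0))  # 100.0: 10,000 Gb/s offered to a 100 Gb/s link
    print(paced_rate_per_server(50, 100.0))    # 2.0 Gb/s per server instead of up to 200
```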
