On Apr 28, 2024, at 16:54, Anna Fuchs via lustre-discuss 
<lustre-discuss@lists.lustre.org> wrote:

The setting max_rpcs_in_flight affects, among other things, how many threads 
can be spawned simultaneously for processing the RPCs, right?

The {osc,mdc}.*.max_rpcs_in_flight parameters actually control the maximum number 
of RPCs a *client* will have in flight to any MDT or OST, while the number of 
MDS and OSS threads is controlled on the server with 
mds.MDS.mdt*.threads_{min,max} and ost.OSS.ost*.threads_{min,max} for each of 
the various service portals (which are selected by the client based on the RPC 
type).  max_rpcs_in_flight allows multiple client threads to have operations in 
flight concurrently, hiding network latency and improving server utilization 
without allowing a single client to overwhelm the server.
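
For reference, these can be queried and tuned with lctl; a minimal sketch 
(exact service names can vary between Lustre versions and configurations):

    # client: per-target RPC concurrency limit
    lctl get_param osc.*.max_rpcs_in_flight mdc.*.max_rpcs_in_flight
    lctl set_param osc.*.max_rpcs_in_flight=16   # example value only

    # servers: bounds on the service thread pools
    lctl get_param ost.OSS.ost*.threads_min ost.OSS.ost*.threads_max
    lctl get_param mds.MDS.mdt*.threads_min mds.MDS.mdt*.threads_max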

In tests where the network is clearly the bottleneck, this setting has almost no 
effect - the network cannot keep up with the data, so there is not much left to 
do in parallel.
With a faster network, the stats show higher CPU utilization on different cores 
(at least on the client).

What is the exact mechanism by which it is decided that a kernel thread is 
spawned for processing a bulk RPC?  Is there an RPC queue with timings or 
something similar?
Is it in any way predictable or calculable how many threads a specific workload 
will require (and spawn, if possible), given the data rates of the network and 
storage devices?

The mechanism to start new threads is relatively simple.  Before a server 
thread starts processing a new request, if it is the last available thread and 
the maximum number of threads is not yet running, it will try to launch a new 
thread; repeat as needed.  So the thread count will depend on the client RPC 
load, the RPC processing rate, and lock contention on whatever resources those 
RPCs are accessing.
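
If you want to watch this in practice, the number of currently running threads 
for each service is exported next to the min/max limits; a rough sketch for the 
bulk I/O service on an OSS (other services have analogous counters):

    # currently started threads vs. the configured bounds
    lctl get_param ost.OSS.ost_io.threads_started
    lctl get_param ost.OSS.ost_io.threads_min ost.OSS.ost_io.threads_max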

With max_rpcs_in_flight = 1, multiple cores are loaded, presumably alternately, 
but the statistics are too inaccurate to capture this.  The distribution of 
threads to cores is regulated by the Linux kernel, right?  Does anyone have 
experience with what happens when all CPUs are under full load from the 
application or other tasks?


Note that {osc,mdc}.*.max_rpcs_in_flight is a *per target* parameter, so a 
single client can still have tens or hundreds of RPCs in flight to different 
servers.  The client will send many RPC types directly from the process 
context, since they are waiting on the result anyway.  For asynchronous bulk 
RPCs, the ptlrpcd thread will try to process the bulk IO on the same CPT (= 
Lustre CPU Partition Table, roughly aligned to NUMA nodes) as the userspace 
application was running when the request was created.  This minimizes the 
cross-NUMA traffic when accessing pages for bulk RPCs, so long as those cores 
are not busy with userspace tasks.  Otherwise, the ptlrpcd thread on another 
CPT will steal RPCs from the queues.
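
As a rough back-of-the-envelope bound (considering only bulk RPCs to OSTs and 
ignoring the MDC), the number of RPCs a single client can have in flight is at 
most the sum of the per-target limits, e.g.:

    # sum max_rpcs_in_flight across all OSC devices on this client
    lctl get_param -n osc.*.max_rpcs_in_flight | awk '{ sum += $1 } END { print sum }'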

Do the Lustre threads suffer? Is there a prioritization of the Lustre threads 
over other tasks?

Are you asking about the client or the server?  Many of the client RPCs are 
generated by the client application threads themselves, and the ptlrpcd threads 
do not have a higher priority than client application threads.  If the 
application threads are running on some cores, but other cores are idle, then 
the ptlrpcd threads on the idle cores will try to process the RPCs so that the 
application threads can continue running where they are.  Otherwise, if all 
cores are busy (as is typical for HPC applications), then they will be scheduled 
by the kernel as needed.

Are there readily available statistics or tools for this scenario?

What statistics are you looking for?  There are "{osc,mdc}.*.stats" and 
"{osc,mdc}.*.rpc_stats" that have aggregate information about RPC counts and 
latency.
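
For example (the output format differs between versions), osc.*.rpc_stats shows 
per-target histograms of RPCs in flight and pages per RPC, while osc.*.stats and 
mdc.*.stats show per-operation counts and timing:

    lctl get_param osc.*.rpc_stats
    lctl get_param osc.*.stats mdc.*.stats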

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Whamcloud







_______________________________________________
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
