Hello, Christian.

On Fri, May 23, 2025 at 09:58:58AM +0200, Christian König wrote:
...
> > - There's a GPU workload which uses a sizable amount of system memory for
> > the pool being discussed in this thread. This GPU workload is very
> > important, so we want to make sure that other activities in the system
> > don't bother it. We give it plenty of isolated CPUs and protect its memory
> > with high enough memory.low.
>
> That situation simply doesn't happen. See isolation is *not* a requirement
> for the pool.
...
> See the submission model of GPUs is best effort. E.g. you don't guarantee
> any performance isolation between processes whatsoever. If we would start
> to do this we would need to start re-designing the HW.

This is a radical claim. Let's table the rest of the discussion for now. I
don't know enough to tell whether this claim is true or not, but for it to
be true, the following should hold:

 Whether the GPU memory pool is reclaimed or not doesn't have noticeable
 implications for GPU performance.

Is this true?

As for the scenario that I described above, I didn't just come up with it.
I'm only supporting from the system side, but it's based on what our ML
folks are doing right now. We have a bunch of large machines with multiple
GPUs running ML workloads. The workloads can run for a long time spread
across many machines and they synchronize frequently, so any performance
drop on one GPU lowers utilization on all involved GPUs, the number of which
can go up to three digits. For example, any scheduling disturbance on the
submitting thread propagates through the whole cluster and slows down all
involved GPUs.

Also, because these machines are large on the CPU and memory sides too and
aren't doing a whole lot other than managing the GPUs, people want to put a
significant amount of CPU work on them, which can easily create at least
moderate memory pressure.

Is the claim that the combined write memory pool doesn't have any meaningful
impact on the GPU workload performance?

Thanks.

-- 
tejun