On Thu, 10 Apr 2025 14:01:03 -0400 Alyssa Rosenzweig <aly...@rosenzweig.io> wrote:
> > > > In Panfrost and Lima, we don't have this concept of "incremental
> > > > rendering", so when we fail the allocation, we just fail the GPU job
> > > > with an unhandled GPU fault.
> > >
> > > To be honest I think that this is enough to mark those two drivers as
> > > broken. It's documented that this approach is a no-go for upstream
> > > drivers.
> > >
> > > How widely is that used?
> >
> > It exists in lima and panfrost, and I wouldn't be surprised if a similar
> > mechanism was used in other drivers for tiler-based GPUs (etnaviv,
> > freedreno, powervr, ...), because ultimately that's how tilers work:
> > the amount of memory needed to store per-tile primitives (and metadata)
> > depends on what the geometry pipeline feeds the tiler with, and that
> > can't be predicted. If you over-provision, that's memory the system won't
> > be able to use while rendering takes place, even though only a small
> > portion might actually be used by the GPU. If your allocation is too
> > small, it will either trigger a GPU fault (for HW not supporting an
> > "incremental rendering" mode) or under-perform (because flushing
> > primitives has a huge cost on tilers).
>
> Yes and no.
>
> Although we can't allocate more memory for /this/ frame, we know the
> required size is probably constant across its lifetime. That gives a
> simple heuristic to manage the tiler heap efficiently without
> allocations - even fallible ones - in the fence signal path:
>
> * Start with a small fixed size tiler heap
> * Try to render, let incremental rendering kick in when it's too small.
> * When cleaning up the job, check if we used incremental rendering.
> * If we did - double the size of the heap the next time we submit work.
>
> The tiler heap still grows dynamically - it just does so over the span
> of a couple frames. In practice that means a tiny hit to startup time as
> we dynamically figure out the right size, incurring extra flushing at
> the start, without needing any "grow-on-page-fault" heroics.
>
> This should solve the problem completely for CSF/panthor. So it's only
> hardware that architecturally cannot do incremental rendering (older
> Mali: panfrost/lima) where we need this mess.

OTOH, if we need something for Utgard(Lima)/Midgard/Bifrost/Valhall(Panfrost),
why not use the same thing for CSF, since CSF is arguably the sanest of all
the HW architectures listed above: allocation can fail/be non-blocking,
because there's a fallback to incremental rendering when it fails.
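
For reference, the "double the heap after an incremental-rendering frame"
heuristic quoted above could be sketched roughly as follows. This is only an
illustration in plain userspace C: struct tiler_heap_state,
tiler_heap_job_done() and tiler_heap_simulate() are hypothetical names, not
existing panthor/panfrost code, and the max_size cap is an assumption that the
quoted text does not mention.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-context state; not an actual driver structure. */
struct tiler_heap_state {
	size_t size;     /* heap size to use for the next submission */
	size_t max_size; /* assumed upper bound on growth */
};

/*
 * Meant to run at job cleanup time, i.e. outside the fence signalling
 * path. It only records the size to use for the next submission; the
 * actual (possibly blocking) allocation would happen at submit time.
 */
static void tiler_heap_job_done(struct tiler_heap_state *heap,
				bool used_incremental)
{
	assert(heap->size > 0 && heap->size <= heap->max_size);

	if (!used_incremental)
		return; /* the heap was big enough, keep the current size */

	/* Double the size, clamping to max_size without overflowing. */
	if (heap->size > heap->max_size / 2)
		heap->size = heap->max_size;
	else
		heap->size *= 2;
}

/*
 * Toy model: every frame needs `needed` bytes of tiler heap. Returns the
 * number of frames that had to fall back to incremental rendering.
 */
static int tiler_heap_simulate(struct tiler_heap_state *heap,
			       size_t needed, int frames)
{
	int incremental_frames = 0;

	for (int i = 0; i < frames; i++) {
		bool used_incremental = heap->size < needed;

		if (used_incremental)
			incremental_frames++;
		tiler_heap_job_done(heap, used_incremental);
	}

	return incremental_frames;
}
```

With a 1 MiB starting heap and a workload that needs 5 MiB per frame, this
toy model pays the extra flushing cost on the first three frames (1, 2 and
4 MiB), then settles at 8 MiB. On hardware without incremental rendering
there is no such fallback, so this loop cannot be applied there as-is.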