I'm afraid that would not be very useful. It does depend on the refresh rate, but also on how close to vblank the compositor does its commits, and on the latency requirements of the content currently shown. When the compositor presents a fullscreen video with frames that are queued up in advance, needing a full frame to program the atomic commit could be acceptable. But when the user moves the cursor or plays a game, the compositor needs to do its commits as close to vblank as possible. Without a known upper bound on the time it takes to program the hardware, that's not doable.
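To make the timing argument concrete, here is a rough sketch of the check a compositor would have to make. All names and numbers are hypothetical and only illustrate the reasoning; in particular, `programming_bound_ms` stands for an upper bound on hardware programming time that drivers do not report today.

```python
from typing import Optional


def frame_period_ms(refresh_hz: float) -> float:
    """Duration of one refresh cycle in milliseconds."""
    return 1000.0 / refresh_hz


def latest_commit_lead_ms(programming_bound_ms: Optional[float],
                          margin_ms: float = 0.5) -> Optional[float]:
    """How long before vblank the commit has to be issued.

    Returns None when no upper bound is known: the compositor then has
    no way to pick a safe deadline.
    """
    if programming_bound_ms is None:
        return None
    return programming_bound_ms + margin_ms


def fits_latency_budget(refresh_hz: float,
                        programming_bound_ms: Optional[float],
                        latency_budget_ms: float) -> bool:
    """True if the commit can be issued late enough for the shown content.

    latency_budget_ms is how far ahead of vblank the content tolerates the
    commit: about a full frame for pre-queued video, very little for a
    cursor or a game.
    """
    lead = latest_commit_lead_ms(programming_bound_ms)
    if lead is None:
        return False
    # The commit must fit in one refresh cycle and in the content's budget.
    return lead <= min(latency_budget_ms, frame_period_ms(refresh_hz))
```

With an assumed 10 ms bound at 60 Hz, `fits_latency_budget(60.0, 10.0, frame_period_ms(60.0))` holds for queued video, while a 2 ms budget for cursor movement fails. The same 10 ms bound fails even for video at 144 Hz, and an unknown bound (`None`) fails in every case.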
On Fri, 27 Oct 2023 at 14:01, Pekka Paalanen <ppaala...@gmail.com> wrote:

> On Fri, 27 Oct 2023 12:01:32 +0200
> Sebastian Wick <sebastian.w...@redhat.com> wrote:
>
> > On Fri, Oct 27, 2023 at 10:59:25AM +0200, Michel Dänzer wrote:
> > > On 10/26/23 21:25, Alex Goins wrote:
> > > > On Thu, 26 Oct 2023, Sebastian Wick wrote:
> > > >> On Thu, Oct 26, 2023 at 11:57:47AM +0300, Pekka Paalanen wrote:
> > > >>> On Wed, 25 Oct 2023 15:16:08 -0500 (CDT)
> > > >>> Alex Goins <ago...@nvidia.com> wrote:
> > > >>>
> > > >>>> Despite being programmable, the LUTs are updated in a manner that is less
> > > >>>> efficient as compared to e.g. the non-static "degamma" LUT. Would it be helpful
> > > >>>> if there was some way to tag operations according to their performance,
> > > >>>> for example so that clients can prefer a high performance one when they
> > > >>>> intend to do an animated transition? I recall from the XDC HDR workshop
> > > >>>> that this is also an issue with AMD's 3DLUT, where updates can be too
> > > >>>> slow to animate.
> > > >>>
> > > >>> I can certainly see such information being useful, but then we need to
> > > >>> somehow quantize the performance.
> > > >
> > > > Right, which wouldn't even necessarily be universal, could depend on the given
> > > > host, GPU, etc. It could just be a relative performance indication, to give an
> > > > order of preference. That wouldn't tell you if it can or can't be animated, but
> > > > when choosing between two LUTs to animate you could prefer the higher
> > > > performance one.
> > > >
> > > >>>
> > > >>> What I was left puzzled about after the XDC workshop is that is it
> > > >>> possible to pre-load configurations in the background (slow), and then
> > > >>> quickly switch between them? Hardware-wise I mean.
> > > >
> > > > This works fine for our "fast" LUTs, you just point them to a surface in video
> > > > memory and they flip to it.
> > > > You could keep multiple surfaces around and flip
> > > > between them without having to reprogram them in software. We can easily do that
> > > > with enumerated curves, populating them when the driver initializes instead of
> > > > waiting for the client to request them. You can even point multiple hardware
> > > > LUTs to the same video memory surface, if they need the same curve.
> > > >
> > > >>
> > > >> We could define that pipelines with a lower ID are to be preferred over
> > > >> higher IDs.
> > > >
> > > > Sure, but this isn't just an issue with a pipeline as a whole, but the
> > > > individual elements within it and how to use them in a given context.
> > > >
> > > >>
> > > >> The issue is that if programming a pipeline becomes too slow to be
> > > >> useful it probably should just not be made available to user space.
> > > >
> > > > It's not that programming the pipeline is overall too slow. The LUTs we have
> > > > that are relatively slow to program are meant to be set infrequently, or even
> > > > just once, to allow the scaler and tone mapping operator to operate in fixed
> > > > point PQ space. You might still want the tone mapper, so you would choose a
> > > > pipeline that includes them, but when it comes to e.g. animating a night light,
> > > > you would want to choose a different LUT for that purpose.
> > > >
> > > >>
> > > >> The prepare-commit idea for blob properties would help to make the
> > > >> pipelines usable again, but until then it's probably a good idea to just
> > > >> not expose those pipelines.
> > > >
> > > > The prepare-commit idea actually wouldn't work for these LUTs, because they are
> > > > programmed using methods instead of pointing them to a surface. I'm actually not
> > > > sure how slow it actually is, would need to benchmark it.
> > > > I think not exposing them at all would be overkill, since it would mean
> > > > you can't use the preblending scaler or tonemapper, and animation isn't
> > > > necessary for that.
> > > >
> > > > The AMD 3DLUT is another example of a LUT that is slow to update, and it would
> > > > obviously be a major loss if that wasn't exposed. There just needs to be some
> > > > way for clients to know if they are going to kill performance by trying to
> > > > change it every frame.
> > >
> > > Might a first step be to require the ALLOW_MODESET flag to be set when
> > > changing the values for a colorop which is too slow to be updated per
> > > refresh cycle?
> > >
> > > This would tell the compositor: You can use this colorop, but you
> > > can't change its values on the fly.
> >
> > I argued before that changing any color op to passthrough should never
> > require ALLOW_MODESET and while this is really hard to guarantee from a
> > driver perspective I still believe that it's better to not expose any
> > feature requiring ALLOW_MODESET or taking too long to program to be
> > useful for per-frame changes.
> >
> > When user space has ways to figure out if going back to a specific state
> > (in this case setting everything to bypass) without ALLOW_MODESET we can
> > revisit this decision, but until then, let's keep things simple and only
> > expose things that work reliably without ALLOW_MODESET and fast enough
> > to work for per-frame changes.
> >
> > Harry, Pekka: Should we document this? It obviously restricts what can
> > be exposed but exposing things that can't be used by user space isn't
> > useful.
>
> In an ideal world... but in real world, I don't know.
>
> Would it help if there was a list collected, with all the things in
> various hardware that is known to be too heavy to reprogram every
> refresh? Maybe that would allow a more educated decision?
>
> I bet that depends also on the refresh rate.
>
> I would probably be fine with some sort of update cost classification
> on colorops, and the kernel keeping track of blobs: if userspace sets
> the same blob on the same colorop that is already there (by blob ID, no
> need to compare contents), then it's a no-op change.
>
>
> Anyway, I really like reading Alex Goins' reply, it seems we are very
> much on the same page here. :-)
>
>
> Thanks,
> pq
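The blob-tracking idea quoted above (same blob ID on the same colorop is a no-op) combined with a per-colorop cost class could be modelled roughly like this. It is a toy model in Python, not kernel code: `UpdateCost`, `Colorop` and `commit_cost` are hypothetical names, and the two cost classes are an assumption about what such a classification might look like.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, Optional, Tuple


class UpdateCost(Enum):
    NONE = 0  # nothing has to be programmed
    FAST = 1  # hypothetical class: cheap enough to change every refresh
    SLOW = 2  # hypothetical class: too heavy to change every refresh


@dataclass
class Colorop:
    name: str
    cost_class: UpdateCost            # cost of really reprogramming this op
    current_blob_id: Optional[int] = None

    def set_blob(self, blob_id: int) -> UpdateCost:
        """Set a blob and return the cost of doing so.

        Only blob IDs are compared, never blob contents: setting the ID
        that is already in place is treated as a no-op.
        """
        if blob_id == self.current_blob_id:
            return UpdateCost.NONE
        self.current_blob_id = blob_id
        return self.cost_class


def commit_cost(updates: Iterable[Tuple[Colorop, int]]) -> UpdateCost:
    """Worst cost over all (colorop, blob_id) pairs in one commit."""
    costs = [op.set_blob(blob_id) for op, blob_id in updates]
    return max(costs, key=lambda c: c.value, default=UpdateCost.NONE)
```

In this model a compositor could leave a slow 3D LUT at the same blob ID in every commit while animating a fast 1D LUT, and the commit would only ever be classified as `FAST`.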