Re: Future direction of the Mesa Vulkan runtime (or "should we build a new gallium?")
On Fri, 26 Jan 2024 at 00:22, Faith Ekstrand wrote:
> On Thu, Jan 25, 2024 at 5:06 PM Gert Wollny wrote:
>> I think with Venus we are more interested in using utility libraries on
>> an as-needed basis. Here, most of the time the Vulkan commands are just
>> serialized according to the Venus protocol and this is then passed to
>> the host, because usually it wouldn't make sense to let the guest
>> translate the Vulkan commands to something different (e.g. something
>> that is commonly used in a runtime), only to then re-encode this in the
>> Venus driver to satisfy the host Vulkan driver - just think SPIR-V:
>> why would we want to have NIR only to then re-encode it to SPIR-V?
>
> I think Venus is an entirely different class of driver. It's not even really
> a driver. It's more of a Vulkan layer that has a VM boundary in the middle.
> It's attempting to be as thin of a Vulkan -> Vulkan pass-through as possible.
> As such, it doesn't use most of the shared stuff anyway. It uses the dispatch
> framework and that's really about it. As long as that code stays in-tree
> roughly as-is, I think Venus will be fine.

The eternal response: you forgot WSI!

Cheers,
Daniel
Re: Future direction of the Mesa Vulkan runtime (or "should we build a new gallium?")
On Thu, Jan 25, 2024 at 5:06 PM Gert Wollny wrote:
> Hi,
>
> thanks, Faith, for bringing this discussion up.
>
> I think with Venus we are more interested in using utility libraries on
> an as-needed basis. Here, most of the time the Vulkan commands are just
> serialized according to the Venus protocol and this is then passed to
> the host, because usually it wouldn't make sense to let the guest
> translate the Vulkan commands to something different (e.g. something
> that is commonly used in a runtime), only to then re-encode this in the
> Venus driver to satisfy the host Vulkan driver - just think SPIR-V:
> why would we want to have NIR only to then re-encode it to SPIR-V?

I think Venus is an entirely different class of driver. It's not even really a driver. It's more of a Vulkan layer that has a VM boundary in the middle. It's attempting to be as thin of a Vulkan -> Vulkan pass-through as possible. As such, it doesn't use most of the shared stuff anyway. It uses the dispatch framework and that's really about it. As long as that code stays in-tree roughly as-is, I think Venus will be fine.

> I'd also like to give a +1 to the points raised by Triang3l and others
> about the potential of breaking other drivers. I've certainly been bitten
> by this on the Gallium side with r600, and unfortunately I can't set up
> a CI in my home office (and after watching the XDC talk about setting
> up your own CI I was even more discouraged to do this).

That's a risk with all common code. You could raise the same risk with NIR or basically anything else. Sure, if someone wants to go write all the code themselves in an attempt to avoid bugs, I guess they're free to do that. I don't really see that as a compelling argument, though.

Also, while you experienced gallium breakage with r600, having worked on i965, I can guarantee you that that's still better than maintaining a classic (non-gallium) GL driver.
At the moment, given the responses I've seen and the scope of the project as things are starting to congeal in my head, I don't think this will be an incremental thing where drivers get converted as we go anymore. If we really do want to flip the flow, I think it'll be invasive enough that we'll build gallium2 and then people can port to it if they want. I may port a driver or two myself, but those will be things I own or am at least willing to deal with the bug fallout for. Others can port or not at will.

This is what I meant when I said elsewhere that we're probably heading towards a gallium/classic situation again. I don't expect anyone to port until the benefits outweigh the costs, but I do expect the benefits will be there eventually.

~Faith
Re: Future direction of the Mesa Vulkan runtime (or "should we build a new gallium?")
Hi,

thanks, Faith, for bringing this discussion up.

I think with Venus we are more interested in using utility libraries on an as-needed basis. Here, most of the time the Vulkan commands are just serialized according to the Venus protocol and this is then passed to the host, because usually it wouldn't make sense to let the guest translate the Vulkan commands to something different (e.g. something that is commonly used in a runtime), only to then re-encode this in the Venus driver to satisfy the host Vulkan driver - just think SPIR-V: why would we want to have NIR only to then re-encode it to SPIR-V?

I'd also like to give a +1 to the points raised by Triang3l and others about the potential of breaking other drivers. I've certainly been bitten by this on the Gallium side with r600, and unfortunately I can't set up a CI in my home office (and after watching the XDC talk about setting up your own CI I was even more discouraged to do this).

In summary, I certainly see the advantage in using common code, but with these two points above in mind I think opt-in is better.

Gert
Re: Future direction of the Mesa Vulkan runtime (or "should we build a new gallium?")
On 24/01/2024 18:26, Faith Ekstrand wrote:
> So far, we've been trying to build those components in terms of the
> Vulkan API itself with calls jumping back into the dispatch table to
> try and get inside the driver.

To me, it looks like the "opt-in" approach would still be well-applicable to the goal of cleaning up "implementing Vulkan in Vulkan", and gradual changes diverging from the usual Vulkan specification behavior can be implemented and maintained in existing and new drivers more efficiently than a whole new programming model. I think it's important that the scale of our solution be appropriate to the scale of the problem, otherwise we risk creating large issues in other areas.

Currently there are pretty few places where Mesa implements Vulkan on top of Vulkan:
• WSI,
• Emulated render passes,
• Emulated secondary command buffers,
• Meta.

For WSI, render passes and secondary command buffers, I don't think there's anything that needs to be done, as those already have little to no driver backend involvement or interference with the application's calls — render pass and secondary command buffer emulation interacts with the hardware driver entirely within the framework of the Vulkan specification, only storing a few fields in vk_command_buffer which are already handled fully in common code.

Common meta, on the other hand, yes, is extremely intrusive — overriding the application's pipeline state, bindings, and passing shaders directly in NIR, bypassing SPIR-V. But with meta being such a different beast, I think we shouldn't even be trying to tame it with the same interfaces as everything else. If we're going to handle meta's special cases throughout our common "Gallium2" framework, it feels like we'll simply be turning our "Vulkan on Vulkan" issue into the problem of "implementing Gallium2 on Gallium2".
Instead, I think the cleanest solution in the common meta would be sending commands to the driver through a separate callback interface specifically for meta, instead of trying to make meta mimic application code. That would allow drivers to clearly negotiate the details of applying/reverting state changes and shader compilation, while letting their developers assume that everything else is written, for the most part, purely against the Vulkan specification.

It would still be okay for meta to make calls to vkGetPhysicalDevice* and vkCreate*/vkDestroy*, as long as they're done within the rules of the Vulkan specification, to require certain extensions, as well as to do some less intrusive, non-hot-path interaction with the driver's internals directly — such as requiring that every VkImage is a vk_image and pulling the needed create info fields from there. However, everything interacting with the state/bindings, as well as things going beyond the specification like creating image views with incompatible formats, would go through those new callbacks.

NVK-style drivers would be able to share a common implementation of those callbacks. Drivers that want to take advantage of more direct-to-hardware paths would provide what's friendly to them (maybe even with lighter handling of compute-based meta operations compared to graphics ones). That'd probably not be a single flat list of callbacks, but several groups — for instance, it'd be possible for a driver to use the common command buffer callbacks, but to specialize some view/descriptor-related ones (it may not be possible to make those common at all, by the way). And if a driver doesn't need the common meta at all, none of that would bother it.
The other advantages I see in this separate meta API approach are:
• In the rest of the code, driver developers in most cases will need to refer to only a single authority — the massively detailed Vulkan specification — and there are risks in rolling our own interface for everything:
  • Driver developers will have to spend more time carefully looking up what they need to do in two places rather than largely just one.
  • We're much more prone to leaving gaps in our interface and to writing lacking documentation. I can't see this effort not being rushed, with us having to catch up to 10 years of XGL/Vulkan development, while moving many drivers alongside working on other tasks, and with varying levels of enthusiasm of driver developers towards this. Unless zmike's 10 years estimate is our actual target.
• Having to deal with a new large-scale API may raise the barrier for new contributors and discourage them. Unlike with OpenGL with all the resource renaming stuff, except for shader compilation, the experience I got from developing applications on Vulkan was enough for me to start comfortably implementing it. When zmike showed me an R600g issue about some relation of vertex buffer bindings and CSOs, I just didn't have anything useful to say.
• Faster iteration inside the common meta code, with the meta
Re: Future direction of the Mesa Vulkan runtime (or "should we build a new gallium?")
On Thu, Jan 25, 2024 at 8:57 AM Jose Fonseca wrote:
> > So far, we've been trying to build those components in terms of the
> > Vulkan API itself with calls jumping back into the dispatch table to try
> > and get inside the driver. This is working but it's getting more and more
> > fragile the more tools we add to that box. A lot of what I want to do with
> > gallium2 or whatever we're calling it is to fix our layering problems so
> > that calls go in one direction and we can untangle the jumble. I'm still
> > not sure what I want that to look like but I think I want it to look a lot
> > like Vulkan, just with a handier interface.
>
> That resonates with my experience. For example, Gallium's draw module does
> some of this too -- it provides its own internal interfaces for drivers,
> but it also loops back into the Gallium top interface to set FS and
> rasterizer state -- and that has *always* been a source of grief. Having
> control flow proceed through layers in one direction only seems an
> important principle to observe. It's fine if the lower interface is the
> same interface (e.g., Gallium to Gallium, or Vulkan to Vulkan as you
> allude), but they shouldn't be the same exact entry points/modules (i.e.,
> no reentrancy/recursion.)
>
> It's also worth considering that Vulkan extensibility could come in handy
> too in what you want to achieve. For example, Mesa Vulkan drivers could
> have their own VK_MESA_internal_ extensions that could be used by the
> shared Vulkan code to do lower level things.

We already do that for a handful of things. The fact that Vulkan doesn't ever check the stuff in the pNext chain is really useful for that.

~Faith

> Jose
>
> On Wed, Jan 24, 2024 at 3:26 PM Faith Ekstrand wrote:
>> Jose,
>>
>> Thanks for your thoughts!
>>
>> On Wed, Jan 24, 2024 at 4:30 AM Jose Fonseca wrote:
>> >
>> > I don't know much about the current Vulkan driver internals to have or
>> > provide an informed opinion on the path forward, but I'd like to share
>> > my backwards looking perspective.
>> >
>> > Looking back, Gallium was two things effectively:
>> > (1) an abstraction layer, that's watertight (as in upper layers
>> > shouldn't reach through to lower layers)
>> > (2) an ecosystem of reusable components (draw, util, tgsi, etc.)
>> >
>> > (1) was of course important -- and the discipline it imposed is what
>> > enabled great simplifications -- but it also became a straitjacket, as
>> > GPUs didn't stand still, and sooner or later the
>> > see-every-hardware-as-the-same lenses stop reflecting reality.
>> >
>> > If I had to pick one, I'd say that (2) is far more useful and
>> > practical. Take components like gallium's draw and other util modules.
>> > A driver can choose to use them or not. One could fork them within the
>> > Mesa source tree, and only the drivers that opt into the fork would
>> > need to be tested/adapted/etc.
>> >
>> > On the flip side, the Vulkan API is already a pretty low level HW
>> > abstraction. It's also very flexible and extensible, so it's hard to
>> > provide a watertight abstraction underneath it without either taking
>> > the lowest common denominator, or having lots of optional bits of
>> > functionality governed by a myriad of caps like you alluded to.
>>
>> There is a third thing that isn't really recognized in your description:
>>
>> (3) A common "language" to talk about GPUs and data structures that
>> represent that language
>>
>> This is precisely what the Vulkan runtime today doesn't have. Classic
>> meta sucked because we were trying to implement GL in GL. u_blitter,
>> on the other hand, is pretty fantastic because Gallium provides a much
>> more sane interface to write those common components in terms of.
>>
>> So far, we've been trying to build those components in terms of the
>> Vulkan API itself with calls jumping back into the dispatch table to
>> try and get inside the driver. This is working but it's getting more
>> and more fragile the more tools we add to that box. A lot of what I
>> want to do with gallium2 or whatever we're calling it is to fix our
>> layering problems so that calls go in one direction and we can
>> untangle the jumble. I'm still not sure what I want that to look like
>> but I think I want it to look a lot like Vulkan, just with a handier
>> interface.
>>
>> ~Faith
>>
>> > Not sure how useful this is in practice to you, but the lesson from my
>> > POV is that opt-in reusable and shared libraries are always time well
>> > spent as they can bend and adapt with the times, whereas no opt-out
>> > watertight abstractions inherently have a shelf life.
>> >
>> > Jose
>> >
>> > On Fri, Jan 19, 2024 at 5:30 PM Faith Ekstrand wrote:
>> >>
>> >> Yeah, this one's gonna hit Phoronix...
>> >>
>> >> When we started writing Vulkan drivers back in the day, there was this
>> >> notion that Vulkan was a low-level API that directly targets hardware.
>> >> Vulkan drivers were these super thin things that just blasted packets
>> >> straight into the hardware. What
Re: Future direction of the Mesa Vulkan runtime (or "should we build a new gallium?")
> So far, we've been trying to build those components in terms of the
> Vulkan API itself with calls jumping back into the dispatch table to try
> and get inside the driver. This is working but it's getting more and more
> fragile the more tools we add to that box. A lot of what I want to do with
> gallium2 or whatever we're calling it is to fix our layering problems so
> that calls go in one direction and we can untangle the jumble. I'm still
> not sure what I want that to look like but I think I want it to look a lot
> like Vulkan, just with a handier interface.

That resonates with my experience. For example, Gallium's draw module does some of this too -- it provides its own internal interfaces for drivers, but it also loops back into the Gallium top interface to set FS and rasterizer state -- and that has *always* been a source of grief. Having control flow proceed through layers in one direction only seems an important principle to observe. It's fine if the lower interface is the same interface (e.g., Gallium to Gallium, or Vulkan to Vulkan as you allude), but they shouldn't be the same exact entry points/modules (i.e., no reentrancy/recursion.)

It's also worth considering that Vulkan extensibility could come in handy too in what you want to achieve. For example, Mesa Vulkan drivers could have their own VK_MESA_internal_ extensions that could be used by the shared Vulkan code to do lower level things.

Jose

On Wed, Jan 24, 2024 at 3:26 PM Faith Ekstrand wrote:
> Jose,
>
> Thanks for your thoughts!
>
> On Wed, Jan 24, 2024 at 4:30 AM Jose Fonseca wrote:
> >
> > I don't know much about the current Vulkan driver internals to have or
> > provide an informed opinion on the path forward, but I'd like to share
> > my backwards looking perspective.
> >
> > Looking back, Gallium was two things effectively:
> > (1) an abstraction layer, that's watertight (as in upper layers
> > shouldn't reach through to lower layers)
> > (2) an ecosystem of reusable components (draw, util, tgsi, etc.)
> >
> > (1) was of course important -- and the discipline it imposed is what
> > enabled great simplifications -- but it also became a straitjacket, as
> > GPUs didn't stand still, and sooner or later the
> > see-every-hardware-as-the-same lenses stop reflecting reality.
> >
> > If I had to pick one, I'd say that (2) is far more useful and
> > practical. Take components like gallium's draw and other util modules.
> > A driver can choose to use them or not. One could fork them within the
> > Mesa source tree, and only the drivers that opt into the fork would
> > need to be tested/adapted/etc.
> >
> > On the flip side, the Vulkan API is already a pretty low level HW
> > abstraction. It's also very flexible and extensible, so it's hard to
> > provide a watertight abstraction underneath it without either taking
> > the lowest common denominator, or having lots of optional bits of
> > functionality governed by a myriad of caps like you alluded to.
>
> There is a third thing that isn't really recognized in your description:
>
> (3) A common "language" to talk about GPUs and data structures that
> represent that language
>
> This is precisely what the Vulkan runtime today doesn't have. Classic
> meta sucked because we were trying to implement GL in GL. u_blitter,
> on the other hand, is pretty fantastic because Gallium provides a much
> more sane interface to write those common components in terms of.
>
> So far, we've been trying to build those components in terms of the
> Vulkan API itself with calls jumping back into the dispatch table to
> try and get inside the driver. This is working but it's getting more
> and more fragile the more tools we add to that box. A lot of what I
> want to do with gallium2 or whatever we're calling it is to fix our
> layering problems so that calls go in one direction and we can
> untangle the jumble. I'm still not sure what I want that to look like
> but I think I want it to look a lot like Vulkan, just with a handier
> interface.
>
> ~Faith
>
> > Not sure how useful this is in practice to you, but the lesson from my
> > POV is that opt-in reusable and shared libraries are always time well
> > spent as they can bend and adapt with the times, whereas no opt-out
> > watertight abstractions inherently have a shelf life.
> >
> > Jose
> >
> > On Fri, Jan 19, 2024 at 5:30 PM Faith Ekstrand wrote:
> >>
> >> Yeah, this one's gonna hit Phoronix...
> >>
> >> When we started writing Vulkan drivers back in the day, there was this
> >> notion that Vulkan was a low-level API that directly targets hardware.
> >> Vulkan drivers were these super thin things that just blasted packets
> >> straight into the hardware. What little code was common was small and
> >> pretty easy to just copy+paste around. It was a nice thought...
> >>
> >> What's happened in the intervening 8 years is that Vulkan has grown. A lot.
> >>
> >> We already have several places where we're doing significant layering.
> >> It started with sharing the WSI code and some Python
Re: Future direction of the Mesa Vulkan runtime (or "should we build a new gallium?")
On Wed, Jan 24, 2024 at 6:57 PM Marek Olšák wrote:
> Gallium looks like it was just a copy of DX10, and likely many things were
> known from DX10 in advance before anything started. Vulkanium doesn't have
> anything to draw inspiration from. It's a completely unexplored idea.

I'm not sure I follow this. GNU/Linux didn't have a unified driver interface to implement GL, but Windows did have a standardized interface to implement D3D10 which we drew inspiration from. The same is still true if you s/GL/Vulkan/ and s/D3D10/D3D12/. It's just that more features of modern APIs are tied to kernel features (i.e. WDDM versions) than in the past, but with gpuvm, drm scheduler and syncobj that's also going to be Vulkan's path.

Now, you might say that this time we're not going to use any lessons from Windows and this interface will be completely unlike what Windows does for D3D12, which is fine, but I still wouldn't call the idea of standardizing an interface for a low-level graphics API a completely unexplored idea, given that it works on Windows on an API that's a lot more like Vulkan than D3D10 was like GL.

z
Re: Future direction of the Mesa Vulkan runtime (or "should we build a new gallium?")
Gallium looks like it was just a copy of DX10, and likely many things were known from DX10 in advance before anything started. Vulkanium doesn't have anything to draw inspiration from. It's a completely unexplored idea.

AMD's PAL is the same idea as Gallium. It's used to implement Vulkan, DX, Mantle, Metal, etc.

Marek

On Wed, Jan 24, 2024, 13:40 Faith Ekstrand wrote:
> On Wed, Jan 24, 2024 at 12:26 PM Zack Rusin wrote:
> >
> > On Wed, Jan 24, 2024 at 10:27 AM Faith Ekstrand wrote:
> > >
> > > Jose,
> > >
> > > Thanks for your thoughts!
> > >
> > > On Wed, Jan 24, 2024 at 4:30 AM Jose Fonseca < jose.fons...@broadcom.com> wrote:
> > > >
> > > > I don't know much about the current Vulkan driver internals to have
> > > > or provide an informed opinion on the path forward, but I'd like to
> > > > share my backwards looking perspective.
> > > >
> > > > Looking back, Gallium was two things effectively:
> > > > (1) an abstraction layer, that's watertight (as in upper layers
> > > > shouldn't reach through to lower layers)
> > > > (2) an ecosystem of reusable components (draw, util, tgsi, etc.)
> > > >
> > > > (1) was of course important -- and the discipline it imposed is what
> > > > enabled great simplifications -- but it also became a straitjacket,
> > > > as GPUs didn't stand still, and sooner or later the
> > > > see-every-hardware-as-the-same lenses stop reflecting reality.
> > > >
> > > > If I had to pick one, I'd say that (2) is far more useful and
> > > > practical. Take components like gallium's draw and other util
> > > > modules. A driver can choose to use them or not. One could fork them
> > > > within the Mesa source tree, and only the drivers that opt into the
> > > > fork would need to be tested/adapted/etc.
> > > >
> > > > On the flip side, the Vulkan API is already a pretty low level HW
> > > > abstraction. It's also very flexible and extensible, so it's hard to
> > > > provide a watertight abstraction underneath it without either taking
> > > > the lowest common denominator, or having lots of optional bits of
> > > > functionality governed by a myriad of caps like you alluded to.
> > >
> > > There is a third thing that isn't really recognized in your description:
> > >
> > > (3) A common "language" to talk about GPUs and data structures that
> > > represent that language
> > >
> > > This is precisely what the Vulkan runtime today doesn't have. Classic
> > > meta sucked because we were trying to implement GL in GL. u_blitter,
> > > on the other hand, is pretty fantastic because Gallium provides a much
> > > more sane interface to write those common components in terms of.
> > >
> > > So far, we've been trying to build those components in terms of the
> > > Vulkan API itself with calls jumping back into the dispatch table to
> > > try and get inside the driver. This is working but it's getting more
> > > and more fragile the more tools we add to that box. A lot of what I
> > > want to do with gallium2 or whatever we're calling it is to fix our
> > > layering problems so that calls go in one direction and we can
> > > untangle the jumble. I'm still not sure what I want that to look like
> > > but I think I want it to look a lot like Vulkan, just with a handier
> > > interface.
> >
> > Yes, that makes sense. When we were writing the initial components for
> > gallium (draw and cso) I really liked the general concept and thought
> > about trying to reuse them in the old, non-gallium Mesa drivers, but
> > the obstacle was that there was no common interface to lay them on.
> > Using GL to implement GL was silly, and using Vulkan to implement
> > Vulkan is not much better.
> >
> > Having said that, my general thoughts on GPU abstractions largely match
> > what Jose has said. To me it's a question of whether a clean abstraction:
> > - on top of which you can build an entire GPU driver toolkit (i.e. all
> > the components and helpers)
> > - that makes it trivial to figure out what needs to be done to write a
> > new driver and makes bootstrapping a new driver a lot simpler
> > - that makes it easier to reason about cross-hardware concepts (it's a
> > lot easier to understand the entirety of the ecosystem if every driver
> > is not doing something unique to implement similar functionality)
> > is worth more than almost exponentially increasing the difficulty of:
> > - advancing the ecosystem (i.e. it might be easier to understand but
> > it's way harder to create clean abstractions across such different
> > hardware).
> > - driver maintenance (i.e. there will be a constant stream of
> > regressions hitting your driver as a result of other people working on
> > their drivers)
> > - general development (i.e. bug fixes/new features being held back
> > because they break some other driver)
> >
> > Some of those can certainly be tilted one way or the other, e.g. the
> > driver maintenance con can be somewhat eased by requiring that every
> > driver working on top of the new abstraction has to have a stable
> > Mesa-CI setup (be it lava or ci-tron, or whatever) but all of those
Re: Future direction of the Mesa Vulkan runtime (or "should we build a new gallium?")
I'll agree with Jose about Vulkan being a low-level abstraction, and to me the "opt-in" way seems like a much more balanced approach to achieving our goals — not only balanced between the goals themselves (code amount and time to implement aren't our only criteria to optimize), but also across the variety of hardware — as if something goes wrong with the watertight abstraction for a certain implementation, not only would it take more time to find a solution, but issues in one driver risk wasting everyone's time, as it'd often be necessary to make debatable changes to interfaces used by all drivers.

I also need to further clarify my point regarding the design of what we want to encourage drivers to use, specifically about pipeline objects and dynamic state/ESO.

Vulkan, as I see from all the perspectives I'm regularly interacting with it from — as an RHI programmer at a game studio, a translation layer developer (the Xenia Xbox 360 emulator), and now creating a driver for it — has not grown much thicker than it originally was. What has increased is its surface area — but where it's actually important: letting applications more precisely convey their intentions. I'd say it's even thinner and more transparent from this point of view now. We got nice things like inline uniform blocks, host image copy, push descriptors, descriptor buffers, and of course dynamic state — and they all pretty much directly correspond to some hardware concepts that apps can utilize to do what they want with less indirection between their actual architecture and the hardware.

Essentially, the application and the driver (and the rest of the chain — the specification, I'd like to retract my statement about "fighting" it, by the way, and the hardware controlled by that driver) can work more cooperatively now, towards their common goal of delivering what the app developer wants to provide to the user with as high quality and speed as realistically possible.
They now have more ways of helping each other by communicating their intentions and capabilities to each other more completely and accurately. And it's important for us not to go *backwards*. This is why I think it's just fundamentally wrong to encourage drivers to layer pipeline objects and static state on top of dynamic state.

An application would typically use static state when it:
• Knows the potentially needed state setups in advance (like in a game with demands of materials preprocessed, or in a non-gaming/non-DCC app).
• Wants to quickly apply a complete state configuration.
• Maybe doesn't care much about the state used by previously done work, like drawing wildly different kinds of objects in a scene.

At the same time, it'd choose dynamic if it:
• Doesn't have upfront knowledge of possible states (like in an OpenGL/D3D9/D3D11 translation layer or a console emulator, or with a highly flexible art pipeline in the game).
• Wants to quickly make small, incremental state changes.
• Maybe wants to mix state variables updated at different frequencies.

Their use cases, and the application's intentions they convey, are as opposite as the antonymous words "static" and "dynamic" they're called by. Treating one like a specialization of the other makes the driver blind in the same way as back in 2016, when applications had no other option but to reduce everything to static state.

(Of course, with state spanning so many pipeline stages, applications would usually not just be picking one of the two extremes, and instead may want static for some cases/stages and dynamic for the others. This is also where the route Vulkan's development over the 8 years has taken is very wise: instead of forcing Escobar's axiom of choice upon applications, let them specify their intentions on a per-variable basis, and choose the appropriate amount of state grouping among monolithic pipelines, GPL with libraries containing one or multiple parts of a pipeline, and ESO.)
The primary rule of game optimization is: if you can avoid doing something every frame, or, even worse, hundreds or thousands of times per frame, do whatever reuse you can to avoid that. If we know that's what the game wants to do — by providing a pipeline object with the state it wants to be static, a pipeline layout object — we should be aiding it. Just like if the game tells us that it can't precompile something, the graphics stack should do the best it can in this situation — it would be wrong to add the overhead of running a time machine to 2016 to its draws either. After all, the driver's draw call code and the game's draw call code are both just draw call code with one common goal.

So, it's important that whichever solution we end up with, it must not be a "broken telephone" degrading the cooperation between the application and the driver. And we should not forget that the communication between them is two-way, which includes:
• Interface calls done by the app.
• Limits and features exposed by the driver.
Re: Future direction of the Mesa Vulkan runtime (or "should we build a new gallium?")
On Wed, Jan 24, 2024 at 12:26 PM Zack Rusin wrote:
>
> On Wed, Jan 24, 2024 at 10:27 AM Faith Ekstrand wrote:
> >
> > Jose,
> >
> > Thanks for your thoughts!
> >
> > On Wed, Jan 24, 2024 at 4:30 AM Jose Fonseca wrote:
> > >
> > > I don't know much about the current Vulkan driver internals to have or
> > > provide an informed opinion on the path forward, but I'd like to share
> > > my backwards looking perspective.
> > >
> > > Looking back, Gallium was two things effectively:
> > > (1) an abstraction layer, that's watertight (as in upper layers
> > > shouldn't reach through to lower layers)
> > > (2) an ecosystem of reusable components (draw, util, tgsi, etc.)
> > >
> > > (1) was of course important -- and the discipline it imposed is what
> > > enabled great simplifications -- but it also became a straitjacket, as
> > > GPUs didn't stand still, and sooner or later the
> > > see-every-hardware-as-the-same lenses stop reflecting reality.
> > >
> > > If I had to pick one, I'd say that (2) is far more useful and
> > > practical. Take components like gallium's draw and other util modules.
> > > A driver can choose to use them or not. One could fork them within the
> > > Mesa source tree, and only the drivers that opt into the fork would
> > > need to be tested/adapted/etc.
> > >
> > > On the flip side, the Vulkan API is already a pretty low level HW
> > > abstraction. It's also very flexible and extensible, so it's hard to
> > > provide a watertight abstraction underneath it without either taking
> > > the lowest common denominator, or having lots of optional bits of
> > > functionality governed by a myriad of caps like you alluded to.
> >
> > There is a third thing that isn't really recognized in your description:
> >
> > (3) A common "language" to talk about GPUs and data structures that
> > represent that language
> >
> > This is precisely what the Vulkan runtime today doesn't have. Classic
> > meta sucked because we were trying to implement GL in GL. u_blitter,
> > on the other hand, is pretty fantastic because Gallium provides a much
> > more sane interface to write those common components in terms of.
> >
> > So far, we've been trying to build those components in terms of the
> > Vulkan API itself with calls jumping back into the dispatch table to
> > try and get inside the driver. This is working but it's getting more
> > and more fragile the more tools we add to that box. A lot of what I
> > want to do with gallium2 or whatever we're calling it is to fix our
> > layering problems so that calls go in one direction and we can
> > untangle the jumble. I'm still not sure what I want that to look like
> > but I think I want it to look a lot like Vulkan, just with a handier
> > interface.
>
> Yes, that makes sense. When we were writing the initial components for
> gallium (draw and cso) I really liked the general concept and thought
> about trying to reuse them in the old, non-gallium Mesa drivers, but
> the obstacle was that there was no common interface to lay them on.
> Using GL to implement GL was silly, and using Vulkan to implement
> Vulkan is not much better.
>
> Having said that, my general thoughts on GPU abstractions largely match
> what Jose has said. To me it's a question of whether a clean abstraction:
> - on top of which you can build an entire GPU driver toolkit (i.e. all
> the components and helpers)
> - that makes it trivial to figure out what needs to be done to write a
> new driver and makes bootstrapping a new driver a lot simpler
> - that makes it easier to reason about cross-hardware concepts (it's a
> lot easier to understand the entirety of the ecosystem if every driver
> is not doing something unique to implement similar functionality)
> is worth more than almost exponentially increasing the difficulty of:
> - advancing the ecosystem (i.e. it might be easier to understand but
> it's way harder to create clean abstractions across such different
> hardware).
> - driver maintenance (i.e. there will be a constant stream of
> regressions hitting your driver as a result of other people working on
> their drivers)
> - general development (i.e. bug fixes/new features being held back
> because they break some other driver)
>
> Some of those can certainly be tilted one way or the other, e.g. the
> driver maintenance con can be somewhat eased by requiring that every
> driver working on top of the new abstraction has to have a stable
> Mesa-CI setup (be it lava or ci-tron, or whatever) but all of those
> things need to be reasoned about. In my experience abstractions never
> have uniform support because some people will value the cons of them more
> than they value the pros. So the entire process requires some very
> steadfast individuals to keep going despite hearing that the effort is
> dumb, at least until the benefits of the new approach are impossible
> to deny. So you know... "how much do you believe in this approach
> because some days will suck and you can't give up" ;) is probably the
> question.

Well, I've built my entire career out of doing things that others said
Re: Future direction of the Mesa Vulkan runtime (or "should we build a new gallium?")
On Wed, Jan 24, 2024 at 10:27 AM Faith Ekstrand wrote: > > Jose, > > Thanks for your thoughts! > > On Wed, Jan 24, 2024 at 4:30 AM Jose Fonseca > wrote: > > > > I don't know much about the current Vulkan driver internals to have or > > provide an informed opinion on the path forward, but I'd like to share my > > backwards looking perspective. > > > > Looking back, Gallium was two things effectively: > > (1) an abstraction layer, that's watertight (as in upper layers shouldn't > > reach through to lower layers) > > (2) an ecosystem of reusable components (draw, util, tgsi, etc.) > > > > (1) was of course important -- and the discipline it imposed is what > > enabled to great simplifications -- but it also became a straight-jacket, > > as GPUs didn't stand still, and sooner or later the > > see-every-hardware-as-the-same lenses stop reflecting reality. > > > > If I had to pick one, I'd say that (2) is far more useful and practical. > > Take components like gallium's draw and other util modules. A driver can > > choose to use them or not. One could fork them within Mesa source tree, > > and only the drivers that opt-in into the fork would need to be > > tested/adapted/etc > > > > On the flip side, Vulkan API is already a pretty low level HW abstraction. > > It's also very flexible and extensible, so it's hard to provide a > > watertight abstraction underneath it without either taking the lowest > > common denominator, or having lots of optional bits of functionality > > governed by a myriad of caps like you alluded to. > > There is a third thing that isn't really recognized in your description: > > (3) A common "language" to talk about GPUs and data structures that > represent that language > > This is precisely what the Vulkan runtime today doesn't have. Classic > meta sucked because we were trying to implement GL in GL. 
u_blitter, > on the other hand, is pretty fantastic because Gallium provides a much > more sane interface to write those common components in terms of. > > So far, we've been trying to build those components in terms of the > Vulkan API itself with calls jumping back into the dispatch table to > try and get inside the driver. This is working but it's getting more > and more fragile the more tools we add to that box. A lot of what I > want to do with gallium2 or whatever we're calling it is to fix our > layering problems so that calls go in one direction and we can > untangle the jumble. I'm still not sure what I want that to look like > but I think I want it to look a lot like Vulkan, just with a handier > interface. Yes, that makes sense. When we were writing the initial components for gallium (draw and cso) I really liked the general concept and thought about trying to reuse them in the old, non-gallium Mesa drivers but the obstacle was that there was no common interface to lay them on. Using GL to implement GL was silly and using Vulkan to implement Vulkan is not much better. Having said that, my general thoughts on GPU abstractions largely match what Jose has said. To me it's a question of whether a clean abstraction: - on top of which you can build an entire GPU driver toolkit (i.e. all the components and helpers) - that makes it trivial to figure out what needs to be done to write a new driver and makes bootstrapping a new driver a lot simpler - that makes it easier to reason about cross-hardware concepts (it's a lot easier to understand the entirety of the ecosystem if every driver is not doing something unique to implement similar functionality) is worth more than almost exponentially increasing the difficulty of: - advancing the ecosystem (i.e. it might be easier to understand but it's way harder to create clean abstractions across such different hardware). - driver maintenance (i.e. 
there will be a constant stream of regressions hitting your driver as a result of other people working on their drivers) - general development (i.e. bug fixes/new features being held back because they break some other driver) Some of those can certainly be tilted one way or the other, e.g. the driver maintenance con can be somewhat eased by requiring that every driver working on top of the new abstraction has to have a stable Mesa-CI setup (be it lava or ci-tron, or whatever) but all of those things need to be reasoned about. In my experience abstractions never have uniform support because some people will value the cons of them more than they value the pros. So the entire process requires some very steadfast individuals to keep going despite hearing that the effort is dumb, at least until the benefits of the new approach are impossible to deny. So you know... "how much do you believe in this approach because some days will suck and you can't give up" ;) is probably the question. z
Re: Future direction of the Mesa Vulkan runtime (or "should we build a new gallium?")
Jose, Thanks for your thoughts! On Wed, Jan 24, 2024 at 4:30 AM Jose Fonseca wrote: > > I don't know much about the current Vulkan driver internals to have or > provide an informed opinion on the path forward, but I'd like to share my > backwards looking perspective. > > Looking back, Gallium was two things effectively: > (1) an abstraction layer, that's watertight (as in upper layers shouldn't > reach through to lower layers) > (2) an ecosystem of reusable components (draw, util, tgsi, etc.) > > (1) was of course important -- and the discipline it imposed is what enabled > to great simplifications -- but it also became a straight-jacket, as GPUs > didn't stand still, and sooner or later the see-every-hardware-as-the-same > lenses stop reflecting reality. > > If I had to pick one, I'd say that (2) is far more useful and practical. > Take components like gallium's draw and other util modules. A driver can > choose to use them or not. One could fork them within Mesa source tree, and > only the drivers that opt-in into the fork would need to be tested/adapted/etc > > On the flip side, Vulkan API is already a pretty low level HW abstraction. > It's also very flexible and extensible, so it's hard to provide a watertight > abstraction underneath it without either taking the lowest common > denominator, or having lots of optional bits of functionality governed by a > myriad of caps like you alluded to. There is a third thing that isn't really recognized in your description: (3) A common "language" to talk about GPUs and data structures that represent that language This is precisely what the Vulkan runtime today doesn't have. Classic meta sucked because we were trying to implement GL in GL. u_blitter, on the other hand, is pretty fantastic because Gallium provides a much more sane interface to write those common components in terms of. 
So far, we've been trying to build those components in terms of the Vulkan API itself with calls jumping back into the dispatch table to try and get inside the driver. This is working but it's getting more and more fragile the more tools we add to that box. A lot of what I want to do with gallium2 or whatever we're calling it is to fix our layering problems so that calls go in one direction and we can untangle the jumble. I'm still not sure what I want that to look like but I think I want it to look a lot like Vulkan, just with a handier interface. ~Faith > Not sure how useful this is in practice to you, but the lesson from my POV is > that opt-in reusable and shared libraries are always time well spent as they > can bend and adapt with the times, whereas no opt-out watertight abstractions > inherently have a shelf life. > > Jose > > On Fri, Jan 19, 2024 at 5:30 PM Faith Ekstrand wrote: >> >> Yeah, this one's gonna hit Phoronix... >> >> When we started writing Vulkan drivers back in the day, there was this >> notion that Vulkan was a low-level API that directly targets hardware. >> Vulkan drivers were these super thin things that just blasted packets >> straight into the hardware. What little code was common was small and >> pretty easy to just copy+paste around. It was a nice thought... >> >> What's happened in the intervening 8 years is that Vulkan has grown. A lot. >> >> We already have several places where we're doing significant layering. >> It started with sharing the WSI code and some Python for generating >> dispatch tables. Later we added common synchronization code and a few >> vkFoo2 wrappers. Then render passes and... >> >> https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27024 >> >> That's been my project the last couple weeks: A common VkPipeline >> implementation built on top of an ESO-like interface. The big >> deviation this MR makes from prior art is that I make no attempt at >> pretending it's a layered implementation. 
The vtable for shader >> objects looks like ESO but takes its own path when it's useful to do >> so. For instance, shader creation always consumes NIR and a handful of >> lowering passes are run for you. It's no st_glsl_to_nir but it is a >> bit opinionated. Also, a few of the bits that are missing from ESO >> such as robustness have been added to the interface. >> >> In my mind, this marks a pretty fundamental shift in how the Vulkan >> runtime works, at least in my mind. Previously, everything was >> designed to be a toolbox where you can kind of pick and choose what >> you want to use. Also, everything at least tried to act like a layer >> where you still implemented Vulkan but you could leave out bits like >> render passes if you implemented the new thing and were okay with the >> layer. With the ESO code, you implement something that isn't Vulkan >> entrypoints and the actual entrypoints live in the runtime. This lets >> us expand and adjust the interface as needed for our purposes as well >> as sanitize certain things even in the modern API. >> >> The result is that NVK is starting to feel like a
Re: Future direction of the Mesa Vulkan runtime (or "should we build a new gallium?")
I don't know much about the current Vulkan driver internals to have or provide an informed opinion on the path forward, but I'd like to share my backwards-looking perspective. Looking back, Gallium was two things effectively: (1) an abstraction layer, that's watertight (as in upper layers shouldn't reach through to lower layers) (2) an ecosystem of reusable components (draw, util, tgsi, etc.) (1) was of course important -- and the discipline it imposed is what enabled the great simplifications -- but it also became a straight-jacket, as GPUs didn't stand still, and sooner or later the see-every-hardware-as-the-same lenses stop reflecting reality. If I had to pick one, I'd say that (2) is far more useful and practical. Take components like gallium's draw and other util modules. A driver can choose to use them or not. One could fork them within the Mesa source tree, and only the drivers that opt into the fork would need to be tested/adapted/etc On the flip side, the Vulkan API is already a pretty low level HW abstraction. It's also very flexible and extensible, so it's hard to provide a watertight abstraction underneath it without either taking the lowest common denominator, or having lots of optional bits of functionality governed by a myriad of caps like you alluded to. Not sure how useful this is in practice to you, but the lesson from my POV is that *opt-in* reusable and shared libraries are always time well spent as they can bend and adapt with the times, whereas *no opt-out* watertight abstractions inherently have a shelf life. Jose On Fri, Jan 19, 2024 at 5:30 PM Faith Ekstrand wrote: > Yeah, this one's gonna hit Phoronix... > > When we started writing Vulkan drivers back in the day, there was this > notion that Vulkan was a low-level API that directly targets hardware. > Vulkan drivers were these super thin things that just blasted packets > straight into the hardware. What little code was common was small and > pretty easy to just copy+paste around. 
It was a nice thought... > > What's happened in the intervening 8 years is that Vulkan has grown. A lot. > > We already have several places where we're doing significant layering. > It started with sharing the WSI code and some Python for generating > dispatch tables. Later we added common synchronization code and a few > vkFoo2 wrappers. Then render passes and... > > https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27024 > > That's been my project the last couple weeks: A common VkPipeline > implementation built on top of an ESO-like interface. The big > deviation this MR makes from prior art is that I make no attempt at > pretending it's a layered implementation. The vtable for shader > objects looks like ESO but takes its own path when it's useful to do > so. For instance, shader creation always consumes NIR and a handful of > lowering passes are run for you. It's no st_glsl_to_nir but it is a > bit opinionated. Also, a few of the bits that are missing from ESO > such as robustness have been added to the interface. > > In my mind, this marks a pretty fundamental shift in how the Vulkan > runtime works, at least in my mind. Previously, everything was > designed to be a toolbox where you can kind of pick and choose what > you want to use. Also, everything at least tried to act like a layer > where you still implemented Vulkan but you could leave out bits like > render passes if you implemented the new thing and were okay with the > layer. With the ESO code, you implement something that isn't Vulkan > entrypoints and the actual entrypoints live in the runtime. This lets > us expand and adjust the interface as needed for our purposes as well > as sanitize certain things even in the modern API. > > The result is that NVK is starting to feel like a gallium driver. > > So here's the question: do we like this? Do we want to push in this > direction? Should we start making more things work more this way? 
I'm > not looking for MRs just yet nor do I have more reworks directly > planned. I'm more looking for thoughts and opinions as to how the > various Vulkan driver teams feel about this. We'll leave the detailed > planning for the Mesa issue tracker. > > It's worth noting that, even though I said we've tried to keep things > layerish, there are other parts of the runtime that look like this. > The synchronization code is a good example. The vk_sync interface is > pretty significantly different from the Vulkan objects it's used to > implement. That's worked out pretty well, IMO. With as complicated as > something like pipelines or synchronization are, trying to keep the > illusion of a layer just isn't practical. > > So, do we like this? Should we be pushing more towards drivers being a > backend of the runtime instead of a user of it? > > Now, before anyone asks, no, I don't really want to build a multi-API > abstraction with a Vulkan state tracker. If we were doing this 5 years > ago and Zink didn't already exist, one might be able to make an >
Re: Future direction of the Mesa Vulkan runtime (or "should we build a new gallium?")
On Mon, Jan 22, 2024 at 7:20 AM Iago Toral wrote: > > Hi Faith, > > thanks for starting the discussion, we had a bit of an internal chat > here at Igalia to see where we all stand on this and I am sharing some > initial thoughts/questions below: > > El vie, 19-01-2024 a las 11:01 -0600, Faith Ekstrand escribió: > > > Thoughts? > > We think it is fine if the Vulkan runtime implements its own internal > API that doesn't match Vulkan's. If we are going down this path however > we really want to make sure we have good documentation for it so it is > clear how all that works without having to figure things out by looking > at the code. That's a reasonable request. We probably won't re-type the Vulkan spec in comments but having differences documented is reasonable. I'm thinking the level of documentation in vk_graphics_state. > For existing drivers we think it is a bit less clear whether the effort > required to port is going to be worth it. If you end up having to throw > away a lot of what you currently have that already works and in some > cases might even be optimal for your platform it may be a hard ask. > What are your thoughts on this? How much adoption would you be looking > for from existing drivers? That's a good question. One of the problems I'm already seeing is that we have a bunch of common stuff which is in use in some drivers and not in others and I generally don't know why. If there's something problematic about it on some vendor's hardware, we should fix that. If it's just that driver teams don't have the time for refactors, that's a different issue. Unfortunately, I usually don't know besides one-off comments from a developer here and there. And, yeah, I know it can be a lot of work. Hopefully the work pays off in the long run but short-term it's often hard to justify. 
:-/ > As new features are added to the runtime, we understand some of them > could have dependencies on other features, building on top of them, > requiring drivers to adopt more of the common vulkan runtime to > continue benefiting from additional features, is that how you see this > or would you still expect many runtime features to still be independent > from each other to facilitate driver opt-in on a need-by-need basis? At a feature level, yes. However, one of the big things I'm struggling with right now is layering issues where we really need to flip things around from the driver calling into the runtime to the runtime calling into the driver. One of the things I would LOVE to put in the runtime is YCbCr emulation for drivers that don't natively have multi-plane image support. However, that just isn't possible today thanks to the way things are layered. In particular, we would need the runtime to be able to make one `VkImage` contain multiple driver images and that's just not possible as long as the driver is controlling image creation. We also don't have enough visibility into descriptor sets. People have also talked about trying to do a common ray-tracing implementation. Unfortunately, I just don't see that happening with the current layer model. Unfortunately, I don't have a lot of examples of what that would look like without having written the code to do it. One thing I'm currently thinking about is switching more objects to a kernel vtable model like I did with `vk_pipeline` and `vk_shader` in the posted MR. This puts the runtime in control of the object's life cycle and more easily allows for multiple implementations of an object type. Like right now you can use the common implementation for graphics and compute and roll your own vk_pipeline for ray-tracing. I realize that doesn't really apply to Raspberry Pi but it's an example of what flipping the layering around looks like. 
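To make the "runtime calls into the driver" direction a bit more tangible, here is a minimal sketch of the vtable pattern described above: the runtime-side entrypoint dispatches through per-object ops, so one object type can have several implementations, say the common graphics/compute pipeline code and a driver-rolled ray-tracing one. All names are hypothetical; this is the shape of the idea, not the actual `vk_pipeline` interface from the MR:

```c
#include <assert.h>

/* Hypothetical sketch of the vtable model: the runtime owns the
 * object and dispatches through per-object ops, so multiple
 * implementations of one object type can coexist. */
struct example_pipeline;

struct example_pipeline_ops {
   /* Returns an implementation-specific tag so the usage below can
    * observe which backend ran; a real vtable would carry bind,
    * destroy, etc. */
   int (*bind)(struct example_pipeline *pipeline);
};

struct example_pipeline {
   const struct example_pipeline_ops *ops;
};

/* Common (runtime-provided) implementation, e.g. for graphics and
 * compute pipelines. */
static int
common_pipeline_bind(struct example_pipeline *pipeline)
{
   (void)pipeline;
   return 1;
}

static const struct example_pipeline_ops common_pipeline_ops = {
   .bind = common_pipeline_bind,
};

/* Driver-rolled implementation, e.g. for ray-tracing pipelines. */
static int
driver_rt_pipeline_bind(struct example_pipeline *pipeline)
{
   (void)pipeline;
   return 2;
}

static const struct example_pipeline_ops driver_rt_pipeline_ops = {
   .bind = driver_rt_pipeline_bind,
};

/* The runtime-side entrypoint never knows (or cares) which
 * implementation it is holding: */
static int
runtime_bind_pipeline(struct example_pipeline *pipeline)
{
   return pipeline->ops->bind(pipeline);
}
```

Because the entrypoint lives in the runtime and the driver only supplies ops, the runtime controls the object's life cycle, which is exactly what things like common YCbCr emulation would need.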
The other thing I've been realizing as I've been thinking about this over the week-end is that, if this happens, we're likely heading towards another gallium/classic split for a while. (Though hopefully without the bad blood in the community that we had from gallium.) If this plays out similarly to gallium/classic, a bunch of drivers will remain classic, doing most things themselves and the new thing (which really needs a name, BTW) will be driven by a small subset of drivers and then other drivers get moved over as time allows. This isn't necessarily a bad thing, it's just a recognition of how large-scale changes tend to roll out within Mesa and the potential scope of a more invasive runtime project. Thinking of it this way would also give more freedom to the people building the new thing to just build it without worrying about driver porting and trying to do everything incrementally. If we do attempt this, it needs to be done with a subset of drivers that is as representative of the industry as possible so we don't screw anybody over. I'm currently thinking NVK (1.3, all the features), AGX (all the features but on shit hardware), and Panvk (low features). That won't guarantee the perfect design for everyone, of course, but
Re: Future direction of the Mesa Vulkan runtime (or "should we build a new gallium?")
Hi Faith, thanks for starting the discussion, we had a bit of an internal chat here at Igalia to see where we all stand on this and I am sharing some initial thoughts/questions below: El vie, 19-01-2024 a las 11:01 -0600, Faith Ekstrand escribió: > Yeah, this one's gonna hit Phoronix... > > When we started writing Vulkan drivers back in the day, there was > this > notion that Vulkan was a low-level API that directly targets > hardware. > Vulkan drivers were these super thin things that just blasted packets > straight into the hardware. What little code was common was small and > pretty easy to just copy+paste around. It was a nice thought... > > What's happened in the intervening 8 years is that Vulkan has grown. > A lot. > > We already have several places where we're doing significant > layering. > It started with sharing the WSI code and some Python for generating > dispatch tables. Later we added common synchronization code and a few > vkFoo2 wrappers. Then render passes and... > > https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27024 > > That's been my project the last couple weeks: A common VkPipeline > implementation built on top of an ESO-like interface. The big > deviation this MR makes from prior art is that I make no attempt at > pretending it's a layered implementation. The vtable for shader > objects looks like ESO but takes its own path when it's useful to do > so. For instance, shader creation always consumes NIR and a handful > of > lowering passes are run for you. It's no st_glsl_to_nir but it is a > bit opinionated. Also, a few of the bits that are missing from ESO > such as robustness have been added to the interface. > > In my mind, this marks a pretty fundamental shift in how the Vulkan > runtime works, at least in my mind. Previously, everything was > designed to be a toolbox where you can kind of pick and choose what > you want to use. 
Also, everything at least tried to act like a layer > where you still implemented Vulkan but you could leave out bits like > render passes if you implemented the new thing and were okay with the > layer. With the ESO code, you implement something that isn't Vulkan > entrypoints and the actual entrypoints live in the runtime. This lets > us expand and adjust the interface as needed for our purposes as well > as sanitize certain things even in the modern API. > > The result is that NVK is starting to feel like a gallium driver. > > So here's the question: do we like this? Do we want to push in this > direction? Should we start making more things work more this way? I'm > not looking for MRs just yet nor do I have more reworks directly > planned. I'm more looking for thoughts and opinions as to how the > various Vulkan driver teams feel about this. We'll leave the detailed > planning for the Mesa issue tracker. > > It's worth noting that, even though I said we've tried to keep things > layerish, there are other parts of the runtime that look like this. > The synchronization code is a good example. The vk_sync interface is > pretty significantly different from the Vulkan objects it's used to > implement. That's worked out pretty well, IMO. With as complicated as > something like pipelines or synchronization are, trying to keep the > illusion of a layer just isn't practical. > > So, do we like this? Should we be pushing more towards drivers being > a > backed of the runtime instead of a user of it? > > Now, before anyone asks, no, I don't really want to build a multi-API > abstraction with a Vulkan state tracker. If we were doing this 5 > years > ago and Zink didn't already exist, one might be able to make an > argument for pushing in that direction. However, that would add a > huge > amount of weight to the project and make it even harder to develop > the > runtime than it already is and for little benefit at this point. 
> > Here's a few other constraints on what I'm thinking: > > 1. I want it to still be possible for drivers to implement an > extension without piles of runtime plumbing or even bypass the > runtime > on occasion as needed. > > 2. I don't want to recreate the gallium cap disaster; drivers should > know exactly what they're advertising. We may want to have some > internal features or properties that are used by the runtime to make > decisions but they'll be in addition to the features and properties > in > Vulkan. > > 3. We've got some meta stuff already but we probably want more. > However, I don't want to force meta on folks who don't want it. > > The big thing here is that if we do this, I'm going to need help. I'm > happy to do a lot of the architectural work but drivers are going to > have to keep up with the changes and I can't take on the burden of > moving 8 different drivers forward. I can answer questions and maybe > help out a bit but the refactoring is going to be too much for one > person, even if that person is me. > > Thoughts? We think it is fine if the Vulkan runtime implements its own internal
Re: Future direction of the Mesa Vulkan runtime (or "should we build a new gallium?")
Hello Faith and everyfrogy! I've been developing a new Vulkan driver for Mesa — Terakan, for AMD TeraScale Evergreen and Northern Islands GPUs — since May of 2023. You can find it in amd/terascale/vulkan on the Terakan branch of my fork at Triang3l/mesa. While it currently lacks many of the graphical features, the architecture of state management, meta, and descriptors has already largely been implemented in its code. I'm overall relatively new to Mesa, in the past having contributed the fragment shader interlock implementation to RADV, which included working with the state management, but never having written a Gallium driver, or a Vulkan driver in the ANV copy-pasting era, so this may be a somewhat fresh — although quite conservative — take on this. Due to various hardware and kernel driver differences (bindings being individually loaded into fixed slots as part of the command buffer state, the lack of command buffer chaining in the kernel resulting in having to reapply all of the state when the size of the hardware command buffer exceeds the HW/KMD limits), I've been designing the architecture of my Vulkan driver largely from scratch, without using the existing Mesa drivers as a reference. Unfortunately, it seems like we ended up going in fundamentally opposite directions in our designs, so I'd say that I'm much more scared about this approach than I am excited about it. My primary concerns about this architecture can be summarized into two categories: • The obligation to manage pipeline and dynamic state in the common representation — essentially mostly the same Vulkan function call arguments, but with an additional layer for processing pNext and merging pipeline and dynamic state — restricts the abilities of drivers to optimize state management for specific hardware. Most importantly, it hampers precompiling of state in pipeline objects. In state management, this would make Mesa Vulkan implementations closer not even to Gallium, but to the dreaded OpenGL. 
• Certain parts of the common code are designed around assumptions about the majority of the hardware, however some devices may have large architectural differences in specific areas, and trying to adapt the way of programming such hardware subsystems results in having to write suboptimal algorithms, as well as sometimes artificially restricting the VkPhysicalDeviceLimits the device can report. An example from my driver is the meaning of a pipeline layout on fixed-slot TeraScale. Because it uses flat binding indices throughout all sets (sets don't exist in the hardware at all), it needs base offsets for each set within the stage's bindings — which are precomputed at pipeline layout creation. This is fundamentally incompatible with MR !27024's direction to remove the concept of a pipeline layout — and if the common vkCmdBindDescriptorSets makes the VK_KHR_maintenance6 layout-object-less path the only available one, it would add a lot of overhead by making it necessary to recompute the offsets at every bind. I think what we need to consider about pipeline state (in the broader sense, including both state objects and dynamic state) is that it inherently has very different properties from anything the common runtime already covers. What most of the current objects in the common runtime have in common is that they: • Are largely hardware-independent and can work everywhere the same way. • Either: • Provide a complex solution to a large-scale problem, essentially being sort of advanced "middleware". Examples are WSI, synchronization, pipeline cache, secondary command buffer emulation, render pass emulation. • Or, solve a trivial task in a way that's non-intrusive towards algorithms employed by the drivers — such as managing object handles, invoking allocators, reference-counting descriptor set and pipeline layouts, pooling VkCommandBuffer instances. • Rarely influence the design of "hot path" functions, such as changes to pipeline state and bindings. 
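The fixed-slot point above can be illustrated with a short sketch: at pipeline layout creation, walk the set layouts once and record each set's base offset in the stage's flat binding space, so that binding time only needs an add per descriptor. This is a sketch with made-up names, not Terakan's actual code:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative sketch for hardware with one flat array of binding
 * slots per stage (descriptor sets don't exist in the hardware):
 * the pipeline layout precomputes, once at creation, the base
 * offset of each set within that flat space. */
#define EXAMPLE_MAX_SETS 8

struct example_set_layout {
   uint32_t binding_count; /* flat slots this set occupies */
};

struct example_pipeline_layout {
   uint32_t set_count;
   uint32_t set_base_slot[EXAMPLE_MAX_SETS]; /* precomputed */
};

static void
example_pipeline_layout_init(struct example_pipeline_layout *layout,
                             const struct example_set_layout *sets,
                             uint32_t set_count)
{
   uint32_t slot = 0;
   layout->set_count = set_count;
   for (uint32_t i = 0; i < set_count; i++) {
      layout->set_base_slot[i] = slot;
      slot += sets[i].binding_count;
   }
}

/* At vkCmdBindDescriptorSets time, the flat slot of (set, binding)
 * is a single add; nothing is recomputed per bind. */
static uint32_t
example_flat_slot(const struct example_pipeline_layout *layout,
                  uint32_t set, uint32_t binding)
{
   return layout->set_base_slot[set] + binding;
}
```

Without a layout object available at bind time (the maintenance6 layout-object-less path), these base offsets would have to be rederived from the bound set layouts on every bind, which is the overhead being objected to.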
On the other hand, pipeline state:

1. Is entirely hardware-specific.
2. Is modified very frequently — making up the majority of command buffer recording time.
3. Can be precompiled in pipeline objects — and that's highly desirable due to the previous point.

Because of 1, there's almost nothing in the pipeline state that the common runtime can help share between drivers. Yes, it can potentially be used to automate running some NIR passes for baking static state into shaders, but currently it looks like the runtime is going in a somewhat different direction, and that needs only some helper functions invoked at pipeline creation time. Aside from that, I can't see it being able to be useful for anything other than merging static and dynamic state into a single structure. For drivers where developers would prefer this approach for various reasons (prototyping simplicity, or staying at the near-original-Vulkan level of
Re: Future direction of the Mesa Vulkan runtime (or "should we build a new gallium?")
On 2024/01/20 2:01, Faith Ekstrand wrote:
> We already have several places where we're doing significant layering. It started with sharing the WSI code

I wish it were possible to compile the WSI implementation as a separate Vulkan layer *.so module instead of hardcoding and duplicating it in each driver. That would make Vulkan drivers more platform-independent and provide proper separation.
Re: Future direction of the Mesa Vulkan runtime (or "should we build a new gallium?")
On one hand I think it's a great idea. Moving code out of drivers to common means fixing bugs helps everyone, and implementing new features is the same.

On the other hand, everyone's already got code that works, which means both a lot of work to switch that code over to common and then the usual cycle of fixing regressions.

Gallium is generally pretty great now, so my gut says a 'Zirkonium' common layer is eventually gonna be pretty good too, assuming it can provide the same sorts of efficiency gains; vulkan is a lot thinner than GL, which means CPU utilization becomes more noticeable very easily.

I'm not saying I'll dive in head first tomorrow, but generally speaking I think 10 years from now it'll be a nice thing to have.

Mike

On Fri, Jan 19, 2024, 4:02 PM Faith Ekstrand wrote:
> Yeah, this one's gonna hit Phoronix...
>
> When we started writing Vulkan drivers back in the day, there was this notion that Vulkan was a low-level API that directly targets hardware. Vulkan drivers were these super thin things that just blasted packets straight into the hardware. What little code was common was small and pretty easy to just copy+paste around. It was a nice thought...
>
> What's happened in the intervening 8 years is that Vulkan has grown. A lot.
>
> We already have several places where we're doing significant layering. It started with sharing the WSI code and some Python for generating dispatch tables. Later we added common synchronization code and a few vkFoo2 wrappers. Then render passes and...
>
> https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27024
>
> That's been my project the last couple weeks: A common VkPipeline implementation built on top of an ESO-like interface. The big deviation this MR makes from prior art is that I make no attempt at pretending it's a layered implementation. The vtable for shader objects looks like ESO but takes its own path when it's useful to do so.
> For instance, shader creation always consumes NIR and a handful of lowering passes are run for you. It's no st_glsl_to_nir but it is a bit opinionated. Also, a few of the bits that are missing from ESO such as robustness have been added to the interface.
>
> In my mind, this marks a pretty fundamental shift in how the Vulkan runtime works. Previously, everything was designed to be a toolbox where you can kind of pick and choose what you want to use. Also, everything at least tried to act like a layer where you still implemented Vulkan but you could leave out bits like render passes if you implemented the new thing and were okay with the layer. With the ESO code, you implement something that isn't Vulkan entrypoints and the actual entrypoints live in the runtime. This lets us expand and adjust the interface as needed for our purposes as well as sanitize certain things even in the modern API.
>
> The result is that NVK is starting to feel like a gallium driver.
>
> So here's the question: do we like this? Do we want to push in this direction? Should we start making more things work more this way? I'm not looking for MRs just yet nor do I have more reworks directly planned. I'm more looking for thoughts and opinions as to how the various Vulkan driver teams feel about this. We'll leave the detailed planning for the Mesa issue tracker.
>
> It's worth noting that, even though I said we've tried to keep things layerish, there are other parts of the runtime that look like this. The synchronization code is a good example. The vk_sync interface is pretty significantly different from the Vulkan objects it's used to implement. That's worked out pretty well, IMO. With something as complicated as pipelines or synchronization, trying to keep the illusion of a layer just isn't practical.
>
> So, do we like this?
> Should we be pushing more towards drivers being a backend of the runtime instead of a user of it?
>
> Now, before anyone asks, no, I don't really want to build a multi-API abstraction with a Vulkan state tracker. If we were doing this 5 years ago and Zink didn't already exist, one might be able to make an argument for pushing in that direction. However, that would add a huge amount of weight to the project and make it even harder to develop the runtime than it already is and for little benefit at this point.
>
> Here's a few other constraints on what I'm thinking:
>
> 1. I want it to still be possible for drivers to implement an extension without piles of runtime plumbing or even bypass the runtime on occasion as needed.
>
> 2. I don't want to recreate the gallium cap disaster; drivers should know exactly what they're advertising. We may want to have some internal features or properties that are used by the runtime to make decisions but they'll be in addition to the features and properties in Vulkan.
>
> 3. We've got some meta stuff already but we probably want more. However, I don't want to
Future direction of the Mesa Vulkan runtime (or "should we build a new gallium?")
Yeah, this one's gonna hit Phoronix...

When we started writing Vulkan drivers back in the day, there was this notion that Vulkan was a low-level API that directly targets hardware. Vulkan drivers were these super thin things that just blasted packets straight into the hardware. What little code was common was small and pretty easy to just copy+paste around. It was a nice thought...

What's happened in the intervening 8 years is that Vulkan has grown. A lot.

We already have several places where we're doing significant layering. It started with sharing the WSI code and some Python for generating dispatch tables. Later we added common synchronization code and a few vkFoo2 wrappers. Then render passes and...

https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27024

That's been my project the last couple weeks: A common VkPipeline implementation built on top of an ESO-like interface. The big deviation this MR makes from prior art is that I make no attempt at pretending it's a layered implementation. The vtable for shader objects looks like ESO but takes its own path when it's useful to do so. For instance, shader creation always consumes NIR and a handful of lowering passes are run for you. It's no st_glsl_to_nir but it is a bit opinionated. Also, a few of the bits that are missing from ESO such as robustness have been added to the interface.

In my mind, this marks a pretty fundamental shift in how the Vulkan runtime works. Previously, everything was designed to be a toolbox where you can kind of pick and choose what you want to use. Also, everything at least tried to act like a layer where you still implemented Vulkan but you could leave out bits like render passes if you implemented the new thing and were okay with the layer. With the ESO code, you implement something that isn't Vulkan entrypoints and the actual entrypoints live in the runtime.
This lets us expand and adjust the interface as needed for our purposes as well as sanitize certain things even in the modern API.

The result is that NVK is starting to feel like a gallium driver.

So here's the question: do we like this? Do we want to push in this direction? Should we start making more things work more this way? I'm not looking for MRs just yet nor do I have more reworks directly planned. I'm more looking for thoughts and opinions as to how the various Vulkan driver teams feel about this. We'll leave the detailed planning for the Mesa issue tracker.

It's worth noting that, even though I said we've tried to keep things layerish, there are other parts of the runtime that look like this. The synchronization code is a good example. The vk_sync interface is pretty significantly different from the Vulkan objects it's used to implement. That's worked out pretty well, IMO. With something as complicated as pipelines or synchronization, trying to keep the illusion of a layer just isn't practical.

So, do we like this? Should we be pushing more towards drivers being a backend of the runtime instead of a user of it?

Now, before anyone asks, no, I don't really want to build a multi-API abstraction with a Vulkan state tracker. If we were doing this 5 years ago and Zink didn't already exist, one might be able to make an argument for pushing in that direction. However, that would add a huge amount of weight to the project and make it even harder to develop the runtime than it already is and for little benefit at this point.

Here's a few other constraints on what I'm thinking:

1. I want it to still be possible for drivers to implement an extension without piles of runtime plumbing or even bypass the runtime on occasion as needed.

2. I don't want to recreate the gallium cap disaster; drivers should know exactly what they're advertising.
We may want to have some internal features or properties that are used by the runtime to make decisions but they'll be in addition to the features and properties in Vulkan.

3. We've got some meta stuff already but we probably want more. However, I don't want to force meta on folks who don't want it.

The big thing here is that if we do this, I'm going to need help. I'm happy to do a lot of the architectural work but drivers are going to have to keep up with the changes and I can't take on the burden of moving 8 different drivers forward. I can answer questions and maybe help out a bit but the refactoring is going to be too much for one person, even if that person is me.

Thoughts?

~Faith
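For readers less familiar with the "drivers as a backend of the runtime" idea discussed in this thread, here is a rough C sketch of the shape: the runtime owns the Vulkan entrypoint and dispatches into a driver-supplied vtable. All names are invented for illustration and do not match the actual vk_* interfaces in MR !27024:

```c
/* Hypothetical sketch of a runtime-owned entrypoint dispatching into a
 * driver backend vtable. Names are invented for illustration only. */
#include <stddef.h>

struct example_shader; /* driver-private compiled shader */

struct example_device_ops {
   /* The runtime hands the driver already-lowered IR (in the real
    * runtime, NIR after common lowering passes), never raw SPIR-V. */
   int (*compile_shader)(const char *lowered_ir,
                         struct example_shader **out);
   void (*destroy_shader)(struct example_shader *shader);
};

struct example_device {
   const struct example_device_ops *ops;
};

/* The "entrypoint" lives in the runtime: common validation and
 * lowering happen here, then the driver backend is called. */
static int
runtime_create_shader(struct example_device *dev, const char *ir,
                      struct example_shader **out)
{
   if (ir == NULL || out == NULL)
      return -1; /* common argument sanitizing, shared by all drivers */
   return dev->ops->compile_shader(ir, out);
}
```

The contrast with the older "toolbox" model is that here the driver never implements vkCreateShadersEXT itself; it only fills in the vtable, and interface changes can be made runtime-side.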