Re: Future direction of the Mesa Vulkan runtime (or "should we build a new gallium?")

2024-01-26 Thread Daniel Stone
On Fri, 26 Jan 2024 at 00:22, Faith Ekstrand  wrote:
> On Thu, Jan 25, 2024 at 5:06 PM Gert Wollny  wrote:
>> I think with Venus we are more interested in using utility libraries on
>> an as-needed basis. Here, most of the time the Vulkan commands are just
>> serialized according to the Venus protocol and this is then passed to
>> the host because usually it wouldn't make sense to let the guest
>> translate the Vulkan commands to something different (e.g. something
>> that is commonly used in a runtime), only to then re-encode this in the
>> Venus driver to satisfy the host Vulkan driver - just think SPIR-V:
>> why would we want to have NIR only to then re-encode it to SPIR-V?
>
> I think Venus is an entirely different class of driver. It's not even really 
> a driver. It's more of a Vulkan layer that has a VM boundary in the middle. 
> It's attempting to be as thin of a Vulkan -> Vulkan pass-through as possible. 
> As such, it doesn't use most of the shared stuff anyway. It uses the dispatch 
> framework and that's really about it. As long as that code stays in-tree 
> roughly as-is, I think Venus will be fine.

The eternal response: you forgot WSI!

Cheers,
Daniel


Re: Future direction of the Mesa Vulkan runtime (or "should we build a new gallium?")

2024-01-25 Thread Faith Ekstrand
On Thu, Jan 25, 2024 at 5:06 PM Gert Wollny  wrote:

> Hi,
>
> thanks, Faith, for bringing this discussion up.
>
> I think with Venus we are more interested in using utility libraries on
> an as-needed basis. Here, most of the time the Vulkan commands are just
> serialized according to the Venus protocol and this is then passed to
> the host because usually it wouldn't make sense to let the guest
> translate the Vulkan commands to something different (e.g. something
> that is commonly used in a runtime), only to then re-encode this in the
> Venus driver to satisfy the host Vulkan driver - just think SPIR-V:
> why would we want to have NIR only to then re-encode it to SPIR-V?
>

I think Venus is an entirely different class of driver. It's not even
really a driver. It's more of a Vulkan layer that has a VM boundary in the
middle. It's attempting to be as thin of a Vulkan -> Vulkan pass-through as
possible. As such, it doesn't use most of the shared stuff anyway. It uses
the dispatch framework and that's really about it. As long as that code
stays in-tree roughly as-is, I think Venus will be fine.


> I'd also like to give a +1 to the points raised by Triang3l and others
> about the potential of breaking other drivers. I've certainly been bitten
> by this on the Gallium side with r600, and unfortunately I can't set up
> a CI in my home office (and after watching the XDC talk about setting
> up your own CI I was even more discouraged to do this).
>

That's a risk with all common code. You could raise the same risk with NIR
or basically anything else. Sure, if someone wants to go write all the code
themselves in an attempt to avoid bugs, I guess they're free to do that. I
don't really see that as a compelling argument, though. Also, while you
experienced gallium breakage with r600, having worked on i965, I can
guarantee you that that's still better than maintaining a classic
(non-gallium) GL driver. 

At the moment, given the responses I've seen and the scope of the project
as things are starting to congeal in my head, I don't think this will be an
incremental thing where drivers get converted as we go anymore. If we
really do want to flip the flow, I think it'll be invasive enough that
we'll build gallium2 and then people can port to it if they want. I may
port a driver or two myself but those will be things I own or am at least
willing to deal with the bug fallout for. Others can port or not at-will.

This is what I meant when I said elsewhere that we're probably heading
towards a gallium/classic situation again. I don't expect anyone to port
until the benefits outweigh the costs but I do expect the benefits will be
there eventually.

~Faith


Re: Future direction of the Mesa Vulkan runtime (or "should we build a new gallium?")

2024-01-25 Thread Gert Wollny
Hi, 

thanks, Faith, for bringing this discussion up. 

I think with Venus we are more interested in using utility libraries on
an as-needed basis. Here, most of the time the Vulkan commands are just
serialized according to the Venus protocol and this is then passed to
the host because usually it wouldn't make sense to let the guest
translate the Vulkan commands to something different (e.g. something
that is commonly used in a runtime), only to then re-encode this in the
Venus driver to satisfy the host Vulkan driver - just think SPIR-V:
why would we want to have NIR only to then re-encode it to SPIR-V?

I'd also like to give a +1 to the points raised by Triang3l and others
about the potential of breaking other drivers. I've certainly been bitten
by this on the Gallium side with r600, and unfortunately I can't set up
a CI in my home office (and after watching the XDC talk about setting
up your own CI I was even more discouraged to do this).

In summary I certainly see the advantage in using common code, but with
these two points above in mind I think opt-in is better.

Gert




Re: Future direction of the Mesa Vulkan runtime (or "should we build a new gallium?")

2024-01-25 Thread Triang3l

On 24/01/2024 18:26, Faith Ekstrand wrote:

> So far, we've been trying to build those components in terms of the
> Vulkan API itself with calls jumping back into the dispatch table to
> try and get inside the driver.

To me, it looks like the "opt-in" approach would still serve the goal of
cleaning up "implementing Vulkan in Vulkan" well, and gradual changes
diverging from the usual Vulkan specification behavior can be implemented
and maintained in existing and new drivers more efficiently than with a
whole new programming model.

I think it's important that the scale of our solution should be appropriate
to the scale of the problem, otherwise we risk creating large issues in
other areas. Currently there are only a few places where Mesa implements
Vulkan on top of Vulkan:
 • WSI,
 • Emulated render passes,
 • Emulated secondary command buffers,
 • Meta.

For WSI, render passes and secondary command buffers, I don't think there's
anything that needs to be done, as those already have little to no driver
backend involvement or interference with the application's calls — render pass
and secondary command buffer emulation interacts with the hardware driver
entirely within the framework of the Vulkan specification, only storing a
few fields in vk_command_buffer which are already handled fully in common
code.

Common meta, on the other hand, yes, is extremely intrusive — overriding
the application's pipeline state and bindings, and passing shaders directly
as NIR, bypassing SPIR-V.

But with meta being such a different beast, I think we shouldn't even be
trying to tame it with the same interfaces as everything else. If we're
going to handle meta's special cases throughout our common "Gallium2"
framework, it feels like we'll simply be turning our "Vulkan on Vulkan"
issue into the problem of "implementing Gallium2 on Gallium2".

Instead, I think the cleanest solution for the common meta would be sending
commands to the driver through a separate callback interface specifically
for meta instead of trying to make meta mimic application code. That would
allow drivers to clearly negotiate the details of applying/reverting state
changes and of shader compilation, while letting their developers assume that
everything else is written for the most part purely against the Vulkan
specification.

It would still be okay for meta to make calls to vkGetPhysicalDevice*,
vkCreate*/vkDestroy*, as long as they're done within the rules of the
Vulkan specification, to require certain extensions, as well as to do some
less-intrusive, non-hot-path interaction with the driver's internals
directly — such as requiring that every VkImage is a vk_image and pulling
the needed create info fields from there. However, everything interacting
with the state/bindings, as well as things going beyond the specification
like creating image views with incompatible formats, would be going through
those new callbacks.
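
To make that concrete, here's a rough sketch of what such a driver-facing
meta callback table could look like. The names and fields below are purely
illustrative assumptions on my part, not an existing Mesa interface:

#include <vulkan/vulkan.h>

/* Hypothetical sketch only -- the idea is that common meta talks to the
 * driver through a small vtable instead of re-entering the public Vulkan
 * entrypoints. */
struct vk_command_buffer;
struct vk_device;
struct nir_shader;

struct vk_meta_ops {
   /* Let the driver save and restore whatever state meta is about to
    * clobber; the driver decides what "state" means for its hardware. */
   void (*save_state)(struct vk_command_buffer *cmd, void *saved);
   void (*restore_state)(struct vk_command_buffer *cmd, const void *saved);

   /* Bind an internal shader handed over as NIR, bypassing VkPipeline or
    * VkShaderEXT creation on the hot path. */
   void (*bind_shader)(struct vk_command_buffer *cmd,
                       VkShaderStageFlagBits stage,
                       const struct nir_shader *nir);

   /* Create a view that may bend the spec's rules (e.g. incompatible
    * formats) in ways an application would not be allowed to. */
   VkResult (*create_meta_image_view)(struct vk_device *dev,
                                      const VkImageViewCreateInfo *info,
                                      VkImageView *view_out);
};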

NVK-style drivers would be able to share a common implementation of those
callbacks. Drivers that want to take advantage of more direct-to-hardware
paths would need to provide what's friendly to them (maybe even with
lighter handling of compute-based meta operations compared to graphics
ones). That'd probably not be a single flat list of callbacks, but several
groups — for example, a driver could use the common command buffer
callbacks but specialize some view/descriptor-related ones (it
may not be possible to make those common at all, by the way). And if a
driver doesn't need the common meta at all, none of that would be bothering
it.

The other advantages I see in this separate meta API approach are:
 • In the rest of the code, driver developers in most cases will need to
   refer to only a single authority — the massively detailed Vulkan
   specification, and there are risks regarding rolling our own interface
   for everything:
   • Driver developers will have to spend more time carefully looking up
     what they need to do in two places rather than largely just one.
   • We're much more prone to leaving gaps in our interface and to writing
     inadequate documentation. I can't see this effort not being rushed,
     with us having to catch up to 10 years of XGL/Vulkan development,
     while moving many drivers alongside working on other tasks, and with
     varying levels of enthusiasm of driver developers towards this.
     Unless zmike's 10-year estimate is our actual target.
   • Having to deal with a new large-scale API may raise the barrier for
     new contributors and discourage them.
     Unlike with OpenGL with all the resource renaming stuff, except for
     shader compilation, the experience I got from developing applications
     on Vulkan was enough for me to start comfortably implementing it.
     When zmike showed me an R600g issue about some relation of vertex
     buffer bindings and CSOs, I just didn't have anything useful to say.
 • Faster iteration inside the common meta code, with the meta 

Re: Future direction of the Mesa Vulkan runtime (or "should we build a new gallium?")

2024-01-25 Thread Faith Ekstrand
On Thu, Jan 25, 2024 at 8:57 AM Jose Fonseca 
wrote:

> > So far, we've been trying to build those components in terms of the
> Vulkan API itself with calls jumping back into the dispatch table to try
> and get inside the driver. This is working but it's getting more and more
> fragile the more tools we add to that box. A lot of what I want to do with
> gallium2 or whatever we're calling it is to fix our layering problems so
> that calls go in one direction and we can untangle the jumble. I'm still
> not sure what I want that to look like but I think I want it to look a lot
> like Vulkan, just with a handier interface.
>
> That resonates with my experience.  For example, Gallium draw module does
> some of this too -- it provides its own internal interfaces for drivers,
> but it also loops back into Gallium top interface to set FS and rasterizer
> state -- and that has *always* been a source of grief.  Having control
> flow proceeding through layers in one direction only seems an important
> principle to observe.  It's fine if the lower interface is the same
> interface (e.g., Gallium to Gallium, or Vulkan to Vulkan as you allude),
> but they shouldn't be the same exact entry-points/modules (ie, no
> reentrancy/recursion.)
>
> It's also worth considering that Vulkan extensibility could come in hand
> too in what you want to achieve.  For example, Mesa Vulkan drivers could
> have their own VK_MESA_internal_ extensions that could be used by the
> shared Vulkan code to do lower level things.
>

We already do that for a handful of things. The fact that Vulkan doesn't
ever check the stuff in the pNext chain is really useful for that. 
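
For illustration, the pattern looks roughly like this — the struct name and
sType value here are made up, not one of the actual internal extensions:

#include <vulkan/vulkan.h>

/* Made-up example struct; the real internal extensions follow the same
 * pNext pattern with their own names and values. */
#define VK_STRUCTURE_TYPE_IMAGE_INTERNAL_CREATE_INFO_MESA \
   ((VkStructureType)1000999000)

typedef struct VkImageInternalCreateInfoMESA {
   VkStructureType sType;            /* the made-up value above */
   const void     *pNext;
   VkBool32        forceLinearTiling; /* whatever the runtime needs to say */
} VkImageInternalCreateInfoMESA;

static VkResult
create_image_with_internal_info(VkDevice dev,
                                const VkImageCreateInfo *app_info,
                                const VkAllocationCallbacks *alloc,
                                VkImage *image_out)
{
   /* The common code chains its struct onto the application's create info
    * and calls down; a driver that knows about it pulls it out of pNext,
    * anything else just ignores it, as the spec requires. */
   VkImageInternalCreateInfoMESA internal = {
      .sType = VK_STRUCTURE_TYPE_IMAGE_INTERNAL_CREATE_INFO_MESA,
      .pNext = app_info->pNext,
      .forceLinearTiling = VK_TRUE,
   };
   VkImageCreateInfo info = *app_info;
   info.pNext = &internal;

   /* In the runtime this would go through the device dispatch table rather
    * than the public symbol, but the shape is the same. */
   return vkCreateImage(dev, &info, alloc, image_out);
}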

~Faith


> Jose
>
>
> On Wed, Jan 24, 2024 at 3:26 PM Faith Ekstrand 
> wrote:
>
>> Jose,
>>
>> Thanks for your thoughts!
>>
>> On Wed, Jan 24, 2024 at 4:30 AM Jose Fonseca 
>> wrote:
>> >
>> > I don't know much about the current Vulkan driver internals to have or
>> provide an informed opinion on the path forward, but I'd like to share my
>> backwards looking perspective.
>> >
>> > Looking back, Gallium was two things effectively:
>> > (1) an abstraction layer, that's watertight (as in upper layers
>> shouldn't reach through to lower layers)
>> > (2) an ecosystem of reusable components (draw, util, tgsi, etc.)
>> >
>> > (1) was of course important -- and the discipline it imposed is what
>> enabled to great simplifications -- but it also became a straight-jacket,
>> as GPUs didn't stand still, and sooner or later the
>> see-every-hardware-as-the-same lenses stop reflecting reality.
>> >
>> > If I had to pick one, I'd say that (2) is far more useful and
>> practical. Take components like gallium's draw and other util modules. A
>> driver can choose to use them or not.  One could fork them within Mesa
>> source tree, and only the drivers that opt-in into the fork would need to
>> be tested/adapted/etc
>> >
>> > On the flip side, Vulkan API is already a pretty low level HW
>> abstraction.  It's also very flexible and extensible, so it's hard to
>> provide a watertight abstraction underneath it without either taking the
>> lowest common denominator, or having lots of optional bits of functionality
>> governed by a myriad of caps like you alluded to.
>>
>> There is a third thing that isn't really recognized in your description:
>>
>> (3) A common "language" to talk about GPUs and data structures that
>> represent that language
>>
>> This is precisely what the Vulkan runtime today doesn't have. Classic
>> meta sucked because we were trying to implement GL in GL. u_blitter,
>> on the other hand, is pretty fantastic because Gallium provides a much
>> more sane interface to write those common components in terms of.
>>
>> So far, we've been trying to build those components in terms of the
>> Vulkan API itself with calls jumping back into the dispatch table to
>> try and get inside the driver. This is working but it's getting more
>> and more fragile the more tools we add to that box. A lot of what I
>> want to do with gallium2 or whatever we're calling it is to fix our
>> layering problems so that calls go in one direction and we can
>> untangle the jumble. I'm still not sure what I want that to look like
>> but I think I want it to look a lot like Vulkan, just with a handier
>> interface.
>>
>> ~Faith
>>
>> > Not sure how useful this is in practice to you, but the lesson from my
>> POV is that opt-in reusable and shared libraries are always time well spent
>> as they can bend and adapt with the times, whereas no opt-out watertight
>> abstractions inherently have a shelf life.
>> >
>> > Jose
>> >
>> > On Fri, Jan 19, 2024 at 5:30 PM Faith Ekstrand 
>> wrote:
>> >>
>> >> Yeah, this one's gonna hit Phoronix...
>> >>
>> >> When we started writing Vulkan drivers back in the day, there was this
>> >> notion that Vulkan was a low-level API that directly targets hardware.
>> >> Vulkan drivers were these super thin things that just blasted packets
>> >> straight into the hardware. What 

Re: Future direction of the Mesa Vulkan runtime (or "should we build a new gallium?")

2024-01-25 Thread Jose Fonseca
> So far, we've been trying to build those components in terms of the
Vulkan API itself with calls jumping back into the dispatch table to try
and get inside the driver. This is working but it's getting more and more
fragile the more tools we add to that box. A lot of what I want to do with
gallium2 or whatever we're calling it is to fix our layering problems so
that calls go in one direction and we can untangle the jumble. I'm still
not sure what I want that to look like but I think I want it to look a lot
like Vulkan, just with a handier interface.

That resonates with my experience.  For example, Gallium draw module does
some of this too -- it provides its own internal interfaces for drivers,
but it also loops back into Gallium top interface to set FS and rasterizer
state -- and that has *always* been a source of grief.  Having control flow
proceeding through layers in one direction only seems an important
principle to observe.  It's fine if the lower interface is the same
interface (e.g., Gallium to Gallium, or Vulkan to Vulkan as you allude),
but they shouldn't be the same exact entry-points/modules (ie, no
reentrancy/recursion.)

It's also worth considering that Vulkan extensibility could come in hand
too in what you want to achieve.  For example, Mesa Vulkan drivers could
have their own VK_MESA_internal_ extensions that could be used by the
shared Vulkan code to do lower level things.

Jose


On Wed, Jan 24, 2024 at 3:26 PM Faith Ekstrand  wrote:

> Jose,
>
> Thanks for your thoughts!
>
> On Wed, Jan 24, 2024 at 4:30 AM Jose Fonseca 
> wrote:
> >
> > I don't know much about the current Vulkan driver internals to have or
> provide an informed opinion on the path forward, but I'd like to share my
> backwards looking perspective.
> >
> > Looking back, Gallium was two things effectively:
> > (1) an abstraction layer, that's watertight (as in upper layers
> shouldn't reach through to lower layers)
> > (2) an ecosystem of reusable components (draw, util, tgsi, etc.)
> >
> > (1) was of course important -- and the discipline it imposed is what
> enabled to great simplifications -- but it also became a straight-jacket,
> as GPUs didn't stand still, and sooner or later the
> see-every-hardware-as-the-same lenses stop reflecting reality.
> >
> > If I had to pick one, I'd say that (2) is far more useful and
> practical. Take components like gallium's draw and other util modules. A
> driver can choose to use them or not.  One could fork them within Mesa
> source tree, and only the drivers that opt-in into the fork would need to
> be tested/adapted/etc
> >
> > On the flip side, Vulkan API is already a pretty low level HW
> abstraction.  It's also very flexible and extensible, so it's hard to
> provide a watertight abstraction underneath it without either taking the
> lowest common denominator, or having lots of optional bits of functionality
> governed by a myriad of caps like you alluded to.
>
> There is a third thing that isn't really recognized in your description:
>
> (3) A common "language" to talk about GPUs and data structures that
> represent that language
>
> This is precisely what the Vulkan runtime today doesn't have. Classic
> meta sucked because we were trying to implement GL in GL. u_blitter,
> on the other hand, is pretty fantastic because Gallium provides a much
> more sane interface to write those common components in terms of.
>
> So far, we've been trying to build those components in terms of the
> Vulkan API itself with calls jumping back into the dispatch table to
> try and get inside the driver. This is working but it's getting more
> and more fragile the more tools we add to that box. A lot of what I
> want to do with gallium2 or whatever we're calling it is to fix our
> layering problems so that calls go in one direction and we can
> untangle the jumble. I'm still not sure what I want that to look like
> but I think I want it to look a lot like Vulkan, just with a handier
> interface.
>
> ~Faith
>
> > Not sure how useful this is in practice to you, but the lesson from my
> POV is that opt-in reusable and shared libraries are always time well spent
> as they can bend and adapt with the times, whereas no opt-out watertight
> abstractions inherently have a shelf life.
> >
> > Jose
> >
> > On Fri, Jan 19, 2024 at 5:30 PM Faith Ekstrand 
> wrote:
> >>
> >> Yeah, this one's gonna hit Phoronix...
> >>
> >> When we started writing Vulkan drivers back in the day, there was this
> >> notion that Vulkan was a low-level API that directly targets hardware.
> >> Vulkan drivers were these super thin things that just blasted packets
> >> straight into the hardware. What little code was common was small and
> >> pretty easy to just copy+paste around. It was a nice thought...
> >>
> >> What's happened in the intervening 8 years is that Vulkan has grown. A
> lot.
> >>
> >> We already have several places where we're doing significant layering.
> >> It started with sharing the WSI code and some Python 

Re: Future direction of the Mesa Vulkan runtime (or "should we build a new gallium?")

2024-01-24 Thread Zack Rusin
On Wed, Jan 24, 2024 at 6:57 PM Marek Olšák  wrote:
>
> Gallium looks like it was just a copy of DX10, and likely many things were 
> known from DX10 in advance before anything started. Vulkanium doesn't have 
> anything to draw inspiration from. It's a completely unexplored idea.

I'm not sure if I follow this. GNU/Linux didn't have a unified driver
interface to implement GL, but Windows did have a standardized
interface to implement D3D10 which we drew inspiration from. The same
is still true if you s/GL/Vulkan/ and s/D3D10/D3D12/. It's just that
more features of modern APIs are tied to kernel features (i.e. WDDM
versions) than in the past, but with gpuvm, drm scheduler and syncobj
that's also going to be Vulkan's path.
Now, you might say that this time we're not going to use any lessons
from Windows and this interface will be completely unlike what Windows
does for D3D12, which is fine but I still wouldn't call the idea of
standardizing an interface for a low level graphics API a completely
unexplored idea given that it works on Windows on an API that's a lot
more like Vulkan than D3D10 was like GL.

z


Re: Future direction of the Mesa Vulkan runtime (or "should we build a new gallium?")

2024-01-24 Thread Marek Olšák
Gallium looks like it was just a copy of DX10, and likely many things were
known from DX10 in advance before anything started. Vulkanium doesn't have
anything to draw inspiration from. It's a completely unexplored idea.

AMD's PAL is the same idea as Gallium. It's used to implement Vulkan, DX,
Mantle, Metal, etc.

Marek

On Wed, Jan 24, 2024, 13:40 Faith Ekstrand  wrote:

> On Wed, Jan 24, 2024 at 12:26 PM Zack Rusin 
> wrote:
> >
> > On Wed, Jan 24, 2024 at 10:27 AM Faith Ekstrand 
> wrote:
> > >
> > > Jose,
> > >
> > > Thanks for your thoughts!
> > >
> > > On Wed, Jan 24, 2024 at 4:30 AM Jose Fonseca <
> jose.fons...@broadcom.com> wrote:
> > > >
> > > > I don't know much about the current Vulkan driver internals to have
> or provide an informed opinion on the path forward, but I'd like to share
> my backwards looking perspective.
> > > >
> > > > Looking back, Gallium was two things effectively:
> > > > (1) an abstraction layer, that's watertight (as in upper layers
> shouldn't reach through to lower layers)
> > > > (2) an ecosystem of reusable components (draw, util, tgsi, etc.)
> > > >
> > > > (1) was of course important -- and the discipline it imposed is what
> enabled to great simplifications -- but it also became a straight-jacket,
> as GPUs didn't stand still, and sooner or later the
> see-every-hardware-as-the-same lenses stop reflecting reality.
> > > >
> > > > If I had to pick one, I'd say that (2) is far more useful and
> practical. Take components like gallium's draw and other util modules. A
> driver can choose to use them or not.  One could fork them within Mesa
> source tree, and only the drivers that opt-in into the fork would need to
> be tested/adapted/etc
> > > >
> > > > On the flip side, Vulkan API is already a pretty low level HW
> abstraction.  It's also very flexible and extensible, so it's hard to
> provide a watertight abstraction underneath it without either taking the
> lowest common denominator, or having lots of optional bits of functionality
> governed by a myriad of caps like you alluded to.
> > >
> > > There is a third thing that isn't really recognized in your
> description:
> > >
> > > (3) A common "language" to talk about GPUs and data structures that
> > > represent that language
> > >
> > > This is precisely what the Vulkan runtime today doesn't have. Classic
> > > meta sucked because we were trying to implement GL in GL. u_blitter,
> > > on the other hand, is pretty fantastic because Gallium provides a much
> > > more sane interface to write those common components in terms of.
> > >
> > > So far, we've been trying to build those components in terms of the
> > > Vulkan API itself with calls jumping back into the dispatch table to
> > > try and get inside the driver. This is working but it's getting more
> > > and more fragile the more tools we add to that box. A lot of what I
> > > want to do with gallium2 or whatever we're calling it is to fix our
> > > layering problems so that calls go in one direction and we can
> > > untangle the jumble. I'm still not sure what I want that to look like
> > > but I think I want it to look a lot like Vulkan, just with a handier
> > > interface.
> >
> > Yes, that makes sense. When we were writing the initial components for
> > gallium (draw and cso) I really liked the general concept and thought
> > about trying to reuse them in the old, non-gallium Mesa drivers but
> > the obstacle was that there was no common interface to lay them on.
> > Using GL to implement GL was silly and using Vulkan to implement
> > Vulkan is not much better.
> >
> > Having said that my general thoughts on GPU abstractions largely match
> > what Jose has said. To me it's a question of whether a clean
> > abstraction:
> > - on top of which you can build an entire GPU driver toolkit (i.e. all
> > the components and helpers)
> > - that makes it trivial to figure out what needs to be done to write a
> > new driver and makes bootstrapping a new driver a lot simpler
> > - that makes it easier to reason about cross hardware concepts (it's a
> > lot easier to understand the entirety of the ecosystem if every driver
> > is not doing something unique to implement similar functionality)
> > is worth more than almost exponentially increasing the difficulty of:
> > - advancing the ecosystem (i.e. it might be easier to understand but
> > it's way harder to create clean abstractions across such different
> > hardware).
> > - driver maintenance (i.e. there will be a constant stream of
> > regressions hitting your driver as a result of other people working on
> > their drivers)
> > - general development (i.e. bug fixes/new features being held back
> > because they break some other driver)
> >
> > Some of those can certainly be tilted one way or the other, e.g. the
> > driver maintenance con can be somewhat eased by requiring that every
> > driver working on top of the new abstraction has to have a stable
> > Mesa-CI setup (be it lava or ci-tron, or whatever) but all of those

Re: Future direction of the Mesa Vulkan runtime (or "should we build a new gallium?")

2024-01-24 Thread Triang3l

I'll agree with Jose about Vulkan being a low-level abstraction, and to me
the "opt-in" way seems like a much more balanced approach to achieving our
goals — not only balanced between the goals themselves (code amount and
time to implement aren't our only criteria to optimize), but also across
the variety of hardware — as if something goes wrong with the watertight
abstraction for a certain implementation, not only would it take more time to
find a solution, but issues in one driver risk wasting everyone's time, as
it'd often be necessary to make debatable changes to interfaces used by all
drivers.

I also need to further clarify my point regarding the design of what we
want to encourage drivers to use, specifically about pipeline objects and
dynamic state/ESO.

Vulkan, as I see from all the perspectives I'm regularly interacting with
it from — as an RHI programmer at a game studio, a translation layer
developer (the Xenia Xbox 360 emulator), and now creating a driver for it —
has not grown much thicker than it originally was. What has increased is
its surface area — but where it's actually important: letting applications
more precisely convey their intentions.

I'd say it's even thinner and more transparent from this point of view now.
We got nice things like inline uniform blocks, host image copy, push
descriptors, descriptor buffers, and of course dynamic state — and they all
pretty much directly correspond to some hardware concepts, that apps can
utilize to do what they want with less indirection between their actual
architecture and the hardware.

Essentially, the application and the driver (and the rest of the chain —
the specification, I'd like to retract my statement about "fighting" it, by
the way, and the hardware controlled by that driver) can work more
cooperatively now, towards their common goal of delivering what the app
developer wants to provide to the user with as high quality and speed as
realistically possible. They now have more ways of helping each other by
communicating their intentions and capabilities to each other more
completely and accurately.

And it's important for us not to go *backwards*.


This is why I think it's just fundamentally wrong to encourage drivers to
layer pipeline objects and static state on top of dynamic state.

An application would typically use static state when it:
 • Knows the potentially needed state setups in advance (like in a game
   with demands of materials preprocessed, or in a non-gaming/non-DCC app).
 • Wants to quickly apply a complete state configuration.
 • Maybe doesn't care much about the state used by previously done work,
   like drawing wildly different kinds of objects in a scene.
At the same time, it'd choose dynamic if it:
 • Doesn't have upfront knowledge of possible states (like in an OpenGL/
   D3D9/D3D11 translation layer or a console emulator, or with a highly
   flexible art pipeline in the game).
 • Wants to quickly make small, incremental state changes.
 • Maybe wants to mix state variables updated at different frequencies.

Their use cases, and the application intentions they convey, are as opposite
as the antonymous words "static" and "dynamic" themselves. Treating one
like a specialization of the other is making the driver blind in the same
way as back in 2016 when applications had no other option but to reduce
everything to static state.

(Of course with state spanning so many pipeline stages, applications would
usually not just be picking one of the two extremes, and instead may want
static for some cases/stages and dynamic for the other. This is also where
the route Vulkan's development has taken over the past 8 years is very wise:
instead of forcing Escobar's axiom of choice upon applications, let them
specify their intentions on a per-variable basis, and choose the
appropriate amount of state grouping among monolithic pipelines, GPL with
libraries containing one or multiple parts of a pipeline, and ESO.)
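
As a plain-Vulkan illustration of that per-variable choice, an application
can mark only the state it can't know upfront as dynamic and leave the rest
baked into the pipeline:

#include <vulkan/vulkan.h>

/* Only these are dynamic; everything else stays static in the pipeline so
 * the driver can bake it. VK_DYNAMIC_STATE_DEPTH_COMPARE_OP needs Vulkan
 * 1.3 or VK_EXT_extended_dynamic_state (as the _EXT alias). */
static const VkDynamicState dynamic_states[] = {
   VK_DYNAMIC_STATE_VIEWPORT,
   VK_DYNAMIC_STATE_SCISSOR,
   VK_DYNAMIC_STATE_DEPTH_COMPARE_OP,
};

static const VkPipelineDynamicStateCreateInfo dynamic_info = {
   .sType = VK_STRUCTURE_TYPE_PIPELINE_DYNAMIC_STATE_CREATE_INFO,
   .dynamicStateCount = sizeof(dynamic_states) / sizeof(dynamic_states[0]),
   .pDynamicStates = dynamic_states,
};

/* ...plugged into VkGraphicsPipelineCreateInfo::pDynamicState; the members
 * covered by these dynamic states are ignored at pipeline creation and set
 * at record time instead, while everything else carries the baked values. */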


The primary rule of game optimization is, if you can avoid doing something
every frame, or, even worse, hundreds or thousands of times per frame, do
whatever reuse you can to avoid that. If we know that's what the game wants
to do — by providing a pipeline object with the state it wants to be
static, a pipeline layout object — we should be aiding it. Just like if the
the game tells us that it can't precompile something, the graphics stack
should do the best it can in this situation — it would be wrong to add the
overhead of running a time machine to 2016 to its draws either. After all,
the driver's draw call code and the game's draw call code are both just
draw call code with one common goal.


So, it's important that whichever solution we end up with, it must not be a
"broken telephone" degrading the cooperation between the application and
the driver. And we should not forget that the communication between them is
two-way, which includes:
 • Interface calls done by the app.
 • Limits and features exposed by the driver.


Re: Future direction of the Mesa Vulkan runtime (or "should we build a new gallium?")

2024-01-24 Thread Faith Ekstrand
On Wed, Jan 24, 2024 at 12:26 PM Zack Rusin  wrote:
>
> On Wed, Jan 24, 2024 at 10:27 AM Faith Ekstrand 
wrote:
> >
> > Jose,
> >
> > Thanks for your thoughts!
> >
> > On Wed, Jan 24, 2024 at 4:30 AM Jose Fonseca 
wrote:
> > >
> > > I don't know much about the current Vulkan driver internals to have
or provide an informed opinion on the path forward, but I'd like to share
my backwards looking perspective.
> > >
> > > Looking back, Gallium was two things effectively:
> > > (1) an abstraction layer, that's watertight (as in upper layers
shouldn't reach through to lower layers)
> > > (2) an ecosystem of reusable components (draw, util, tgsi, etc.)
> > >
> > > (1) was of course important -- and the discipline it imposed is what
enabled to great simplifications -- but it also became a straight-jacket,
as GPUs didn't stand still, and sooner or later the
see-every-hardware-as-the-same lenses stop reflecting reality.
> > >
> > > If I had to pick one, I'd say that (2) is far more useful and
practical. Take components like gallium's draw and other util modules. A
driver can choose to use them or not.  One could fork them within Mesa
source tree, and only the drivers that opt-in into the fork would need to
be tested/adapted/etc
> > >
> > > On the flip side, Vulkan API is already a pretty low level HW
abstraction.  It's also very flexible and extensible, so it's hard to
provide a watertight abstraction underneath it without either taking the
lowest common denominator, or having lots of optional bits of functionality
governed by a myriad of caps like you alluded to.
> >
> > There is a third thing that isn't really recognized in your description:
> >
> > (3) A common "language" to talk about GPUs and data structures that
> > represent that language
> >
> > This is precisely what the Vulkan runtime today doesn't have. Classic
> > meta sucked because we were trying to implement GL in GL. u_blitter,
> > on the other hand, is pretty fantastic because Gallium provides a much
> > more sane interface to write those common components in terms of.
> >
> > So far, we've been trying to build those components in terms of the
> > Vulkan API itself with calls jumping back into the dispatch table to
> > try and get inside the driver. This is working but it's getting more
> > and more fragile the more tools we add to that box. A lot of what I
> > want to do with gallium2 or whatever we're calling it is to fix our
> > layering problems so that calls go in one direction and we can
> > untangle the jumble. I'm still not sure what I want that to look like
> > but I think I want it to look a lot like Vulkan, just with a handier
> > interface.
>
> Yes, that makes sense. When we were writing the initial components for
> gallium (draw and cso) I really liked the general concept and thought
> about trying to reuse them in the old, non-gallium Mesa drivers but
> the obstacle was that there was no common interface to lay them on.
> Using GL to implement GL was silly and using Vulkan to implement
> Vulkan is not much better.
>
> Having said that my general thoughts on GPU abstractions largely match
> what Jose has said. To me it's a question of whether a clean
> abstraction:
> - on top of which you can build an entire GPU driver toolkit (i.e. all
> the components and helpers)
> - that makes it trivial to figure out what needs to be done to write a
> new driver and makes bootstrapping a new driver a lot simpler
> - that makes it easier to reason about cross hardware concepts (it's a
> lot easier to understand the entirety of the ecosystem if every driver
> is not doing something unique to implement similar functionality)
> is worth more than almost exponentially increasing the difficulty of:
> - advancing the ecosystem (i.e. it might be easier to understand but
> it's way harder to create clean abstractions across such different
> hardware).
> - driver maintenance (i.e. there will be a constant stream of
> regressions hitting your driver as a result of other people working on
> their drivers)
> - general development (i.e. bug fixes/new features being held back
> because they break some other driver)
>
> Some of those can certainly be tilted one way or the other, e.g. the
> driver maintenance con can be somewhat eased by requiring that every
> driver working on top of the new abstraction has to have a stable
> Mesa-CI setup (be it lava or ci-tron, or whatever) but all of those
> things need to be reasoned about. In my experience abstractions never
> have uniform support because some people will value cons of them more
> than they value the pros. So the entire process requires some very
> steadfast individuals to keep going despite hearing that the effort is
> dumb, at least until the benefits of the new approach are impossible
> to deny. So you know... "how much do you believe in this approach
> because some days will suck and you can't give up" ;) is probably the
> question.

Well, I've built my entire career out of doing things that others said 

Re: Future direction of the Mesa Vulkan runtime (or "should we build a new gallium?")

2024-01-24 Thread Zack Rusin
On Wed, Jan 24, 2024 at 10:27 AM Faith Ekstrand  wrote:
>
> Jose,
>
> Thanks for your thoughts!
>
> On Wed, Jan 24, 2024 at 4:30 AM Jose Fonseca  
> wrote:
> >
> > I don't know much about the current Vulkan driver internals to have or 
> > provide an informed opinion on the path forward, but I'd like to share my 
> > backwards looking perspective.
> >
> > Looking back, Gallium was two things effectively:
> > (1) an abstraction layer, that's watertight (as in upper layers shouldn't 
> > reach through to lower layers)
> > (2) an ecosystem of reusable components (draw, util, tgsi, etc.)
> >
> > (1) was of course important -- and the discipline it imposed is what 
> > enabled to great simplifications -- but it also became a straight-jacket, 
> > as GPUs didn't stand still, and sooner or later the 
> > see-every-hardware-as-the-same lenses stop reflecting reality.
> >
> > If I had to pick one, I'd say that (2) is far more useful and practical.
> > Take components like gallium's draw and other util modules. A driver can 
> > choose to use them or not.  One could fork them within Mesa source tree, 
> > and only the drivers that opt-in into the fork would need to be 
> > tested/adapted/etc
> >
> > On the flip side, Vulkan API is already a pretty low level HW abstraction.  
> > It's also very flexible and extensible, so it's hard to provide a 
> > watertight abstraction underneath it without either taking the lowest 
> > common denominator, or having lots of optional bits of functionality 
> > governed by a myriad of caps like you alluded to.
>
> There is a third thing that isn't really recognized in your description:
>
> (3) A common "language" to talk about GPUs and data structures that
> represent that language
>
> This is precisely what the Vulkan runtime today doesn't have. Classic
> meta sucked because we were trying to implement GL in GL. u_blitter,
> on the other hand, is pretty fantastic because Gallium provides a much
> more sane interface to write those common components in terms of.
>
> So far, we've been trying to build those components in terms of the
> Vulkan API itself with calls jumping back into the dispatch table to
> try and get inside the driver. This is working but it's getting more
> and more fragile the more tools we add to that box. A lot of what I
> want to do with gallium2 or whatever we're calling it is to fix our
> layering problems so that calls go in one direction and we can
> untangle the jumble. I'm still not sure what I want that to look like
> but I think I want it to look a lot like Vulkan, just with a handier
> interface.

Yes, that makes sense. When we were writing the initial components for
gallium (draw and cso) I really liked the general concept and thought
about trying to reuse them in the old, non-gallium Mesa drivers but
the obstacle was that there was no common interface to lay them on.
Using GL to implement GL was silly and using Vulkan to implement
Vulkan is not much better.

Having said that my general thoughts on GPU abstractions largely match
what Jose has said. To me it's a question of whether a clean
abstraction:
- on top of which you can build an entire GPU driver toolkit (i.e. all
the components and helpers)
- that makes it trivial to figure out what needs to be done to write a
new driver and makes bootstrapping a new driver a lot simpler
- that makes it easier to reason about cross hardware concepts (it's a
lot easier to understand the entirety of the ecosystem if every driver
is not doing something unique to implement similar functionality)
is worth more than almost exponentially increasing the difficulty of:
- advancing the ecosystem (i.e. it might be easier to understand but
it's way harder to create clean abstractions across such different
hardware).
- driver maintenance (i.e. there will be a constant stream of
regressions hitting your driver as a result of other people working on
their drivers)
- general development (i.e. bug fixes/new features being held back
because they break some other driver)

Some of those can certainly be tilted one way or the other, e.g. the
driver maintenance con can be somewhat eased by requiring that every
driver working on top of the new abstraction has to have a stable
Mesa-CI setup (be it lava or ci-tron, or whatever) but all of those
things need to be reasoned about. In my experience abstractions never
have uniform support because some people will value cons of them more
than they value the pros. So the entire process requires some very
steadfast individuals to keep going despite hearing that the effort is
dumb, at least until the benefits of the new approach are impossible
to deny. So you know... "how much do you believe in this approach
because some days will suck and you can't give up" ;) is probably the
question.

z


Re: Future direction of the Mesa Vulkan runtime (or "should we build a new gallium?")

2024-01-24 Thread Faith Ekstrand
Jose,

Thanks for your thoughts!

On Wed, Jan 24, 2024 at 4:30 AM Jose Fonseca  wrote:
>
> I don't know much about the current Vulkan driver internals to have or 
> provide an informed opinion on the path forward, but I'd like to share my 
> backwards looking perspective.
>
> Looking back, Gallium was two things effectively:
> (1) an abstraction layer, that's watertight (as in upper layers shouldn't 
> reach through to lower layers)
> (2) an ecosystem of reusable components (draw, util, tgsi, etc.)
>
> (1) was of course important -- and the discipline it imposed is what enabled 
> to great simplifications -- but it also became a straight-jacket, as GPUs 
> didn't stand still, and sooner or later the see-every-hardware-as-the-same 
> lenses stop reflecting reality.
>
> If I had to pick one, I'd say that (2) is far more useful and practical.
> Take components like gallium's draw and other util modules. A driver can 
> choose to use them or not.  One could fork them within Mesa source tree, and 
> only the drivers that opt-in into the fork would need to be tested/adapted/etc
>
> On the flip side, Vulkan API is already a pretty low level HW abstraction.  
> It's also very flexible and extensible, so it's hard to provide a watertight 
> abstraction underneath it without either taking the lowest common 
> denominator, or having lots of optional bits of functionality governed by a 
> myriad of caps like you alluded to.

There is a third thing that isn't really recognized in your description:

(3) A common "language" to talk about GPUs and data structures that
represent that language

This is precisely what the Vulkan runtime today doesn't have. Classic
meta sucked because we were trying to implement GL in GL. u_blitter,
on the other hand, is pretty fantastic because Gallium provides a much
more sane interface to write those common components in terms of.

So far, we've been trying to build those components in terms of the
Vulkan API itself with calls jumping back into the dispatch table to
try and get inside the driver. This is working but it's getting more
and more fragile the more tools we add to that box. A lot of what I
want to do with gallium2 or whatever we're calling it is to fix our
layering problems so that calls go in one direction and we can
untangle the jumble. I'm still not sure what I want that to look like
but I think I want it to look a lot like Vulkan, just with a handier
interface.

~Faith

> Not sure how useful this is in practice to you, but the lesson from my POV is 
> that opt-in reusable and shared libraries are always time well spent as they 
> can bend and adapt with the times, whereas no opt-out watertight abstractions 
> inherently have a shelf life.
>
> Jose
>
> On Fri, Jan 19, 2024 at 5:30 PM Faith Ekstrand  wrote:
>>
>> Yeah, this one's gonna hit Phoronix...
>>
>> When we started writing Vulkan drivers back in the day, there was this
>> notion that Vulkan was a low-level API that directly targets hardware.
>> Vulkan drivers were these super thin things that just blasted packets
>> straight into the hardware. What little code was common was small and
>> pretty easy to just copy+paste around. It was a nice thought...
>>
>> What's happened in the intervening 8 years is that Vulkan has grown. A lot.
>>
>> We already have several places where we're doing significant layering.
>> It started with sharing the WSI code and some Python for generating
>> dispatch tables. Later we added common synchronization code and a few
>> vkFoo2 wrappers. Then render passes and...
>>
>> https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27024
>>
>> That's been my project the last couple weeks: A common VkPipeline
>> implementation built on top of an ESO-like interface. The big
>> deviation this MR makes from prior art is that I make no attempt at
>> pretending it's a layered implementation. The vtable for shader
>> objects looks like ESO but takes its own path when it's useful to do
>> so. For instance, shader creation always consumes NIR and a handful of
>> lowering passes are run for you. It's no st_glsl_to_nir but it is a
>> bit opinionated. Also, a few of the bits that are missing from ESO
>> such as robustness have been added to the interface.
>>
>> In my mind, this marks a pretty fundamental shift in how the Vulkan
>> runtime works, at least in my mind. Previously, everything was
>> designed to be a toolbox where you can kind of pick and choose what
>> you want to use. Also, everything at least tried to act like a layer
>> where you still implemented Vulkan but you could leave out bits like
>> render passes if you implemented the new thing and were okay with the
>> layer. With the ESO code, you implement something that isn't Vulkan
>> entrypoints and the actual entrypoints live in the runtime. This lets
>> us expand and adjust the interface as needed for our purposes as well
>> as sanitize certain things even in the modern API.
>>
>> The result is that NVK is starting to feel like a 

Re: Future direction of the Mesa Vulkan runtime (or "should we build a new gallium?")

2024-01-24 Thread Jose Fonseca
I don't know much about the current Vulkan driver internals to have or
provide an informed opinion on the path forward, but I'd like to share my
backwards looking perspective.

Looking back, Gallium was two things effectively:
(1) an abstraction layer, that's watertight (as in upper layers shouldn't
reach through to lower layers)
(2) an ecosystem of reusable components (draw, util, tgsi, etc.)

(1) was of course important -- and the discipline it imposed is what
enabled great simplifications -- but it also became a straight-jacket,
as GPUs didn't stand still, and sooner or later the
see-every-hardware-as-the-same lenses stop reflecting reality.

If I had to pick one, I'd say that (2) is far more useful and practical.
Take components like gallium's draw and other util modules. A driver can
choose to use them or not.  One could fork them within Mesa source tree,
and only the drivers that opt-in into the fork would need to be
tested/adapted/etc

On the flip side, Vulkan API is already a pretty low level HW abstraction.
It's also very flexible and extensible, so it's hard to provide a
watertight abstraction underneath it without either taking the lowest
common denominator, or having lots of optional bits of functionality
governed by a myriad of caps like you alluded to.

Not sure how useful this is in practice to you, but the lesson from my POV
is that *opt-in* reusable and shared libraries are always time well spent
as they can bend and adapt with the times, whereas *no opt-out* watertight
abstractions inherently have a shelf life.

Jose

On Fri, Jan 19, 2024 at 5:30 PM Faith Ekstrand  wrote:

> Yeah, this one's gonna hit Phoronix...
>
> When we started writing Vulkan drivers back in the day, there was this
> notion that Vulkan was a low-level API that directly targets hardware.
> Vulkan drivers were these super thin things that just blasted packets
> straight into the hardware. What little code was common was small and
> pretty easy to just copy+paste around. It was a nice thought...
>
> What's happened in the intervening 8 years is that Vulkan has grown. A lot.
>
> We already have several places where we're doing significant layering.
> It started with sharing the WSI code and some Python for generating
> dispatch tables. Later we added common synchronization code and a few
> vkFoo2 wrappers. Then render passes and...
>
> https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27024
>
> That's been my project the last couple weeks: A common VkPipeline
> implementation built on top of an ESO-like interface. The big
> deviation this MR makes from prior art is that I make no attempt at
> pretending it's a layered implementation. The vtable for shader
> objects looks like ESO but takes its own path when it's useful to do
> so. For instance, shader creation always consumes NIR and a handful of
> lowering passes are run for you. It's no st_glsl_to_nir but it is a
> bit opinionated. Also, a few of the bits that are missing from ESO
> such as robustness have been added to the interface.
>
> In my mind, this marks a pretty fundamental shift in how the Vulkan
> runtime works, at least in my mind. Previously, everything was
> designed to be a toolbox where you can kind of pick and choose what
> you want to use. Also, everything at least tried to act like a layer
> where you still implemented Vulkan but you could leave out bits like
> render passes if you implemented the new thing and were okay with the
> layer. With the ESO code, you implement something that isn't Vulkan
> entrypoints and the actual entrypoints live in the runtime. This lets
> us expand and adjust the interface as needed for our purposes as well
> as sanitize certain things even in the modern API.
>
> The result is that NVK is starting to feel like a gallium driver. 
>
> So here's the question: do we like this? Do we want to push in this
> direction? Should we start making more things work more this way? I'm
> not looking for MRs just yet nor do I have more reworks directly
> planned. I'm more looking for thoughts and opinions as to how the
> various Vulkan driver teams feel about this. We'll leave the detailed
> planning for the Mesa issue tracker.
>
> It's worth noting that, even though I said we've tried to keep things
> layerish, there are other parts of the runtime that look like this.
> The synchronization code is a good example. The vk_sync interface is
> pretty significantly different from the Vulkan objects it's used to
> implement. That's worked out pretty well, IMO. With as complicated as
> something like pipelines or synchronization are, trying to keep the
> illusion of a layer just isn't practical.
>
> So, do we like this? Should we be pushing more towards drivers being a
> backed of the runtime instead of a user of it?
>
> Now, before anyone asks, no, I don't really want to build a multi-API
> abstraction with a Vulkan state tracker. If we were doing this 5 years
> ago and Zink didn't already exist, one might be able to make an
> 

Re: Future direction of the Mesa Vulkan runtime (or "should we build a new gallium?")

2024-01-22 Thread Faith Ekstrand
On Mon, Jan 22, 2024 at 7:20 AM Iago Toral  wrote:
>
> Hi Faith,
>
> thanks for starting the discussion, we had a bit of an internal chat
> here at Igalia to see where we all stand on this and I am sharing some
> initial thoughts/questions below:
>
> El vie, 19-01-2024 a las 11:01 -0600, Faith Ekstrand escribió:
>
> > Thoughts?
>
> We think it is fine if the Vulkan runtime implements its own internal
> API that doesn't match Vulkan's. If we are going down this path however
> we really want to make sure we have good documentation for it so it is
> clear how all that works without having to figure things out by looking
> at the code.

That's a reasonable request. We probably won't re-type the Vulkan spec
in comments but having differences documented is reasonable.  I'm
thinking the level of documentation in vk_graphics_state.

> For existing drivers we think it is a bit less clear whether the effort
> required to port is going to be worth it. If you end up having to throw
> away a lot of what you currently have that already works and in some
> cases might even be optimal for your platform it may be a hard ask.
> What are your thoughts on this? How much adoption would you be looking
> for from existing drivers?

That's a good question. One of the problems I'm already seeing is that
we have a bunch of common stuff which is in use in some drivers and
not in others and I generally don't know why. If there's something
problematic about it on some vendor's hardware, we should fix that. If
it's just that driver teams don't have the time for refactors, that's
a different issue. Unfortunately, I usually don't know besides one-off
comments from a developer here and there.

And, yeah, I know it can be a lot of work.  Hopefully the work pays
off in the long run but short-term it's often hard to justify. :-/

> As new features are added to the runtime, we understand some of them
> could have dependencies on other features, building on top of them,
> requiring drivers to adopt more of the common vulkan runtime to
> continue benefiting from additional features, is that how you see this
> or would you still expect many runtime features to still be independent
> from each other to facilitate driver opt-in on a need-by-need basis?

At a feature level, yes. However, one of the big things I'm struggling
with right now is layering issues where we really need to flip things
around from the driver calling into the runtime to the runtime calling
into the driver. One of the things I would LOVE to put in the runtime
is YCbCr emulation for drivers that don't natively have multi-plane
image support. However, that just isn't possible today thanks to the
way things are layered. In particular, we would need the runtime to be
able to make one `VkImage` contain multiple driver images and that's
just not possible as long as the driver is controlling image creation.
We also don't have enough visibility into descriptor sets. People have
also talked about trying to do a common ray-tracing implementation.
Unfortunately, I just don't see that happening with the current layer
model.

Unfortunately, I don't have a lot of examples of what that would look
like without having written the code to do it. One thing I'm currently
thinking about is switching more objects to a kernel vtable model like
I did with `vk_pipeline` and `vk_shader` in the posted MR. This puts
the runtime in control of the object's life cycle and more easily
allows for multiple implementations of an object type. Like right now
you can use the common implementation for graphics and compute and
roll your own vk_pipeline for ray-tracing. I realize that doesn't
really apply to Raspberry Pi but it's an example of what flipping the
layering around looks like.
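
If it helps, the general shape is something like this — hypothetical names,
not the actual vk_pipeline interface from the MR:

#include <vulkan/vulkan.h>

struct vk_device;
struct vk_cmd_buffer;
struct vk_pipeline2;

/* Hypothetical sketch of a runtime-owned object with a driver-supplied ops
 * table; the real vk_pipeline/vk_shader code in the MR differs in the
 * details. */
struct vk_pipeline2_ops {
   /* The runtime implements vkDestroyPipeline() and calls this. */
   void (*destroy)(struct vk_device *dev, struct vk_pipeline2 *pipeline);

   /* The runtime implements vkCmdBindPipeline() and calls this. */
   void (*bind)(struct vk_cmd_buffer *cmd, struct vk_pipeline2 *pipeline);
};

struct vk_pipeline2 {
   const struct vk_pipeline2_ops *ops;
   VkPipelineBindPoint bind_point;
};

/* Because the runtime owns creation and the entrypoints, a driver can plug
 * the common graphics/compute implementation and a hand-rolled ray-tracing
 * one in behind the same Vulkan entrypoints. */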

The other thing I've been realizing as I've been thinking about this
over the week-end is that, if this happens, we're likely heading
towards another gallium/classic split for a while. (Though hopefully
without the bad blood in the community that we had from gallium.) If
this plays out similarly to gallium/classic, a bunch of drivers will
remain classic, doing most things themselves and the new thing (which
really needs a name, BTW) will be driven by a small subset of drivers
and then other drivers get moved over as time allows. This isn't
necessarily a bad thing, it's just a recognition of how large-scale
changes tend to roll out within Mesa and the potential scope of a more
invasive runtime project.

Thinking of it this way would also give more freedom to the people
building the new thing to just build it without worrying about driver
porting and trying to do everything incrementally. If we do attempt
this, it needs to be done with a subset of drivers that is as
representative of the industry as possible so we don't screw anybody
over. I'm currently thinking NVK (1.3, all the features), AGX (all the
features but on shit hardware), and Panvk (low features). That won't
guarantee the perfect design for everyone, of course, but 

Re: Future direction of the Mesa Vulkan runtime (or "should we build a new gallium?")

2024-01-22 Thread Iago Toral
Hi Faith,

thanks for starting the discussion. We had a bit of an internal chat
here at Igalia to see where we all stand on this, and I am sharing some
initial thoughts/questions below:

On Fri, 2024-01-19 at 11:01 -0600, Faith Ekstrand wrote:
> Yeah, this one's gonna hit Phoronix...
> 
> When we started writing Vulkan drivers back in the day, there was this
> notion that Vulkan was a low-level API that directly targets hardware.
> Vulkan drivers were these super thin things that just blasted packets
> straight into the hardware. What little code was common was small and
> pretty easy to just copy+paste around. It was a nice thought...
> 
> What's happened in the intervening 8 years is that Vulkan has grown.
> A lot.
> 
> We already have several places where we're doing significant layering.
> It started with sharing the WSI code and some Python for generating
> dispatch tables. Later we added common synchronization code and a few
> vkFoo2 wrappers. Then render passes and...
> 
> https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27024
> 
> That's been my project the last couple weeks: A common VkPipeline
> implementation built on top of an ESO-like interface. The big
> deviation this MR makes from prior art is that I make no attempt at
> pretending it's a layered implementation. The vtable for shader
> objects looks like ESO but takes its own path when it's useful to do
> so. For instance, shader creation always consumes NIR and a handful of
> lowering passes are run for you. It's no st_glsl_to_nir but it is a
> bit opinionated. Also, a few of the bits that are missing from ESO
> such as robustness have been added to the interface.
> 
> In my mind, this marks a pretty fundamental shift in how the Vulkan
> runtime works. Previously, everything was
> designed to be a toolbox where you can kind of pick and choose what
> you want to use. Also, everything at least tried to act like a layer
> where you still implemented Vulkan but you could leave out bits like
> render passes if you implemented the new thing and were okay with the
> layer. With the ESO code, you implement something that isn't Vulkan
> entrypoints and the actual entrypoints live in the runtime. This lets
> us expand and adjust the interface as needed for our purposes as well
> as sanitize certain things even in the modern API.
> 
> The result is that NVK is starting to feel like a gallium driver. 
> 
> So here's the question: do we like this? Do we want to push in this
> direction? Should we start making more things work more this way? I'm
> not looking for MRs just yet nor do I have more reworks directly
> planned. I'm more looking for thoughts and opinions as to how the
> various Vulkan driver teams feel about this. We'll leave the detailed
> planning for the Mesa issue tracker.
> 
> It's worth noting that, even though I said we've tried to keep things
> layerish, there are other parts of the runtime that look like this.
> The synchronization code is a good example. The vk_sync interface is
> pretty significantly different from the Vulkan objects it's used to
> implement. That's worked out pretty well, IMO. With something as
> complicated as pipelines or synchronization, trying to keep the
> illusion of a layer just isn't practical.
> 
> So, do we like this? Should we be pushing more towards drivers being
> a backend of the runtime instead of a user of it?
> 
> Now, before anyone asks, no, I don't really want to build a multi-API
> abstraction with a Vulkan state tracker. If we were doing this 5 years
> ago and Zink didn't already exist, one might be able to make an
> argument for pushing in that direction. However, that would add a huge
> amount of weight to the project and make it even harder to develop the
> runtime than it already is, all for little benefit at this point.
> 
> Here's a few other constraints on what I'm thinking:
> 
> 1. I want it to still be possible for drivers to implement an
> extension without piles of runtime plumbing or even bypass the runtime
> on occasion as needed.
> 
> 2. I don't want to recreate the gallium cap disaster: drivers should
> know exactly what they're advertising. We may want to have some
> internal features or properties that are used by the runtime to make
> decisions but they'll be in addition to the features and properties in
> Vulkan.
> 
> 3. We've got some meta stuff already but we probably want more.
> However, I don't want to force meta on folks who don't want it.
> 
> The big thing here is that if we do this, I'm going to need help. I'm
> happy to do a lot of the architectural work but drivers are going to
> have to keep up with the changes and I can't take on the burden of
> moving 8 different drivers forward. I can answer questions and maybe
> help out a bit but the refactoring is going to be too much for one
> person, even if that person is me.
> 
> Thoughts?

We think it is fine if the Vulkan runtime implements its own internal

Re: Future direction of the Mesa Vulkan runtime (or "should we build a new gallium?")

2024-01-20 Thread Triang3l

Hello Faith and everyfrogy!

I've been developing a new Vulkan driver for Mesa — Terakan, for AMD
TeraScale Evergreen and Northern Islands GPUs — since May of 2023. You can
find it in amd/terascale/vulkan on the Terakan branch of my fork at
Triang3l/mesa. While it currently lacks many of the graphical features, the
architecture of state management, meta, and descriptors, has already
largely been implemented in its code. I'm overall relatively new to Mesa,
in the past having contributed the fragment shader interlock implementation
to RADV that included working with the state management, but never having
written a Gallium driver, or a Vulkan driver in the ANV copy-pasting era,
so this may be a somewhat fresh — although quite conservative — take on
this.

Due to various hardware and kernel driver differences (bindings being
individually loaded into fixed slots as part of the command buffer state,
the lack of command buffer chaining in the kernel resulting in having to
reapply all of the state when the size of the hardware command buffer
exceeds the HW/KMD limits), I've been designing the architecture of my
Vulkan driver largely from scratch, without using the existing Mesa drivers
as a reference.
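
To spell out the second of those differences, the emission path ends up
looking roughly like this (a simplified, hypothetical sketch, not
Terakan's actual code): without chaining, every overflow of the hardware
buffer forces a full state replay.

#include <stdint.h>
#include <string.h>

/* Simplified sketch: when a hardware command buffer hits the HW/KMD size
 * limit and cannot be chained, a new one is started and all currently
 * applied state has to be re-emitted into it, not just the dirty bits. */

struct hw_cmdbuf {
   uint32_t *ptr;
   uint32_t dw_used;
   uint32_t dw_max;    /* submission size limit */
};

void hw_cmdbuf_start_new(struct hw_cmdbuf *cs);          /* hypothetical helpers */
void hw_cmdbuf_reemit_all_state(struct hw_cmdbuf *cs);

static void
hw_cmdbuf_emit(struct hw_cmdbuf *cs, const uint32_t *pkt, uint32_t dw_count)
{
   if (cs->dw_used + dw_count > cs->dw_max) {
      hw_cmdbuf_start_new(cs);
      hw_cmdbuf_reemit_all_state(cs);   /* every slot, not just dirty ones */
   }
   memcpy(cs->ptr + cs->dw_used, pkt, dw_count * sizeof(uint32_t));
   cs->dw_used += dw_count;
}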

Unfortunately, it seems like we ended up going in fundamentally opposite
directions in our designs, so I'd say that I'm much more scared about this
approach than I am excited about it.

My primary concerns about this architecture can be summarized into two
categories:

• The obligation to manage pipeline and dynamic state in the common
  representation — essentially mostly the same Vulkan function call
  arguments, but with an additional layer for processing pNext and merging
  pipeline and dynamic state — restricts the abilities of drivers to
  optimize state management for specific hardware. Most importantly, it
  hampers precompiling of state in pipeline objects.
  In state management, this would make Mesa Vulkan implementations closer
  not even to Gallium, but to the dreaded OpenGL.

• Certain parts of the common code are designed around assumptions about
  the majority of the hardware, however some devices may have large
  architectural differences in specific areas, and trying to adapt the way
  of programming such hardware subsystems results in having to write
  suboptimal algorithms, as well as sometimes artificially restricting the
  VkPhysicalDeviceLimits the device can report.
  An example from my driver is the meaning of a pipeline layout on
  fixed-slot TeraScale. Because it uses flat binding indices throughout all
  sets (sets don't exist in the hardware at all), it needs base offsets for
  each set within the stage's bindings — which are precomputed at pipeline
  layout creation. This is fundamentally incompatible with MR !27024's
  direction to remove the concept of a pipeline layout — and if the common
  vkCmdBindDescriptorSets makes the VK_KHR_maintenance6 layout-object-less
  path the only available one, it would add a lot of overhead by making it
  necessary to recompute the offsets at every bind.
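
To make that cost concrete, here is a rough sketch (hypothetical
structures, not Terakan's actual ones) of why the layout object matters
on fixed-slot hardware: with a pipeline layout, the per-set base slots
are a one-time prefix sum at vkCreatePipelineLayout, while a
layout-object-less bind path has to redo that sum from the bound set
layouts on every vkCmdBindDescriptorSets.

#include <stdint.h>

/* Hypothetical sketch of flat-slot binding offsets on fixed-slot hardware;
 * not Terakan's actual data structures. */

#define MAX_SETS 8

struct flat_set_layout {
   uint32_t slot_count;   /* flat binding slots this set occupies per stage */
};

struct flat_pipeline_layout {
   uint32_t set_base_slot[MAX_SETS];   /* precomputed once at layout creation */
};

/* With a pipeline layout object, this prefix sum runs once, at
 * vkCreatePipelineLayout time... */
static void
flat_pipeline_layout_init(struct flat_pipeline_layout *layout,
                          const struct flat_set_layout *const *sets,
                          uint32_t set_count)
{
   uint32_t slot = 0;
   for (uint32_t s = 0; s < set_count; s++) {
      layout->set_base_slot[s] = slot;
      slot += sets[s]->slot_count;
   }
}

/* ...whereas a layout-object-less bind path has to redo the same sum on
 * every vkCmdBindDescriptorSets call. */
static uint32_t
flat_base_slot_without_layout(const struct flat_set_layout *const *bound_sets,
                              uint32_t first_set)
{
   uint32_t slot = 0;
   for (uint32_t s = 0; s < first_set; s++)
      slot += bound_sets[s]->slot_count;
   return slot;
}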


I think what we need to consider about pipeline state (in the broader
sense, including both state objects and dynamic state) is that it
inherently has very different properties from anything the common runtime
already covers. What most of the current objects in the common runtime have
in common is that they:

• Are largely hardware-independent and can work everywhere the same way.
• Either:
  • Provide a complex solution to a large-scale problem, essentially being
    sort of advanced "middleware". Examples are WSI, synchronization,
    pipeline cache, secondary command buffer emulation, render pass
    emulation.
  • Or, solve a trivial task in a way that's non-intrusive towards
    algorithms employed by the drivers — such as managing object handles,
    invoking allocators, reference-counting descriptor set and pipeline
    layouts, pooling VkCommandBuffer instances.
• Rarely influence the design of "hot path" functions, such as changes to
  pipeline state and bindings.

On the other hand, pipeline state:

1. Is entirely hardware-specific.
2. Is modified very frequently — making up the majority of command buffer
   recording time.
3. Can be precompiled in pipeline objects — and that's highly desirable due
   to the previous point.

Because of 1, there's almost nothing in the pipeline state that the common
runtime can help share between drivers. Yes, it can potentially be used to
automate running some NIR passes for baking static state into shaders, but
currently it looks like the runtime is going in a somewhat different
direction, and that needs only some helper functions invoked at pipeline
creation time. Aside from that, I can't see it being able to be useful for
anything other than merging static and dynamic state into a single
structure. For drivers where developers would prefer this approach for
various reasons (prototyping simplicity, or staying at the
near-original-Vulkan level of 

Re: Future direction of the Mesa Vulkan runtime (or "should we build a new gallium?")

2024-01-19 Thread X512

On 2024/01/20 2:01, Faith Ekstrand wrote:

> We already have several places where we're doing significant layering.
> It started with sharing the WSI code


I wish it were possible to compile the WSI implementation as a separate
Vulkan layer *.so module instead of hardcoding and duplicating it in
each driver. That would make Vulkan drivers more platform-independent
and provide proper separation.


Re: Future direction of the Mesa Vulkan runtime (or "should we build a new gallium?")

2024-01-19 Thread Mike Blumenkrantz
On one hand, I think it's a great idea. Moving code out of drivers into
common code means fixing bugs helps everyone, and the same goes for
implementing new features.

On the other hand, everyone's already got code that works, which means both
a lot of work to switch that code over to common and then the usual cycle
of fixing regressions.

Gallium is generally pretty great now, so my gut says a 'Zirkonium' common
layer is eventually gonna be pretty good too, assuming it can provide the
same sorts of efficiency gains; Vulkan is a lot thinner than GL, which
means CPU utilization becomes more noticeable very easily.

I'm not saying I'll dive in head first tomorrow, but generally speaking I
think 10 years from now it'll be a nice thing to have.


Mike

On Fri, Jan 19, 2024, 4:02 PM Faith Ekstrand  wrote:

> Yeah, this one's gonna hit Phoronix...
>
> When we started writing Vulkan drivers back in the day, there was this
> notion that Vulkan was a low-level API that directly targets hardware.
> Vulkan drivers were these super thin things that just blasted packets
> straight into the hardware. What little code was common was small and
> pretty easy to just copy+paste around. It was a nice thought...
>
> What's happened in the intervening 8 years is that Vulkan has grown. A lot.
>
> We already have several places where we're doing significant layering.
> It started with sharing the WSI code and some Python for generating
> dispatch tables. Later we added common synchronization code and a few
> vkFoo2 wrappers. Then render passes and...
>
> https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27024
>
> That's been my project the last couple weeks: A common VkPipeline
> implementation built on top of an ESO-like interface. The big
> deviation this MR makes from prior art is that I make no attempt at
> pretending it's a layered implementation. The vtable for shader
> objects looks like ESO but takes its own path when it's useful to do
> so. For instance, shader creation always consumes NIR and a handful of
> lowering passes are run for you. It's no st_glsl_to_nir but it is a
> bit opinionated. Also, a few of the bits that are missing from ESO
> such as robustness have been added to the interface.
>
> In my mind, this marks a pretty fundamental shift in how the Vulkan
> runtime works. Previously, everything was
> designed to be a toolbox where you can kind of pick and choose what
> you want to use. Also, everything at least tried to act like a layer
> where you still implemented Vulkan but you could leave out bits like
> render passes if you implemented the new thing and were okay with the
> layer. With the ESO code, you implement something that isn't Vulkan
> entrypoints and the actual entrypoints live in the runtime. This lets
> us expand and adjust the interface as needed for our purposes as well
> as sanitize certain things even in the modern API.
>
> The result is that NVK is starting to feel like a gallium driver. 
>
> So here's the question: do we like this? Do we want to push in this
> direction? Should we start making more things work more this way? I'm
> not looking for MRs just yet nor do I have more reworks directly
> planned. I'm more looking for thoughts and opinions as to how the
> various Vulkan driver teams feel about this. We'll leave the detailed
> planning for the Mesa issue tracker.
>
> It's worth noting that, even though I said we've tried to keep things
> layerish, there are other parts of the runtime that look like this.
> The synchronization code is a good example. The vk_sync interface is
> pretty significantly different from the Vulkan objects it's used to
> implement. That's worked out pretty well, IMO. With something as
> complicated as pipelines or synchronization, trying to keep the
> illusion of a layer just isn't practical.
>
> So, do we like this? Should we be pushing more towards drivers being a
> backend of the runtime instead of a user of it?
>
> Now, before anyone asks, no, I don't really want to build a multi-API
> abstraction with a Vulkan state tracker. If we were doing this 5 years
> ago and Zink didn't already exist, one might be able to make an
> argument for pushing in that direction. However, that would add a huge
> amount of weight to the project and make it even harder to develop the
> runtime than it already is, all for little benefit at this point.
>
> Here's a few other constraints on what I'm thinking:
>
> 1. I want it to still be possible for drivers to implement an
> extension without piles of runtime plumbing or even bypass the runtime
> on occasion as needed.
>
> 2. I don't want to recreate the gallium cap disaster: drivers should
> know exactly what they're advertising. We may want to have some
> internal features or properties that are used by the runtime to make
> decisions but they'll be in addition to the features and properties in
> Vulkan.
>
> 3. We've got some meta stuff already but we probably want more.
> However, I don't want to force meta on folks who don't want it.

Future direction of the Mesa Vulkan runtime (or "should we build a new gallium?")

2024-01-19 Thread Faith Ekstrand
Yeah, this one's gonna hit Phoronix...

When we started writing Vulkan drivers back in the day, there was this
notion that Vulkan was a low-level API that directly targets hardware.
Vulkan drivers were these super thin things that just blasted packets
straight into the hardware. What little code was common was small and
pretty easy to just copy+paste around. It was a nice thought...

What's happened in the intervening 8 years is that Vulkan has grown. A lot.

We already have several places where we're doing significant layering.
It started with sharing the WSI code and some Python for generating
dispatch tables. Later we added common synchronization code and a few
vkFoo2 wrappers. Then render passes and...

https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27024

That's been my project the last couple weeks: A common VkPipeline
implementation built on top of an ESO-like interface. The big
deviation this MR makes from prior art is that I make no attempt at
pretending it's a layered implementation. The vtable for shader
objects looks like ESO but takes its own path when it's useful to do
so. For instance, shader creation always consumes NIR and a handful of
lowering passes are run for you. It's no st_glsl_to_nir but it is a
bit opinionated. Also, a few of the bits that are missing from ESO
such as robustness have been added to the interface.
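
For anyone who hasn't opened the MR yet, the rough shape of it is a
per-driver ops table whose compile hook always takes NIR. The snippet
below is deliberately simplified, with hypothetical names and signatures;
the real interface lives in !27024:

#include <vulkan/vulkan.h>

/* Deliberately simplified sketch; see MR !27024 for the real interface. */

struct nir_shader;
struct rt_device;
struct rt_shader;

struct rt_shader_ops {
   /* Always takes NIR: SPIR-V parsing and a handful of common lowering
    * passes have already been run by the runtime before this is called. */
   VkResult (*compile)(struct rt_device *device,
                       VkShaderStageFlagBits stage,
                       struct nir_shader *nir,            /* consumed */
                       const VkAllocationCallbacks *alloc,
                       struct rt_shader **shader_out);
   void (*destroy)(struct rt_device *device, struct rt_shader *shader,
                   const VkAllocationCallbacks *alloc);
   /* Bits that ESO is missing, such as robustness state, get passed
    * through this interface as well. */
};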

In my mind, this marks a pretty fundamental shift in how the Vulkan
runtime works. Previously, everything was
designed to be a toolbox where you can kind of pick and choose what
you want to use. Also, everything at least tried to act like a layer
where you still implemented Vulkan but you could leave out bits like
render passes if you implemented the new thing and were okay with the
layer. With the ESO code, you implement something that isn't Vulkan
entrypoints and the actual entrypoints live in the runtime. This lets
us expand and adjust the interface as needed for our purposes as well
as sanitize certain things even in the modern API.

The result is that NVK is starting to feel like a gallium driver. 

So here's the question: do we like this? Do we want to push in this
direction? Should we start making more things work more this way? I'm
not looking for MRs just yet nor do I have more reworks directly
planned. I'm more looking for thoughts and opinions as to how the
various Vulkan driver teams feel about this. We'll leave the detailed
planning for the Mesa issue tracker.

It's worth noting that, even though I said we've tried to keep things
layerish, there are other parts of the runtime that look like this.
The synchronization code is a good example. The vk_sync interface is
pretty significantly different from the Vulkan objects it's used to
implement. That's worked out pretty well, IMO. With something as
complicated as pipelines or synchronization, trying to keep the
illusion of a layer just isn't practical.

So, do we like this? Should we be pushing more towards drivers being a
backend of the runtime instead of a user of it?

Now, before anyone asks, no, I don't really want to build a multi-API
abstraction with a Vulkan state tracker. If we were doing this 5 years
ago and Zink didn't already exist, one might be able to make an
argument for pushing in that direction. However, that would add a huge
amount of weight to the project and make it even harder to develop the
runtime than it already is, all for little benefit at this point.

Here's a few other constraints on what I'm thinking:

1. I want it to still be possible for drivers to implement an
extension without piles of runtime plumbing or even bypass the runtime
on occasion as needed.

2. I don't want to recreate the gallium cap disaster: drivers should
know exactly what they're advertising. We may want to have some
internal features or properties that are used by the runtime to make
decisions but they'll be in addition to the features and properties in
Vulkan.

3. We've got some meta stuff already but we probably want more.
However, I don't want to force meta on folks who don't want it.

The big thing here is that if we do this, I'm going to need help. I'm
happy to do a lot of the architectural work but drivers are going to
have to keep up with the changes and I can't take on the burden of
moving 8 different drivers forward. I can answer questions and maybe
help out a bit but the refactoring is going to be too much for one
person, even if that person is me.

Thoughts?

~Faith