Re: [RFC] Implicit vs explicit user fence sync

2021-05-12 Thread Daniel Vetter
On Wed, May 12, 2021 at 10:23:04AM +0200, Christian König wrote:
> Am 12.05.21 um 10:13 schrieb Daniel Vetter:
> > On Tue, May 11, 2021 at 09:34:11PM +0200, Christian König wrote:
> > > Am 11.05.21 um 18:48 schrieb Daniel Vetter:
> > > > [SNIP]
> > > > > Why?
> > > > If you allow implicit fencing then you can end up with
> > > > - an implicit userspace fence as the in-fence
> > > > - but an explicit dma_fence as the out fence
> > > > 
> > > > Which is not allowed. So there's really no way to make this work, except
> > > > if you stall in the ioctl, which also doesn't work.
> > > Ok, wait a second. I really don't understand what's going on here.
> > > 
> > > The out fence is just to let the userspace know when the frame is displayed.
> > > Or rather when the old frame is no longer displayed so that it can be
> > > reused, right?
> > > 
> > > Then why does that need to be a dma_fence? We don't use that for memory
> > > management anywhere, do we?
> > It can be a sync_file, so you can queue up the next rendering targeting
> > the old backbuffer right away. On memory-constrained devices where
> > triple-buffering is prohibitive this is apparently a pretty cool trick or
> > something like that.
> 
> Yeah, I'm aware of that.
> 
> But we don't really need that for devices we want to interop with userspace
> queues, do we?
> 
> > > > So you have to do an uapi change here. At that point we might as well do
> > > > it right.
> > > I mean in the worst case we might need to allow user fences with sync_files
> > > as well when that is really used outside of Android.
> > > 
> > > But I still don't see the fundamental problem here.
> > Making sync_file support this implicitly is more pain; it becomes sprawling pretty
> > quickly.
> 
> Agreed, but I don't see supporting sync_file and out fences as something
> necessary.
> 
> As far as I can see we don't need to carry that burden for the use cases
> we have for implicit sync.
> 
> And even if we have to, it is possible to implement.
> 
> > Anyway I think we're just turning in circles. My take is that your patch
> > series here demonstrates nothing beyond "adding function calls that do
> > nothing is easy", the real deal is in making it work. And I think it's
> > easier to make this all work with explicit userspace fencing first and
> > then worry about how we maybe make this work implicitly. Instead of what
> > you propose, which is to rig up a lot of scaffolding without having any idea
> > whether it's even in the right place. That seems very backwards to me.
> 
> And that's what I disagree on.
> 
> Supporting implicit sync in the kernel for the functionality we need for
> amdgpu is relatively easy.
> 
> Changing all of userspace to not rely on implicit sync any more is really hard
> compared to that.
> 
> Dropping implicit sync in userspace has a lot of advantages and should be
> pushed for, but not because it is a prerequisite of user fences.

Yeah but if we support userspace fences with implicit sync only, because
that's right now the only thing amdgpu needs, then we make really, really
sure the transition to explicit sync will never happen.

Also if the goal really is to only do the minimal thing to keep amdgpu
working on existing compositors, then imo the right solution is to just do
the kernel-augmented userspace submit I've proposed. Not this.

The kernel-augmented userspace submit keeps all dma_fence stuff working
_exactly_ like it currently does, including drm/scheduler sorting
dependencies and everything. So from a "will it work" point of view it's a
much more solid solution.

And it doesn't require us to create an extremely warty uapi where if you
happen to use implicit userspace fences a lot of things kinda stop working,
for no reason.

So if your overall goal is to just not touch any compositor/winsys
protocol at all, there _is_ a very clean solution imo.
-Daniel

> Regards,
> Christian.
> 
> > 
> > Plus I really don't like retro-fitting userspace fences into implicit sync
> > and sync_file and everything. But that's not the main issue I have here.
> > -Daniel
> > 
> > > Regards,
> > > Christian.
> > > 
> > > > Of course if you only care about some specific compositors (or maybe only
> > > > the -amdgpu Xorg driver even) then this isn't a concern, but atomic is
> > > > cross-driver so we can't do that. Or at least I don't see a way how to do
> > > > this without causing endless amounts of fun down the road.
> > > > 
> > > > > > So I have a plan here, what was yours?
> > > > > As far as I see that should still work perfectly fine and I have the strong
> > > > > feeling I'm missing something here.
> > > > > 
> > > > > > > Transporting fences between processes is not the fundamental problem here,
> > > > > > > but rather the question how we represent all this in the kernel?
> > > > > > > 
> > > > > > > In other words I think what you outlined above is just approaching it from
> > > > > > > the wrong 

Re: [RFC] Implicit vs explicit user fence sync

2021-05-12 Thread Christian König

Am 12.05.21 um 10:13 schrieb Daniel Vetter:

On Tue, May 11, 2021 at 09:34:11PM +0200, Christian König wrote:

Am 11.05.21 um 18:48 schrieb Daniel Vetter:

[SNIP]

Why?

If you allow implicit fencing then you can end up with
- an implicit userspace fence as the in-fence
- but an explicit dma_fence as the out fence

Which is not allowed. So there's really no way to make this work, except
if you stall in the ioctl, which also doesn't work.

Ok, wait a second. I really don't understand what's going on here.

The out fence is just to let the userspace know when the frame is displayed.
Or rather when the old frame is no longer displayed so that it can be
reused, right?

Then why does that need to be a dma_fence? We don't use that for memory
management anywhere, do we?

It can be a sync_file, so you can queue up the next rendering targeting
the old backbuffer right away. On memory-constrained devices where
triple-buffering is prohibitive this is apparently a pretty cool trick or
something like that.


Yeah, I'm aware of that.

But we don't really need that for devices we want to interop with
userspace queues, do we?



So you have to do an uapi change here. At that point we might as well do
it right.

I mean in the worst case we might need to allow user fences with sync_files
as well when that is really used outside of Android.

But I still don't see the fundamental problem here.

Making sync_file support this implicitly is more pain; it becomes sprawling pretty
quickly.


Agreed, but I don't see supporting sync_file and out fences as something
necessary.


As far as I can see we don't need to carry that burden for the use 
cases we have for implicit sync.


And even if we have to, it is possible to implement.


Anyway I think we're just turning in circles. My take is that your patch
series here demonstrates nothing beyond "adding function calls that do
nothing is easy", the real deal is in making it work. And I think it's
easier to make this all work with explicit userspace fencing first and
then worry about how we maybe make this work implicitly. Instead of what
you propose, which is to rig up a lot of scaffolding without having any idea
whether it's even in the right place. That seems very backwards to me.


And that's what I disagree on.

Supporting implicit sync in the kernel for the functionality we need for
amdgpu is relatively easy.


Changing all of userspace to not rely on implicit sync any more is really 
hard compared to that.


Dropping implicit sync in userspace has a lot of advantages and should be 
pushed for, but not because it is a prerequisite of user fences.


Regards,
Christian.



Plus I really don't like retro-fitting userspace fences into implicit sync
and sync_file and everything. But that's not the main issue I have here.
-Daniel


Regards,
Christian.


Of course if you only care about some specific compositors (or maybe only
the -amdgpu Xorg driver even) then this isn't a concern, but atomic is
cross-driver so we can't do that. Or at least I don't see a way how to do
this without causing endless amounts of fun down the road.


So I have a plan here, what was yours?

As far as I see that should still work perfectly fine and I have the strong
feeling I'm missing something here.


Transporting fences between processes is not the fundamental problem here,
but rather the question how we represent all this in the kernel?

In other words I think what you outlined above is just approaching it from
the wrong side again. Instead of looking what the kernel needs to support
this you take a look at userspace and the requirements there.

Uh ... that was my idea here? That's why I put "build userspace fences in
userspace only" as the very first thing. Then extend to winsys and
atomic/display and all these cases where things get more tricky.

I agree that transporting the fences is easy, which is why it's not
interesting trying to solve that problem first. Which is kinda what you're
trying to do here by adding implicit userspace fences (well not even that,
just a bunch of function calls without any semantics attached to them).

So if there's more here, you need to flesh it out more or I just don't get
what you're actually trying to demonstrate.

Well I'm trying to figure out why you see it as such a problem to keep
implicit sync around.

As far as I can tell it is completely orthogonal if we use implicit/explicit
and dma_fence/user_fence.

It's just a different implementation inside the kernel.

See above. It falls apart with the atomic ioctl.
-Daniel




Re: [RFC] Implicit vs explicit user fence sync

2021-05-12 Thread Daniel Vetter
On Tue, May 11, 2021 at 09:34:11PM +0200, Christian König wrote:
> Am 11.05.21 um 18:48 schrieb Daniel Vetter:
> > [SNIP]
> > > Why?
> > If you allow implicit fencing then you can end up with
> > - an implicit userspace fence as the in-fence
> > - but an explicit dma_fence as the out fence
> > 
> > Which is not allowed. So there's really no way to make this work, except
> > if you stall in the ioctl, which also doesn't work.
> 
> Ok, wait a second. I really don't understand what's going on here.
> 
> The out fence is just to let the userspace know when the frame is displayed.
> Or rather when the old frame is no longer displayed so that it can be
> reused, right?
> 
> Then why does that need to be a dma_fence? We don't use that for memory
> management anywhere, do we?

It can be a sync_file, so you can queue up the next rendering targeting
the old backbuffer right away. On memory-constrained devices where
triple-buffering is prohibitive this is apparently a pretty cool trick or
something like that.
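
For reference, this is roughly what that trick looks like from the compositor side
with today's uapi (a sketch only: the object and property ids are assumed to have
been looked up via drmModeObjectGetProperties() beforehand, a real commit needs
more plane properties, and error handling is omitted):

    #include <poll.h>
    #include <stdint.h>
    #include <unistd.h>
    #include <xf86drm.h>
    #include <xf86drmMode.h>

    /* Flip to new_fb_id and find out when the previous frame is actually off
     * the screen, so its buffer can be reused for the next render. */
    static void flip_and_wait_for_old_buffer(int drm_fd, uint32_t crtc_id,
                                             uint32_t plane_id,
                                             uint32_t prop_fb_id,
                                             uint32_t prop_out_fence_ptr,
                                             uint32_t new_fb_id)
    {
        int out_fence_fd = -1;
        drmModeAtomicReq *req = drmModeAtomicAlloc();

        drmModeAtomicAddProperty(req, plane_id, prop_fb_id, new_fb_id);
        /* OUT_FENCE_PTR asks the kernel to hand back a sync_file fd for this CRTC. */
        drmModeAtomicAddProperty(req, crtc_id, prop_out_fence_ptr,
                                 (uint64_t)(uintptr_t)&out_fence_fd);
        drmModeAtomicCommit(drm_fd, req, DRM_MODE_ATOMIC_NONBLOCK, NULL);
        drmModeAtomicFree(req);

        /* The sync_file signals once the old frame is no longer displayed; only
         * then is it safe to render the next frame into the old backbuffer. */
        struct pollfd pfd = { .fd = out_fence_fd, .events = POLLIN };
        poll(&pfd, 1, -1);
        close(out_fence_fd);
    }

Instead of polling on the CPU, the fd can equally be passed as the in-fence of the
next rendering submission, which is what makes double-buffering workable.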

> > So you have to do an uapi change here. At that point we might as well do
> > it right.
> 
> I mean in the worst case we might need to allow user fences with sync_files
> as well when that is really used outside of Android.
> 
> But I still don't see the fundamental problem here.

Making sync_file support this implicitly is more pain; it becomes sprawling pretty
quickly.

Anyway I think we're just turning in circles. My take is that your patch
series here demonstrates nothing beyond "adding function calls that do
nothing is easy", the real deal is in making it work. And I think it's
easier to make this all work with explicit userspace fencing first and
then worry about how we maybe make this work implicitly. Instead of what
you propose, which is to rig up a lot of scaffolding without having any idea
whether it's even in the right place. That seems very backwards to me.

Plus I really don't like retro-fitting userspace fences into implicit sync
and sync_file and everything. But that's not the main issue I have here.
-Daniel

> 
> Regards,
> Christian.
> 
> > Of course if you only care about some specific compositors (or maybe only
> > the -amdgpu Xorg driver even) then this isn't a concern, but atomic is
> > cross-driver so we can't do that. Or at least I don't see a way how to do
> > this without causing endless amounts of fun down the road.
> > 
> > > > So I have a plan here, what was yours?
> > > As far as I see that should still work perfectly fine and I have the strong
> > > feeling I'm missing something here.
> > > 
> > > > > Transporting fences between processes is not the fundamental problem here,
> > > > > but rather the question how we represent all this in the kernel?
> > > > > 
> > > > > In other words I think what you outlined above is just approaching it from
> > > > > the wrong side again. Instead of looking what the kernel needs to support
> > > > > this you take a look at userspace and the requirements there.
> > > > Uh ... that was my idea here? That's why I put "build userspace fences in
> > > > userspace only" as the very first thing. Then extend to winsys and
> > > > atomic/display and all these cases where things get more tricky.
> > > > 
> > > > I agree that transporting the fences is easy, which is why it's not
> > > > interesting trying to solve that problem first. Which is kinda what you're
> > > > trying to do here by adding implicit userspace fences (well not even that,
> > > > just a bunch of function calls without any semantics attached to them).
> > > > 
> > > > So if there's more here, you need to flesh it out more or I just don't get
> > > > what you're actually trying to demonstrate.
> > > Well I'm trying to figure out why you see it as such a problem to keep
> > > implicit sync around.
> > > 
> > > As far as I can tell it is completely orthogonal if we use implicit/explicit
> > > and dma_fence/user_fence.
> > > 
> > > It's just a different implementation inside the kernel.
> > See above. It falls apart with the atomic ioctl.
> > -Daniel

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [RFC] Implicit vs explicit user fence sync

2021-05-11 Thread Christian König

Am 11.05.21 um 18:48 schrieb Daniel Vetter:

[SNIP]

Why?

If you allow implicit fencing then you can end up with
- an implicit userspace fence as the in-fence
- but an explicit dma_fence as the out fence

Which is not allowed. So there's really no way to make this work, except
if you stall in the ioctl, which also doesn't work.


Ok, wait a second. I really don't understand what's going on here.

The out fence is just to let the userspace know when the frame is 
displayed. Or rather when the old frame is no longer displayed so that 
it can be reused, right?


Then why does that need to be a dma_fence? We don't use that for memory 
management anywhere, do we?



So you have to do an uapi change here. At that point we might as well do
it right.


I mean in the worst case we might need to allow user fences with 
sync_files as well when that is really used outside of Android.


But I still don't see the fundamental problem here.

Regards,
Christian.


Of course if you only care about some specific compositors (or maybe only
the -amdgpu Xorg driver even) then this isn't a concern, but atomic is
cross-driver so we can't do that. Or at least I don't see a way how to do
this without causing endless amounts of fun down the road.


So I have a plan here, what was yours?

As far as I see that should still work perfectly fine and I have the strong
feeling I'm missing something here.


Transporting fences between processes is not the fundamental problem here,
but rather the question how we represent all this in the kernel?

In other words I think what you outlined above is just approaching it from
the wrong side again. Instead of looking what the kernel needs to support
this you take a look at userspace and the requirements there.

Uh ... that was my idea here? That's why I put "build userspace fences in
userspace only" as the very first thing. Then extend to winsys and
atomic/display and all these cases where things get more tricky.

I agree that transporting the fences is easy, which is why it's not
interesting trying to solve that problem first. Which is kinda what you're
trying to do here by adding implicit userspace fences (well not even that,
just a bunch of function calls without any semantics attached to them).

So if there's more here, you need to flesh it out more or I just don't get
what you're actually trying to demonstrate.

Well I'm trying to figure out why you see it as such a problem to keep
implicit sync around.

As far as I can tell it is completely orthogonal if we use implicit/explicit
and dma_fence/user_fence.

It's just a different implementation inside the kernel.

See above. It falls apart with the atomic ioctl.
-Daniel


Re: [RFC] Implicit vs explicit user fence sync

2021-05-11 Thread Daniel Vetter
On Tue, May 11, 2021 at 05:32:29PM +0200, Christian König wrote:
> Am 11.05.21 um 16:23 schrieb Daniel Vetter:
> > On Tue, May 11, 2021 at 09:47:56AM +0200, Christian König wrote:
> > > Am 11.05.21 um 09:31 schrieb Daniel Vetter:
> > > > [SNIP]
> > > > > > And that's just the one ioctl I know is big trouble, I'm sure we'll find
> > > > > > more funny corner cases when we roll out explicit user fencing.
> > > > > I think we can just ignore sync_file. As far as it concerns me that UAPI is
> > > > > pretty much dead.
> > > > Uh that's rather bold. Android is built on it. Currently atomic kms is
> > > > built on it.
> > > To be honest I don't think we care about Android at all.
> > we = amd or we = upstream here?
> 
> we = amd, for everybody else that is certainly a different topic.
> 
> But for now AMD is the only one running into this problem.
> 
> Could be that Nouveau sees this as well with the next hw generation, but who
> knows?
> 
> > > > Why is this not much of a problem if it's just within one driver?
> > > Because inside the same driver I can easily add the waits before submitting
> > > the MM work as necessary.
> > What is MM work here now?
> 
> MM=multimedia, e.g. UVD, VCE, VCN engines on AMD hardware.
> 
> > > > > > > Adding implicit synchronization on top of that is then rather trivial.
> > > > > > Well that's what I disagree with, since I already see some problems that I
> > > > > > don't think we can overcome (the atomic ioctl is one). And that's with us
> > > > > > only having a fairly theoretical understanding of the overall situation.
> > > > > But how should we then ever support user fences with the atomic IOCTL?
> > > > > 
> > > > > We can't wait in user space since that will disable the support for waiting
> > > > > in the hardware.
> > > > Well, figure it out :-)
> > > > 
> > > > This is exactly why I'm not seeing anything solved with just rolling a
> > > > function call to a bunch of places, because it's pretending all things are
> > > > solved when clearly that's not the case.
> > > > 
> > > > I really think what we need is to first figure out how to support
> > > > userspace fences as explicit entities across the stack, maybe with
> > > > something like this order:
> > > > 1. enable them purely within a single userspace driver (like vk with
> > > > winsys disabled, or something else like that except not amd because
> > > > there's this amdkfd split for "real" compute)
> > > > 1a. including atomic ioctl, e.g. for vk direct display support this can be
> > > > used without cross-process sharing, new winsys protocols and all that fun
> > > > 2. figure out how to transport these userspace fences with something like
> > > > drm_syncobj
> > > > 2a. figure out the compat story for drivers which don't do userspace fences
> > > > 2b. figure out how to absorb the overhead if the winsys/compositor doesn't
> > > > support explicit sync
> > > > 3. maybe figure out how to make this all happen magically with implicit
> > > > sync, if we really, really care
> > > > 
> > > > If we do 3 before we've nailed all these problems, we're just guaranteeing
> > > > we'll get the wrong solutions and so we'll then have 3 ways of doing
> > > > userspace fences
> > > > - the butchered implicit one that didn't quite work
> > > > - the explicit one
> > > > - the not-so-butchered implicit one with the lessons from the properly
> > > > done explicit one
> > > > 
> > > > The thing is, if you have no idea how to integrate userspace fences
> > > > explicitly into atomic ioctl, then you definitely have no idea how to do
> > > > it implicitly :-)
> > > Well I agree on that. But the question is still how would you do explicit
> > > with atomic?
> > If you supply a userspace fence (is that what we call them now) as
> > in-fence, then you're only allowed to get a userspace fence as out-fence.
> 
> Yeah, that part makes perfectly sense. But I don't see the problem with
> that?
> 
> > That way we
> > - don't block anywhere we shouldn't
> > - don't create a dma_fence out of a userspace fence
> > 
> > The problem is this completely breaks your "magically make implicit
> > fencing with userspace fences" plan.
> 
> Why?

If you allow implicit fencing then you can end up with
- an implicit userspace fence as the in-fence
- but an explicit dma_fence as the out fence

Which is not allowed. So there's really no way to make this work, except
if you stall in the ioctl, which also doesn't work.

So you have to do an uapi change here. At that point we might as well do
it right.

Of course if you only care about some specific compositors (or maybe only
the -amdgpu Xorg driver even) then this isn't a concern, but atomic is
cross-driver so we can't do that. Or at least I don't see a way how to do
this without causing endless amounts of fun down the road.
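
To spell the constraint out as a toy model (this is not real uapi, only an
illustration of the invariant the atomic ioctl would have to enforce): a dma_fence
out-fence may only be handed out when every in-fence is itself a dma_fence, because
a dma_fence has to signal in bounded time and therefore must not depend on an
unsignaled userspace fence.

    #include <stdbool.h>
    #include <stddef.h>

    enum fence_kind { FENCE_DMA, FENCE_USER };

    struct in_fence { enum fence_kind kind; };

    /* The only out-fence kind a commit may produce: one user fence among the
     * in-fences forces the out-fence to be a user fence as well. */
    static enum fence_kind allowed_out_fence(const struct in_fence *in, size_t n)
    {
        for (size_t i = 0; i < n; i++)
            if (in[i].kind == FENCE_USER)
                return FENCE_USER;
        return FENCE_DMA;
    }

    /* A commit asking for a sync_file (dma_fence) out-fence while any in-fence
     * is a user fence would have to be rejected. */
    static bool commit_fences_valid(const struct in_fence *in, size_t n,
                                    enum fence_kind requested_out)
    {
        return requested_out == FENCE_USER ||
               allowed_out_fence(in, n) == FENCE_DMA;
    }

With implicit fencing userspace has no way to know which kind of in-fence it just
mixed in, which is where the existing sync_file out-fence contract breaks down.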

> > So I have a plan here, 

Re: [RFC] Implicit vs explicit user fence sync

2021-05-11 Thread Christian König

Am 11.05.21 um 16:23 schrieb Daniel Vetter:

On Tue, May 11, 2021 at 09:47:56AM +0200, Christian König wrote:

Am 11.05.21 um 09:31 schrieb Daniel Vetter:

[SNIP]

And that's just the one ioctl I know is big trouble, I'm sure we'll find
more funny corner cases when we roll out explicit user fencing.

I think we can just ignore sync_file. As far as it concerns me that UAPI is
pretty much dead.

Uh that's rather bold. Android is built on it. Currently atomic kms is
built on it.

To be honest I don't think we care about Android at all.

we = amd or we = upstream here?


we = amd, for everybody else that is certainly a different topic.

But for now AMD is the only one running into this problem.

Could be that Nouveau sees this as well with the next hw generation, but 
who knows?



Why is this not much of a problem if it's just within one driver?

Because inside the same driver I can easily add the waits before submitting
the MM work as necessary.

What is MM work here now?


MM=multimedia, e.g. UVD, VCE, VCN engines on AMD hardware.


Adding implicit synchronization on top of that is then rather trivial.

Well that's what I disagree with, since I already see some problems that I
don't think we can overcome (the atomic ioctl is one). And that's with us
only having a fairly theoretical understanding of the overall situation.

But how should we then ever support user fences with the atomic IOCTL?

We can't wait in user space since that will disable the support for waiting
in the hardware.

Well, figure it out :-)

This is exactly why I'm not seeing anything solved with just rolling a
function call to a bunch of places, because it's pretending all things are
solved when clearly that's not the case.

I really think what we need is to first figure out how to support
userspace fences as explicit entities across the stack, maybe with
something like this order:
1. enable them purely within a single userspace driver (like vk with
winsys disabled, or something else like that except not amd because
there's this amdkfd split for "real" compute)
1a. including atomic ioctl, e.g. for vk direct display support this can be
used without cross-process sharing, new winsys protocols and all that fun
2. figure out how to transport these userspace fences with something like
drm_syncobj
2a. figure out the compat story for drivers which don't do userspace fences
2b. figure out how to absorb the overhead if the winsys/compositor doesn't
support explicit sync
3. maybe figure out how to make this all happen magically with implicit
sync, if we really, really care

If we do 3 before we've nailed all these problems, we're just guaranteeing
we'll get the wrong solutions and so we'll then have 3 ways of doing
userspace fences
- the butchered implicit one that didn't quite work
- the explicit one
- the not-so-butchered implicit one with the lessons from the properly
done explicit one

The thing is, if you have no idea how to integrate userspace fences
explicitly into atomic ioctl, then you definitely have no idea how to do
it implicitly :-)
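
For context, step 2 of the list above is the part that already has uapi today; a
minimal sketch of the transport side with existing libdrm calls (the
producer/consumer split and the fd passing between processes are assumed, and how a
drm_syncobj point would actually be backed by a userspace fence is exactly the open
question):

    #include <stdint.h>
    #include <xf86drm.h>

    /* Producer: create a timeline syncobj and export it as an fd that can be
     * sent to another process (fd passing over a unix socket not shown). */
    static int export_timeline(int drm_fd, uint32_t *handle)
    {
        int obj_fd = -1;

        drmSyncobjCreate(drm_fd, 0, handle);
        drmSyncobjHandleToFD(drm_fd, *handle, &obj_fd);
        return obj_fd;
    }

    /* Consumer: import the syncobj and wait for a specific timeline point.
     * WAIT_FOR_SUBMIT is what makes "future fences" usable: the point may not
     * even have been created yet when the wait starts. */
    static void wait_timeline_point(int drm_fd, int obj_fd, uint64_t point)
    {
        uint32_t handle = 0, first = 0;

        drmSyncobjFDToHandle(drm_fd, obj_fd, &handle);
        drmSyncobjTimelineWait(drm_fd, &handle, &point, 1, INT64_MAX,
                               DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT, &first);
    }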

Well I agree on that. But the question is still how would you do explicit
with atomic?

If you supply a userspace fence (is that what we call them now) as
in-fence, then you're only allowed to get a userspace fence as out-fence.


Yeah, that part makes perfectly sense. But I don't see the problem with 
that?



That way we
- don't block anywhere we shouldn't
- don't create a dma_fence out of a userspace fence

The problem is this completely breaks your "magically make implicit
fencing with userspace fences" plan.


Why?


So I have a plan here, what was yours?


As far as I see that should still work perfectly fine and I have the 
strong feeling I'm missing something here.



Transporting fences between processes is not the fundamental problem here,
but rather the question how we represent all this in the kernel?

In other words I think what you outlined above is just approaching it from
the wrong side again. Instead of looking what the kernel needs to support
this you take a look at userspace and the requirements there.

Uh ... that was my idea here? That's why I put "build userspace fences in
userspace only" as the very first thing. Then extend to winsys and
atomic/display and all these cases where things get more tricky.

I agree that transporting the fences is easy, which is why it's not
interesting trying to solve that problem first. Which is kinda what you're
trying to do here by adding implicit userspace fences (well not even that,
just a bunch of function calls without any semantics attached to them).

So if there's more here, you need to flesh it out more or I just don't get
what you're actually trying to demonstrate.


Well I'm trying to figure out why you see it as such a problem to keep 
implicit sync around.


As far as I can tell it is completely orthogonal if we use 
implicit/explicit and dma_fence/user_fence.


It's just a different implementation inside the kernel.

Christian.


-Daniel



Re: [RFC] Implicit vs explicit user fence sync

2021-05-11 Thread Daniel Vetter
On Tue, May 11, 2021 at 09:47:56AM +0200, Christian König wrote:
> Am 11.05.21 um 09:31 schrieb Daniel Vetter:
> > [SNIP]
> > > > And that's just the one ioctl I know is big trouble, I'm sure we'll find
> > > > more funny corner cases when we roll out explicit user fencing.
> > > I think we can just ignore sync_file. As far as it concerns me that UAPI is
> > > pretty much dead.
> > Uh that's rather bold. Android is built on it. Currently atomic kms is
> > built on it.
> 
> To be honest I don't think we care about Android at all.

we = amd or we = upstream here?

> > > What we should support is drm_syncobj, but that also only as an in-fence
> > > since that's what our hardware supports.
> > Convince Android folks, minimally. Probably a lot more. Yes with hindsight
> > we should have just gone for drm_syncobj instead of the sync_file thing,
> > but hindsight and all that.
> > 
> > This is kinda why I don't think trying to support the existing uapi with
> > userspace fences underneath with some magic tricks is a good idea. It's
> > just a pile of work, plus it's not really architecturally clean.
> > 
> > > > Another one that looks very sketchy right now is buffer sharing between
> > > > different userspace drivers, like compute <-> media (if you have some
> > > > fancy AI pipeline in your media workload, as an example).
> > > Yeah, we are certainly going to get that. But only inside the same driver,
> > > so not much of a problem.
> > Why is this not much of a problem if it's just within one driver?
> 
> Because inside the same driver I can easily add the waits before submitting
> the MM work as necessary.

What is MM work here now?

> > > > > Adding implicit synchronization on top of that is then rather trivial.
> > > > Well that's what I disagree with, since I already see some problems that I
> > > > don't think we can overcome (the atomic ioctl is one). And that's with us
> > > > only having a fairly theoretical understanding of the overall situation.
> > > But how should we then ever support user fences with the atomic IOCTL?
> > > 
> > > We can't wait in user space since that will disable the support for waiting
> > > in the hardware.
> > Well, figure it out :-)
> > 
> > This is exactly why I'm not seeing anything solved with just rolling a
> > function call to a bunch of places, because it's pretending all things are
> > solved when clearly that's not the case.
> > 
> > I really think what we need is to first figure out how to support
> > userspace fences as explicit entities across the stack, maybe with
> > something like this order:
> > 1. enable them purely within a single userspace driver (like vk with
> > winsys disabled, or something else like that except not amd because
> > there's this amdkfd split for "real" compute)
> > 1a. including atomic ioctl, e.g. for vk direct display support this can be
> > used without cross-process sharing, new winsys protocols and all that fun
> > 2. figure out how to transport these userspace fences with something like
> > drm_syncobj
> > 2a. figure out the compat story for drivers which don't do userspace fences
> > 2b. figure out how to absorb the overhead if the winsys/compositor doesn't
> > support explicit sync
> > 3. maybe figure out how to make this all happen magically with implicit
> > sync, if we really, really care
> > 
> > If we do 3 before we've nailed all these problems, we're just guaranteeing
> > we'll get the wrong solutions and so we'll then have 3 ways of doing
> > userspace fences
> > - the butchered implicit one that didn't quite work
> > - the explicit one
> > - the not-so-butchered implicit one with the lessons from the properly
> >    done explicit one
> > 
> > The thing is, if you have no idea how to integrate userspace fences
> > explicitly into atomic ioctl, then you definitely have no idea how to do
> > it implicitly :-)
> 
> Well I agree on that. But the question is still how would you do explicit
> with atomic?

If you supply a userspace fence (is that what we call them now) as
in-fence, then you're only allowed to get a userspace fence as out-fence.
That way we
- don't block anywhere we shouldn't
- don't create a dma_fence out of a userspace fence

The problem is this completely breaks your "magically make implicit
fencing with userspace fences" plan.

So I have a plan here, what was yours?

> Transporting fences between processes is not the fundamental problem here,
> but rather the question how we represent all this in the kernel?
> 
> In other words I think what you outlined above is just approaching it from
> the wrong side again. Instead of looking what the kernel needs to support
> this you take a look at userspace and the requirements there.

Uh ... that was my idea here? That's why I put "build userspace fences in
userspace only" as the very first thing. Then extend to winsys and
atomic/display and all these cases where things get more tricky.

I agree that transporting the fences is easy, which is 

Re: [RFC] Implicit vs explicit user fence sync

2021-05-11 Thread Christian König

Am 11.05.21 um 09:31 schrieb Daniel Vetter:

[SNIP]

And that's just the one ioctl I know is big trouble, I'm sure we'll find
more funny corner cases when we roll out explicit user fencing.

I think we can just ignore sync_file. As far as it concerns me that UAPI is
pretty much dead.

Uh that's rather bold. Android is built on it. Currently atomic kms is
built on it.


To be honest I don't think we care about Android at all.


What we should support is drm_syncobj, but that also only as an in-fence
since that's what our hardware supports.

Convince Android folks, minimally. Probably a lot more. Yes with hindsight
we should have just gone for drm_syncobj instead of the sync_file thing,
but hindsight and all that.

This is kinda why I don't think trying to support the existing uapi with
userspace fences underneath with some magic tricks is a good idea. It's
just a pile of work, plus it's not really architecturally clean.


Another one that looks very sketchy right now is buffer sharing between
different userspace drivers, like compute <-> media (if you have some
fancy AI pipeline in your media workload, as an example).

Yeah, we are certainly going to get that. But only inside the same driver,
so not much of a problem.

Why is this not much of a problem if it's just within one driver?


Because inside the same driver I can easily add the waits before 
submitting the MM work as necessary.



Adding implicit synchronization on top of that is then rather trivial.

Well that's what I disagree with, since I already see some problems that I
don't think we can overcome (the atomic ioctl is one). And that's with us
only having a fairly theoretical understanding of the overall situation.

But how should we then ever support user fences with the atomic IOCTL?

We can't wait in user space since that will disable the support for waiting
in the hardware.

Well, figure it out :-)

This is exactly why I'm not seeing anything solved with just rolling a
function call to a bunch of places, because it's pretending all things are
solved when clearly that's not the case.

I really think what we need is to first figure out how to support
userspace fences as explicit entities across the stack, maybe with
something like this order:
1. enable them purely within a single userspace driver (like vk with
winsys disabled, or something else like that except not amd because
there's this amdkfd split for "real" compute)
1a. including atomic ioctl, e.g. for vk direct display support this can be
used without cross-process sharing, new winsys protocols and all that fun
2. figure out how to transport these userspace fences with something like
drm_syncobj
2a. figure out the compat story for drivers which don't do userspace fences
2b. figure out how to absorb the overhead if the winsys/compositor doesn't
support explicit sync
3. maybe figure out how to make this all happen magically with implicit
sync, if we really, really care

If we do 3 before we've nailed all these problems, we're just guaranteeing
we'll get the wrong solutions and so we'll then have 3 ways of doing
userspace fences
- the butchered implicit one that didn't quite work
- the explicit one
- the not-so-butchered implicit one with the lessons from the properly
   done explicit one

The thing is, if you have no idea how to integrate userspace fences
explicitly into atomic ioctl, then you definitely have no idea how to do
it implicitly :-)


Well I agree on that. But the question is still how would you do 
explicit with atomic?


Transporting fences between processes is not the fundamental problem 
here, but rather the question how we represent all this in the kernel?


In other words I think what you outlined above is just approaching it 
from the wrong side again. Instead of looking what the kernel needs to 
support this you take a look at userspace and the requirements there.


Regards,
Christian.



And "just block" might be good enough for a quick demo, it still breaks
the contract. Same holds for a bunch of the winsys problems we'll have to
deal with here.
-Daniel


Regards,
Christian.


Like here at intel we have internal code for compute, and we're starting
to hit some interesting cases with interop with media already, but that's
it. Nothing even close to desktop/winsys/kms, and that's where I expect
will all the pain be at.

Cheers, Daniel




Re: [RFC] Implicit vs explicit user fence sync

2021-05-11 Thread Daniel Vetter
On Mon, May 10, 2021 at 08:12:31PM +0200, Christian König wrote:
> Am 04.05.21 um 17:11 schrieb Daniel Vetter:
> > On Tue, May 04, 2021 at 04:26:42PM +0200, Christian König wrote:
> > > Hi Daniel,
> > > 
> > > Am 04.05.21 um 16:15 schrieb Daniel Vetter:
> > > > Hi Christian,
> > > > 
> > > > On Tue, May 04, 2021 at 03:27:17PM +0200, Christian König wrote:
> > > > > Hi guys,
> > > > > 
> > > > > with this patch set I want to look into how much more additional work it
> > > > > would be to support implicit sync compared to only explicit sync.
> > > > > 
> > > > > Turned out that this is much simpler than expected since the only
> > > > > addition is that before a command submission or flip the kernel and
> > > > > classic drivers would need to wait for the user fence to signal before
> > > > > taking any locks.
> > > > It's a lot more I think
> > > > - sync_file/drm_syncobj still need to be supported somehow
> > > You need that with explicit fences as well.
> > > 
> > > I'm just concentrating on what extra burden implicit sync would get us.
> > It's not just implicit sync. Currently the best approach we have for
> > explicit sync is hiding them in drm_syncobj. Because for that all the work
> > with intentional stall points and userspace submit thread already exists.
> > 
> > None of this work has been done for sync_file. And looking at how much
> > work it was to get drm_syncobj going, that will be anything but easy.
> 
> I don't think we will want this for sync_file in the first place.
> 
> > > > - we need userspace to handle the stall in a submit thread at least
> > > > - there's nothing here that sets the sync object
> > > > - implicit sync isn't just execbuf, it's everything. E.g. the various
> > > > wait_bo ioctl also need to keep working, including timeout and
> > > > everything
> > > Good point, but that should be relatively easy to add as well.
> > > 
> > > > - we can't stall in atomic kms where you're currently stalling, that's for
> > > > sure. The uapi says "we're not stalling for fences in there", and you're
> > > > breaking that.
> > > Again as far as I can see we run into the same problem with explicit sync.
> > > 
> > > So the question is where could we block for atomic modeset for user fences
> > > in general?
> > Nah, I have an idea. But it only works if userspace is aware, because the
> > rules are essentially:
> > 
> > - when you supply a userspace in-fence, then you only get a userspace
> >    out-fence
> > - mixing in fences between dma-fence and user fence is ok
> > - mixing out fences isn't
> > 
> > And we currently do have sync_file out fence. So it's not possible to
> > support implicit user fence in atomic in a way which doesn't break the
> > uapi somewhere.
> > 
> > Doing the explicit user fence support first will make that very obvious.
> > 
> > And that's just the one ioctl I know is big trouble, I'm sure we'll find
> > more funny corner cases when we roll out explicit user fencing.
> 
> I think we can just ignore sync_file. As far as it concerns me that UAPI is
> pretty much dead.

Uh that's rather bold. Android is built on it. Currently atomic kms is
built on it.

> What we should support is drm_syncobj, but that also only as an in-fence
> since that's what our hardware supports.

Convince Android folks, minimally. Probably a lot more. Yes with hindsight
we should have just gone for drm_syncobj instead of the sync_file thing,
but hindsight and all that.

This is kinda why I don't think trying to support the existing uapi with
userspace fences underneath with some magic tricks is a good idea. It's
just a pile of work, plus it's not really architecturally clean.

> > Another one that looks very sketchy right now is buffer sharing between
> > different userspace drivers, like compute <-> media (if you have some
> > fancy AI pipeline in your media workload, as an example).
> 
> Yeah, we are certainly going to get that. But only inside the same driver,
> so not much of a problem.

Why is this not much of a problem if it's just within one driver?

> > > > - ... at this point I stopped pondering but there's definitely more
> > > > 
> > > > Imo the only way we'll ever get this complete is if we do the following:
> > > > 1. roll out implicit sync with userspace fences on a driver-by-driver basis
> > s/implicit/explicit/
> > 
> > But I think you got that.
> > 
> > > >  1a. including all the winsys/modeset stuff
> > > Completely agree, that's why I've split that up into individual patches.
> > > 
> > > I'm also fine if drivers can just opt out of user fence based
> > > synchronization and we return an error from dma_buf_dynamic_attach() if 
> > > some
> > > driver says it can't handle that.
> > Yeah, but that boils down to us just breaking those use-cases. Which is
> > exactly what you're trying to avoid by rolling out implicit user fence I
> > think.
> 
> But we can add support to all drivers as necessary.
> 
> > 
> > 

Re: [RFC] Implicit vs explicit user fence sync

2021-05-10 Thread Christian König

Am 04.05.21 um 17:11 schrieb Daniel Vetter:

On Tue, May 04, 2021 at 04:26:42PM +0200, Christian König wrote:

Hi Daniel,

Am 04.05.21 um 16:15 schrieb Daniel Vetter:

Hi Christian,

On Tue, May 04, 2021 at 03:27:17PM +0200, Christian König wrote:

Hi guys,

with this patch set I want to look into how much more additional work it
would be to support implicit sync compared to only explicit sync.

Turned out that this is much simpler than expected since the only
addition is that before a command submission or flip the kernel and
classic drivers would need to wait for the user fence to signal before
taking any locks.

It's a lot more I think
- sync_file/drm_syncobj still need to be supported somehow

You need that with explicit fences as well.

I'm just concentrating on what extra burden implicit sync would get us.

It's not just implicit sync. Currently the best approach we have for
explicit sync is hiding them in drm_syncobj. Because for that all the work
with intentional stall points and userspace submit thread already exists.

None of this work has been done for sync_file. And looking at how much
work it was to get drm_syncobj going, that will be anything but easy.


I don't think we will want this for sync_file in the first place.


- we need userspace to handle the stall in a submit thread at least
- there's nothing here that sets the sync object
- implicit sync isn't just execbuf, it's everything. E.g. the various
wait_bo ioctl also need to keep working, including timeout and
everything

Good point, but that should be relatively easy to add as well.


- we can't stall in atomic kms where you're currently stalling, that's for
sure. The uapi says "we're not stalling for fences in there", and you're
breaking that.

Again as far as I can see we run into the same problem with explicit sync.

So the question is where could we block for atomic modeset for user fences
in general?

Nah, I have an idea. But it only works if userspace is aware, because the
rules are essentially:

- when you supply a userspace in-fence, then you only get a userspace
   out-fence
- mixing in fences between dma-fence and user fence is ok
- mixing out fences isn't

And we currently do have sync_file out fence. So it's not possible to
support implicit user fence in atomic in a way which doesn't break the
uapi somewhere.

Doing the explicit user fence support first will make that very obvious.

And that's just the one ioctl I know is big trouble, I'm sure we'll find
more funny corner cases when we roll out explicit user fencing.


I think we can just ignore sync_file. As far as it concerns me that UAPI 
is pretty much dead.


What we should support is drm_syncobj, but that also only as an in-fence 
since that's what our hardware supports.



Anotherone that looks very sketchy right now is buffer sharing between
different userspace drivers, like compute <-> media (if you have some
fancy AI pipeline in your media workload, as an example).


Yeah, we are certainly going to get that. But only inside the same 
driver, so not much of a problem.





- ... at this point I stopped pondering but there's definitely more

Imo the only way we'll ever get this complete is if we do the following:
1. roll out implicit sync with userspace fences on a driver-by-driver basis

s/implicit/explicit/

But I think you got that.


 1a. including all the winsys/modeset stuff

Completely agree, that's why I've split that up into individual patches.

I'm also fine if drivers can just opt out of user fence based
synchronization and we return an error from dma_buf_dynamic_attach() if some
driver says it can't handle that.

Yeah, but that boils down to us just breaking those use-cases. Which is
exactly what you're trying to avoid by rolling out implicit user fence I
think.


But we can add support to all drivers as necessary.




2. roll out support for userspace fences to drm_syncobj timeline for
 interop, both across process/userspace and across drivers
 2a. including all the winsys/modeset stuff, but hopefully that's
 largely solved with 1. already.

Correct, but again we need this for explicit fencing as well.


3. only then try to figure out how to retroshoehorn this into implicit
 sync, and whether that even makes sense.

Because doing 3 before we've done 1&2 for at least 2 drivers (2 because
interop fun across drivers) is just praying that this time around we're
not collectively idiots and can correctly predict the future. That never
worked :-)


For this prototype this patch set doesn't implement any user fence
synchronization at all, but just assumes that faulting user pages is
sufficient to make sure that we can wait for user space to finish
submitting the work. If necessary this can be made even more strict, the
only use case I could find which blocks this is the radeon driver and
that should be handleable.

This of course doesn't give you the same semantic as the classic
implicit sync to 

Re: [RFC] Implicit vs explicit user fence sync

2021-05-04 Thread Daniel Vetter
On Tue, May 04, 2021 at 04:26:42PM +0200, Christian König wrote:
> Hi Daniel,
> 
> Am 04.05.21 um 16:15 schrieb Daniel Vetter:
> > Hi Christian,
> > 
> > On Tue, May 04, 2021 at 03:27:17PM +0200, Christian König wrote:
> > > Hi guys,
> > > 
> > > with this patch set I want to look into how much more additional work it
> > > would be to support implicit sync compared to only explicit sync.
> > > 
> > > Turned out that this is much simpler than expected since the only
> > > addition is that before a command submission or flip the kernel and
> > > classic drivers would need to wait for the user fence to signal before
> > > taking any locks.
> > It's a lot more I think
> > - sync_file/drm_syncobj still need to be supported somehow
> 
> You need that with explicit fences as well.
> 
> I'm just concentrating on what extra burden implicit sync would get us.

It's not just implicit sync. Currently the best approach we have for
explicit sync is hiding them in drm_syncobj. Because for that all the work
with intentional stall points and userspace submit thread already exists.

None of this work has been done for sync_file. And looking at how much
work it was to get drm_syncobj going, that will be anything but easy.
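
The "intentional stall points plus userspace submit thread" pattern referred to
here looks roughly like this (submit_to_hw() stands in for the driver's real
submission ioctl and is purely an assumption of this sketch):

    #include <stdint.h>
    #include <pthread.h>
    #include <xf86drm.h>

    struct job {
        int drm_fd;
        uint32_t in_syncobj;   /* dependency; its fence may not exist yet */
        uint64_t in_point;
    };

    /* Stand-in for the driver's actual command submission ioctl. */
    static void submit_to_hw(struct job *job)
    {
        (void)job;
    }

    /* Submit-thread body: the potentially unbounded wait on a future fence
     * happens here, on a plain userspace thread, and never inside an ioctl
     * where it would break the dma_fence signalling rules.  Only once the
     * dependency has materialized and signaled is the job handed to the
     * kernel, e.g. via pthread_create(&tid, NULL, submit_thread, job). */
    static void *submit_thread(void *arg)
    {
        struct job *job = arg;
        uint32_t first = 0;

        drmSyncobjTimelineWait(job->drm_fd, &job->in_syncobj, &job->in_point,
                               1, INT64_MAX,
                               DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT, &first);
        submit_to_hw(job);
        return NULL;
    }

A real driver would of course batch jobs through a queue rather than spawn one
thread per job; the point is only where the stall is allowed to live.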

> > - we need userspace to handle the stall in a submit thread at least
> > - there's nothing here that sets the sync object
> > - implicit sync isn't just execbuf, it's everything. E.g. the various
> >    wait_bo ioctl also need to keep working, including timeout and
> >    everything
> 
> Good point, but that should be relatively easy to add as well.
> 
> > - we can't stall in atomic kms where you're currently stalling, that's for
> >    sure. The uapi says "we're not stalling for fences in there", and you're
> >    breaking that.
> 
> Again as far as I can see we run into the same problem with explicit sync.
> 
> So the question is where could we block for atomic modeset for user fences
> in general?

Nah, I have an idea. But it only works if userspace is aware, because the
rules are essentially:

- when you supply a userspace in-fence, then you only get a userspace
  out-fence
- mixing in fences between dma-fence and user fence is ok
- mixing out fences isn't

And we currently do have sync_file out fence. So it's not possible to
support implicit user fence in atomic in a way which doesn't break the
uapi somewhere.

Doing the explicit user fence support first will make that very obvious.

And that's just the one ioctl I know is big trouble, I'm sure we'll find
more funny corner cases when we roll out explicit user fencing.

Another one that looks very sketchy right now is buffer sharing between
different userspace drivers, like compute <-> media (if you have some
fancy AI pipeline in your media workload, as an example).

> > - ... at this point I stopped pondering but there's definitely more
> > 
> > Imo the only way we'll ever get this complete is if we do the following:
> > 1. roll out implicit sync with userspace fences on a driver-by-driver basis
s/implicit/explicit/

But I think you got that.

> > 1a. including all the winsys/modeset stuff
> 
> Completely agree, that's why I've split that up into individual patches.
> 
> I'm also fine if drivers can just opt out of user fence based
> synchronization and we return an error from dma_buf_dynamic_attach() if some
> driver says it can't handle that.

Yeah, but that boils down to us just breaking those use-cases. Which is
exactly what you're trying to avoid by rolling out implicit user fence I
think.

> > 2. roll out support for userspace fences to drm_syncobj timeline for
> > interop, both across process/userspace and across drivers
> > 2a. including all the winsys/modeset stuff, but hopefully that's
> > largely solved with 1. already.
> 
> Correct, but again we need this for explicit fencing as well.
> 
> > 3. only then try to figure out how to retroshoehorn this into implicit
> > sync, and whether that even makes sense.
> > 
> > Because doing 3 before we've done 1&2 for at least 2 drivers (2 because
> > interop fun across drivers) is just praying that this time around we're
> > not collectively idiots and can correctly predict the future. That never
> > worked :-)
> > 
> > > For this prototype this patch set doesn't implement any user fence
> > > synchronization at all, but just assumes that faulting user pages is
> > > sufficient to make sure that we can wait for user space to finish
> > > submitting the work. If necessary this can be made even more strict, the
> > > only use case I could find which blocks this is the radeon driver and
> > > that should be handleable.
> > > 
> > > This of course doesn't give you the same semantic as the classic
> > > implicit sync to guarantee that you have exclusive access to a buffer,
> > > but this is also not necessary.
> > > 
> > > So I think the conclusion should be that we don't need to concentrate on
> > > implicit vs. explicit sync, but rather how to get the 

Re: [RFC] Implicit vs explicit user fence sync

2021-05-04 Thread Christian König

Hi Daniel,

Am 04.05.21 um 16:15 schrieb Daniel Vetter:

Hi Christian,

On Tue, May 04, 2021 at 03:27:17PM +0200, Christian König wrote:

Hi guys,

with this patch set I want to look into how much more additional work it
would be to support implicit sync compared to only explicit sync.

Turned out that this is much simpler than expected since the only
addition is that before a command submission or flip the kernel and
classic drivers would need to wait for the user fence to signal before
taking any locks.

It's a lot more I think
- sync_file/drm_syncobj still need to be supported somehow


You need that with explicit fences as well.

I'm just concentrating on what extra burden implicit sync would get us.


- we need userspace to handle the stall in a submit thread at least
- there's nothing here that sets the sync object
- implicit sync isn't just execbuf, it's everything. E.g. the various
   wait_bo ioctl also need to keep working, including timeout and
   everything


Good point, but that should be relatively easy to add as well.


- we can't stall in atomic kms where you're currently stalling, that's for
   sure. The uapi says "we're not stalling for fences in there", and you're
   breaking that.


Again as far as I can see we run into the same problem with explicit sync.

So the question is where could we block for atomic modeset for user 
fences in general?



- ... at this point I stopped pondering but there's definitely more

Imo the only way we'll ever get this complete is if we do the following:
1. roll out implicit sync with userspace fences on a driver-by-driver basis
1a. including all the winsys/modeset stuff


Completely agree, that's why I've split that up into individual patches.

I'm also fine if drivers can just opt out of user fence based 
synchronization and we return an error from dma_buf_dynamic_attach() if 
some driver says it can't handle that.



2. roll out support for userspace fences to drm_syncobj timeline for
interop, both across process/userspace and across drivers
2a. including all the winsys/modeset stuff, but hopefully that's
largely solved with 1. already.


Correct, but again we need this for explicit fencing as well.


3. only then try to figure out how to retroshoehorn this into implicit
sync, and whether that even makes sense.

Because doing 3 before we've done 1&2 for at least 2 drivers (2 because
interop fun across drivers) is just praying that this time around we're
not collectively idiots and can correctly predict the future. That never
worked :-)


For this prototype this patch set doesn't implement any user fence
synchronization at all, but just assumes that faulting user pages is
sufficient to make sure that we can wait for user space to finish
submitting the work. If necessary this can be made even more strict, the
only use case I could find which blocks this is the radeon driver and
that should be handleable.

This of course doesn't give you the same semantic as the classic
implicit sync to guarantee that you have exclusive access to a buffer,
but this is also not necessary.

So I think the conclusion should be that we don't need to concentrate on
implicit vs. explicit sync, but rather how to get the synchronization
and timeout signalling figured out in general.

I'm not sure what exactly you're proving here aside from "it's possible to
roll out a function with ill-defined semantics to all drivers". This
really is a lot harder than just this one function and just this one patch
set.


No it isn't. The hard part is getting the user sync stuff up in general.

Adding implicit synchronization on top of that is then rather trivial.

Christian.


-Daniel




Re: [RFC] Implicit vs explicit user fence sync

2021-05-04 Thread Daniel Vetter
Hi Christian,

On Tue, May 04, 2021 at 03:27:17PM +0200, Christian König wrote:
> Hi guys,
> 
> with this patch set I want to look into how much more additional work it
> would be to support implicit sync compared to only explicit sync.
> 
> Turned out that this is much simpler than expected since the only
> addition is that before a command submission or flip the kernel and
> classic drivers would need to wait for the user fence to signal before
> taking any locks.

It's a lot more I think
- sync_file/drm_syncobj still need to be supported somehow
- we need userspace to handle the stall in a submit thread at least
- there's nothing here that sets the sync object
- implicit sync isn't just execbuf, it's everything. E.g. the various
  wait_bo ioctl also need to keep working, including timeout and
  everything
- we can't stall in atomic kms where you're currently stalling, that's for
  sure. The uapi says "we're not stalling for fences in there", and you're
  breaking that.
- ... at this point I stopped pondering but there's definitely more

Imo the only way we'll ever get this complete is if we do the following:
1. roll out implicit sync with userspace fences on a driver-by-driver basis
   1a. including all the winsys/modeset stuff
2. roll out support for userspace fences to drm_syncobj timeline for
   interop, both across process/userspace and across drivers
   2a. including all the winsys/modeset stuff, but hopefully that's
   largely solved with 1. already.
3. only then try to figure out how to retroshoehorn this into implicit
   sync, and whether that even makes sense.

Because doing 3 before we've done 1&2 for at least 2 drivers (2 because
interop fun across drivers) is just praying that this time around we're
not collectively idiots and can correctly predict the future. That never
worked :-)

> For this prototype this patch set doesn't implement any user fence
> synchronization at all, but just assumes that faulting user pages is
> sufficient to make sure that we can wait for user space to finish
> submitting the work. If necessary this can be made even more strict, the
> only use case I could find which blocks this is the radeon driver and
> that should be handleable.
> 
> This of course doesn't give you the same semantic as the classic
> implicit sync to guarantee that you have exclusive access to a buffer,
> but this is also not necessary.
> 
> So I think the conclusion should be that we don't need to concentrate on
> implicit vs. explicit sync, but rather how to get the synchronization
> and timeout signalling figured out in general.

I'm not sure what exactly you're proving here aside from "it's possible to
roll out a function with ill-defined semantics to all drivers". This
really is a lot harder than just this one function and just this one patch
set.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


[RFC] Implicit vs explicit user fence sync

2021-05-04 Thread Christian König
Hi guys,

with this patch set I want to look into how much more additional work it would 
be to support implicit sync compared to only explicit sync.

Turned out that this is much simpler than expected since the only addition is 
that before a command submission or flip the kernel and classic drivers would 
need to wait for the user fence to signal before taking any locks.
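
Sketched as a toy model (the memory layout and the helper below are assumptions for
illustration, not what the patches actually add): a user fence here is just a
monotonically increasing 64-bit seqno in memory shared between CPU and GPU, and the
proposed rule is that a driver polls it to completion before it takes any of the
locks needed for the submission or flip.

    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stdint.h>
    #include <sched.h>

    /* Toy model of a userspace fence: signaled once *addr >= wait_value. */
    struct user_fence {
        _Atomic uint64_t *addr;
        uint64_t wait_value;
    };

    static bool user_fence_signaled(const struct user_fence *f)
    {
        return atomic_load_explicit(f->addr, memory_order_acquire) >= f->wait_value;
    }

    /* The rule proposed above: wait *before* taking any locks, so that no lock
     * is ever held while waiting on something only userspace or the hardware
     * will signal (and so no dma_fence ever ends up depending on that wait). */
    static void user_fence_wait_before_locks(const struct user_fence *f)
    {
        while (!user_fence_signaled(f))
            sched_yield();   /* a real implementation would sleep and time out */
    }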

For this prototype this patch set doesn't implement any user fence 
synchronization at all, but just assumes that faulting user pages is sufficient 
to make sure that we can wait for user space to finish submitting the work. If 
necessary this can be made even more strict, the only use case I could find 
which blocks this is the radeon driver and that should be handleable.

This of course doesn't give you the same semantic as the classic implicit sync 
to guarantee that you have exclusive access to a buffer, but this is also not 
necessary.

So I think the conclusion should be that we don't need to concentrate on 
implicit vs. explicit sync, but rather how to get the synchronization and 
timeout signalling figured out in general.

Regards,
Christian.

