Re: [Mesa-dev] Linux Graphics Next: Userspace submission update

2021-05-28 Thread Marek Olšák
My first email can be ignored except for the sync files. Oh well.

I think I see what you mean, Christian. If we assume that an imported fence
is always read only (the buffer with the sequence number is read only),
only the process that created and exported the fence can signal it. If the
fence is not signaled, the exporting process is guilty. The only thing the
importing process must do when it's about to use the fence as a dependency
is to notify the kernel about it. Thus, the kernel will always know the
dependency graph. Then if the importing process times out, the kernel will
blame any of the processes that passed it a fence that is still unsignaled.
The kernel will blame the process that timed out only if all imported
fences have been signaled. It seems pretty robust.
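
The blame rule described above can be sketched as a small model. This is only an illustration under stated assumptions: `Fence`, `Process`, and `guilty_parties` are hypothetical names, not kernel interfaces, and the sketch looks only one level deep into the dependency graph.

```python
from dataclasses import dataclass, field


@dataclass
class Fence:
    """Hypothetical model of an exported fence: only `owner` can signal it."""
    owner: str
    signaled: bool = False


@dataclass
class Process:
    name: str
    # Fences this process told the kernel it depends on (dependency graph edges).
    imported: list = field(default_factory=list)


def guilty_parties(timed_out: Process) -> set:
    """Return the names of the processes to blame when `timed_out` times out."""
    unsignaled_owners = {f.owner for f in timed_out.imported if not f.signaled}
    # Blame the exporters of unsignaled fences; blame the waiter only if
    # every imported fence has already signaled.
    return unsignaled_owners or {timed_out.name}


app_fence = Fence(owner="app")
compositor = Process("compositor", imported=[app_fence])
print(guilty_parties(compositor))  # {'app'}
app_fence.signaled = True
print(guilty_parties(compositor))  # {'compositor'}
```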

It's the same with implicit sync except that the buffer with the sequence
number is writable. Any process that has an implicitly-sync'd buffer can
set the sequence number to 0 or UINT64_MAX. 0 will cause a timeout for the
next job, while UINT64_MAX might cause a timeout a little later. The
timeout can be mitigated by the kernel because the kernel knows the
greatest number that should be there, but it's not possible to know which
process is guilty (all processes holding the buffer handle would be
suspects).
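
A rough sketch of that mitigation, assuming the kernel remembers the greatest sequence number it has handed out. `SeqnoMonitor` is a hypothetical name, and the check that rejects values going backwards is an extra assumption (sequence numbers are taken to be monotonic), not something stated above. Note that the check can flag the bad value but cannot say who wrote it.

```python
UINT64_MAX = 2**64 - 1


class SeqnoMonitor:
    """Hypothetical kernel-side view of one implicitly-sync'd buffer."""

    def __init__(self):
        self.last_issued = 0    # greatest sequence number handed out so far
        self.last_observed = 0  # greatest valid value seen in the buffer

    def issue(self) -> int:
        """Hand out the next sequence number to a submitting process."""
        self.last_issued += 1
        return self.last_issued

    def check(self, value_in_memory: int) -> str:
        """Classify the value found in the writable sequence-number buffer."""
        # Anything above what was issued, or below what was already observed
        # (assumed monotonic), cannot be legitimate.
        if value_in_memory > self.last_issued or value_in_memory < self.last_observed:
            return "corrupt"
        self.last_observed = value_in_memory
        return "ok"
```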

Marek

On Fri, May 28, 2021 at 6:25 PM Marek Olšák wrote:

> If both implicit and explicit synchronization are handled the same, then
> the kernel won't be able to identify the process that caused an implicit
> sync deadlock. The process that is stuck waiting for a fence can be
> innocent, and the kernel can't punish it. Likewise, the GPU reset query
> that reports which process is guilty and innocent will only be able to
> report unknown. Is that OK?
>
> Marek
>
> On Fri, May 28, 2021 at 10:41 AM Christian König <
> ckoenig.leichtzumer...@gmail.com> wrote:
>
>> Hi Marek,
>>
>> well, I don't think that implicit and explicit synchronization need to be
>> mutually exclusive.
>>
>> What we should do is to have the ability to transport a synchronization
>> object with each BO.
>>
>> Implicit and explicit synchronization then basically become the same,
>> they just transport the synchronization object differently.
>>
>> The biggest problem is the sync_files for Android, since they are really
>> not easy to support at all. If Android wants to support user queues we
>> would probably have to make some changes there.
>>
>> Regards,
>> Christian.
>>
>> On 27.05.21 at 23:51, Marek Olšák wrote:
>>
>> Hi,
>>
>> Since Christian believes that we can't deadlock the kernel with some
>> changes there, we just need to make everything nice for userspace too.
>> Instead of explaining how it will work, I will explain the cases where
>> future hardware (and its kernel driver) will break existing userspace in
>> order to protect everybody from deadlocks. Anything that uses implicit sync
>> will be spared, so X and Wayland will be fine, assuming they don't
>> import/export fences. Those use cases that do import/export fences might or
>> might not work, depending on how the fences are used.
>>
>> One of the necessities is that all fences will become future fences. The
>> semantics of imported/exported fences will change completely and will have
>> new restrictions on the usage. The restrictions are:
>>
>>
>> 1) Android sync files will be impossible to support, so won't be
>> supported. (they don't allow future fences)
>>
>>
>> 2) Implicit sync and explicit sync will be mutually exclusive between
>> processes. A process can either use one or the other, but not both. This is
>> meant to prevent a deadlock condition with future fences where any process
>> can malevolently deadlock execution of any other process, even execution of
>> a higher-privileged process. The kernel will impose the following
>> restrictions to protect against the deadlock:
>>
>> a) a process with an implicitly-sync'd imported/exported buffer can't
>> import/export a fence from/to another process
>> b) a process with an imported/exported fence can't import/export an
>> implicitly-sync'd buffer from/to another process
>>
>> Alternative: A higher-privileged process could enforce both restrictions
>> instead of the kernel to protect itself from the deadlock, but this would
>> be a can of worms for existing userspace. It would be better if the kernel
>> just broke unsafe userspace on future hw, just like sync files.
>>
>> If both implicit and explicit sync are allowed to occur simultaneously,
>> sending a future fence that will never signal to any process will deadlock
>> that process after it acquires the implicit sync lock, which is a sequence
>> number that the process is required to write to memory and send an
>> interrupt from the GPU in a finite time. This is how the deadlock can
>> happen:
>>
>> * The process gets sequence number N from the kernel for an
>> implicitly-sync'd buffer.
>> * The process inserts (into the GPU user-mapped queue) a wait for
>> sequence number N-1.
>> * The 

Re: [Mesa-dev] Linux Graphics Next: Userspace submission update

2021-05-28 Thread Marek Olšák
If both implicit and explicit synchronization are handled the same, then
the kernel won't be able to identify the process that caused an implicit
sync deadlock. The process that is stuck waiting for a fence can be
innocent, and the kernel can't punish it. Likewise, the GPU reset query
that reports which process is guilty and innocent will only be able to
report unknown. Is that OK?

Marek

On Fri, May 28, 2021 at 10:41 AM Christian König <
ckoenig.leichtzumer...@gmail.com> wrote:

> Hi Marek,
>
> well, I don't think that implicit and explicit synchronization need to be
> mutually exclusive.
>
> What we should do is to have the ability to transport a synchronization
> object with each BO.
>
> Implicit and explicit synchronization then basically become the same, they
> just transport the synchronization object differently.
>
> The biggest problem is the sync_files for Android, since they are really
> not easy to support at all. If Android wants to support user queues we
> would probably have to make some changes there.
>
> Regards,
> Christian.
>
> On 27.05.21 at 23:51, Marek Olšák wrote:
>
> Hi,
>
> Since Christian believes that we can't deadlock the kernel with some
> changes there, we just need to make everything nice for userspace too.
> Instead of explaining how it will work, I will explain the cases where
> future hardware (and its kernel driver) will break existing userspace in
> order to protect everybody from deadlocks. Anything that uses implicit sync
> will be spared, so X and Wayland will be fine, assuming they don't
> import/export fences. Those use cases that do import/export fences might or
> might not work, depending on how the fences are used.
>
> One of the necessities is that all fences will become future fences. The
> semantics of imported/exported fences will change completely and will have
> new restrictions on the usage. The restrictions are:
>
>
> 1) Android sync files will be impossible to support, so won't be
> supported. (they don't allow future fences)
>
>
> 2) Implicit sync and explicit sync will be mutually exclusive between
> processes. A process can either use one or the other, but not both. This is
> meant to prevent a deadlock condition with future fences where any process
> can malevolently deadlock execution of any other process, even execution of
> a higher-privileged process. The kernel will impose the following
> restrictions to protect against the deadlock:
>
> a) a process with an implicitly-sync'd imported/exported buffer can't
> import/export a fence from/to another process
> b) a process with an imported/exported fence can't import/export an
> implicitly-sync'd buffer from/to another process
>
> Alternative: A higher-privileged process could enforce both restrictions
> instead of the kernel to protect itself from the deadlock, but this would
> be a can of worms for existing userspace. It would be better if the kernel
> just broke unsafe userspace on future hw, just like sync files.
>
> If both implicit and explicit sync are allowed to occur simultaneously,
> sending a future fence that will never signal to any process will deadlock
> that process after it acquires the implicit sync lock, which is a sequence
> number that the process is required to write to memory and send an
> interrupt from the GPU in a finite time. This is how the deadlock can
> happen:
>
> * The process gets sequence number N from the kernel for an
> implicitly-sync'd buffer.
> * The process inserts (into the GPU user-mapped queue) a wait for sequence
> number N-1.
> * The process inserts a wait for a fence, but it doesn't know that it will
> never signal ==> deadlock.
> ...
> * The process inserts a command to write sequence number N to a
> predetermined memory location. (which will make the buffer idle and send an
> interrupt to the kernel)
> ...
> * The kernel will terminate the process because it has never received the
> interrupt. (i.e. a less-privileged process just killed a more-privileged
> process)
>
> It's the interrupt for implicit sync that never arrived that caused the
> termination, and the only way another process can cause it is by sending a
> fence that will never signal. Thus, importing/exporting fences from/to
> other processes can't be allowed simultaneously with implicit sync.
>
>
> 3) Compositors (and other privileged processes, and display flipping)
> can't trust imported/exported fences. They need a timeout recovery
> mechanism from the beginning, and the following are some possible solutions
> to timeouts:
>
> a) use a CPU wait with a small absolute timeout, and display the previous
> content on timeout
> b) use a GPU wait with a small absolute timeout, and conditional rendering
> will choose between the latest content (if signalled) and previous content
> (if timed out)
>
> The result would be that the desktop can run close to 60 fps even if an
> app runs at 1 fps.
>
> *Redefining imported/exported fences and breaking some users/OSs is the
> only way to have userspace GPU 

Re: [Mesa-dev] Linux Graphics Next: Userspace submission update

2021-05-28 Thread Christian König

Hi Marek,

well, I don't think that implicit and explicit synchronization need to
be mutually exclusive.


What we should do is to have the ability to transport a synchronization
object with each BO.


Implicit and explicit synchronization then basically become the same, 
they just transport the synchronization object differently.


The biggest problem is the sync_files for Android, since they are
really not easy to support at all. If Android wants to support user
queues we would probably have to make some changes there.


Regards,
Christian.

On 27.05.21 at 23:51, Marek Olšák wrote:

Hi,

Since Christian believes that we can't deadlock the kernel with some 
changes there, we just need to make everything nice for userspace too. 
Instead of explaining how it will work, I will explain the cases where 
future hardware (and its kernel driver) will break existing userspace 
in order to protect everybody from deadlocks. Anything that uses 
implicit sync will be spared, so X and Wayland will be fine, assuming 
they don't import/export fences. Those use cases that do import/export 
fences might or might not work, depending on how the fences are used.


One of the necessities is that all fences will become future fences. 
The semantics of imported/exported fences will change completely and 
will have new restrictions on the usage. The restrictions are:



1) Android sync files will be impossible to support, so won't be 
supported. (they don't allow future fences)



2) Implicit sync and explicit sync will be mutually exclusive between
processes. A process can either use one or the other, but not both. This
is meant to prevent a deadlock condition with future fences where any 
process can malevolently deadlock execution of any other process, even 
execution of a higher-privileged process. The kernel will impose the 
following restrictions to protect against the deadlock:


a) a process with an implicitly-sync'd imported/exported buffer can't 
import/export a fence from/to another process
b) a process with an imported/exported fence can't import/export an 
implicitly-sync'd buffer from/to another process
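
Restrictions a) and b) amount to a per-process "pick one mode" check. A minimal sketch, assuming the kernel tracks one state object per process; `ProcessSyncState` and its method names are hypothetical and only model the rule, not any real driver interface:

```python
class ProcessSyncState:
    """Hypothetical per-process record of which sharing mode is in use."""

    def __init__(self):
        self.shares_implicit_buffers = False
        self.shares_fences = False

    def share_implicit_buffer(self):
        """Import/export an implicitly-sync'd buffer (blocked by rule b)."""
        if self.shares_fences:
            raise PermissionError("process already imports/exports fences")
        self.shares_implicit_buffers = True

    def share_fence(self):
        """Import/export a fence (blocked by rule a)."""
        if self.shares_implicit_buffers:
            raise PermissionError("process already shares implicitly-sync'd buffers")
        self.shares_fences = True
```

Whichever kind of object a process shares first decides its mode; the other kind is refused from then on.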


Alternative: A higher-privileged process could enforce both 
restrictions instead of the kernel to protect itself from the 
deadlock, but this would be a can of worms for existing userspace. It 
would be better if the kernel just broke unsafe userspace on future 
hw, just like sync files.


If both implicit and explicit sync are allowed to occur 
simultaneously, sending a future fence that will never signal to any 
process will deadlock that process after it acquires the implicit sync 
lock, which is a sequence number that the process is required to write 
to memory and send an interrupt from the GPU in a finite time. This is 
how the deadlock can happen:


* The process gets sequence number N from the kernel for an 
implicitly-sync'd buffer.
* The process inserts (into the GPU user-mapped queue) a wait for 
sequence number N-1.
* The process inserts a wait for a fence, but it doesn't know that it 
will never signal ==> deadlock.

...
* The process inserts a command to write sequence number N to a 
predetermined memory location. (which will make the buffer idle and 
send an interrupt to the kernel)

...
* The kernel will terminate the process because it has never received 
the interrupt. (i.e. a less-privileged process just killed a 
more-privileged process)


It's the interrupt for implicit sync that never arrived that caused 
the termination, and the only way another process can cause it is by 
sending a fence that will never signal. Thus, importing/exporting 
fences from/to other processes can't be allowed simultaneously with 
implicit sync.
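
The sequence above can be modeled as an in-order command queue. This is a toy simulation under stated assumptions: `run_queue`, the command names, and the fence name "evil" are all made up for illustration, and a real user-mapped GPU queue is of course not a Python list.

```python
def run_queue(commands, signaled_fences, memory_seqno):
    """Execute commands in order and stop at the first unsatisfied wait.

    Returns the sequence number that ends up in memory.
    """
    for op, arg in commands:
        if op == "wait_seqno" and memory_seqno < arg:
            break  # previous job on the buffer has not finished
        if op == "wait_fence" and arg not in signaled_fences:
            break  # imported fence never signals: the queue is stuck here
        if op == "write_seqno":
            memory_seqno = arg  # buffer becomes idle, interrupt would fire
    return memory_seqno


N = 5
queue = [
    ("wait_seqno", N - 1),   # wait for the previous implicit-sync job
    ("wait_fence", "evil"),  # fence sent by a less-privileged process
    ("write_seqno", N),      # release the implicit sync "lock"
]

stuck = run_queue(queue, signaled_fences=set(), memory_seqno=N - 1)
print(stuck == N)  # False: N is never written, so the kernel times out

fine = run_queue(queue, signaled_fences={"evil"}, memory_seqno=N - 1)
print(fine == N)   # True: the fence signaled, so N is written
```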



3) Compositors (and other privileged processes, and display flipping) 
can't trust imported/exported fences. They need a timeout recovery 
mechanism from the beginning, and the following are some possible 
solutions to timeouts:


a) use a CPU wait with a small absolute timeout, and display the 
previous content on timeout
b) use a GPU wait with a small absolute timeout, and conditional 
rendering will choose between the latest content (if signalled) and 
previous content (if timed out)


The result would be that the desktop can run close to 60 fps even if 
an app runs at 1 fps.
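
Option a) could look roughly like the following. A minimal sketch, assuming the imported fence can be represented by a `threading.Event` that the client sets when its frame is ready; `pick_frame` and the frame values are hypothetical.

```python
import threading
import time


def pick_frame(fence: threading.Event, deadline: float, latest, previous):
    """Wait for `fence` until the absolute `deadline` (time.monotonic() based).

    Returns `latest` if the fence signaled in time, else `previous`.
    """
    remaining = max(0.0, deadline - time.monotonic())
    # Event.wait() returns True if the event was set, False on timeout.
    return latest if fence.wait(remaining) else previous


client_fence = threading.Event()
deadline = time.monotonic() + 0.004  # small budget inside a ~16.7 ms frame
frame = pick_frame(client_fence, deadline, latest="new frame", previous="old frame")
print(frame)  # old frame (nobody signaled the fence before the deadline)
```

Because the deadline is absolute, a slow client costs the compositor at most that fixed budget per frame, regardless of how many times the wait is retried.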


*Redefining imported/exported fences and breaking some users/OSs is 
the only way to have userspace GPU command submission, and the 
deadlock example here is the counterexample proving that there is no 
other way.*


So, what are the chances this is going to fly with the ecosystem?

Thanks,
Marek


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev