On 5/20/26 14:33, Xaver Hugl wrote:
> On Wed, 20 May 2026 at 10:08, Christian König
> <[email protected]> wrote:
>> Well I would say the other way around is a pretty common use case.
>>
>> In other words the compositor uses the internal GPU for composing and
>> displaying the picture. And the client uses the external GPU for fast 
>> rendering.
> Sure, but that's not what I'm talking about.

Yeah sorry for that, I wasn't sure if I misunderstood your use case because 
it's usually the other way around.

>>> - the buffers from the client stay valid
>>
>> Buffers from the hot plugged GPU don't stay valid. Accessing CPU mappings
>> either results in a SIGBUS or is redirected to a dummy page.
> Again, not what I wrote about. The buffers are on the integrated GPU.

The general rule of thumb is that as long as the exporter stays around, the
buffers stay around as well.

>>> - the syncobj stays valid on the client side
>>> - the syncobj becomes invalid on the compositor side
>>
>> Nope that's not correct. The syncobj itself stays valid even if you 
>> completely hot plug the device.
>>
>> It can just be that the fences inside the syncobj are terminated with an 
>> error.
> What about eventfd created for a point on the syncobj?

The eventfd unfortunately doesn't have error handling as far as I know, so when
a fence signals with an error condition the eventfd only shows you that it is
signaled.

> Another (future) problem with hotplugs will be if the sync file hasn't
> materialized for the timeline point when the device is hotunplugged,
> since there can't be an error on the fence if there isn't one. Or
> could userspace somehow set an 'artificial' fence with an error in
> that case?

In general the answer is yes: userspace needs to take care of inserting fences
when wait-before-signal is used and the work cannot be submitted to the HW for
some reason.

Currently we only have an IOCTL to insert a signaled dummy fence at some
timeline sequence, but it should be trivial to also insert a signaled fence
with an error code.
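
A minimal, untested sketch of how userspace can insert that signaled dummy
fence today through the kernel UAPI headers. The function name and its
parameters are made up for illustration; drm_fd is assumed to be an open DRM
device and handle a timeline syncobj handle on it. There is no error-code
variant of this IOCTL yet, so the sketch only covers the existing case.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <drm/drm.h>

/*
 * Hypothetical helper: attach an already-signaled dummy fence to `point`
 * of the timeline syncobj `handle`.
 * Returns 0 on success or a negative errno.
 */
static int signal_timeline_point(int drm_fd, uint32_t handle, uint64_t point)
{
    struct drm_syncobj_timeline_array args = {
        .handles = (uintptr_t)&handle,  /* user pointer to the handle array */
        .points = (uintptr_t)&point,    /* user pointer to the point array */
        .count_handles = 1,
    };

    if (ioctl(drm_fd, DRM_IOCTL_SYNCOBJ_TIMELINE_SIGNAL, &args) < 0)
        return -errno;
    return 0;
}
```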

But the compositor needs to be able to handle that case anyway, because a
malicious or just buggy client might simply never insert the fence.

So a device being hot-unplugged is no different from a client not inserting
the fence in the first place.

>>> "invalid" there means either
>>> - the acquire point of the client is marked as signaled, before
>>> rendering on the client side is completed
>>> - the acquire point of the client is never signaled. Since the
>>> compositor waits for the acquire point, the Wayland surface is stuck
>>> forever
>>
>> Both of those would be a *massive* violation of documented kernel rules for 
>> hot-plugging which could lead to random data corruption and/or deadlocks.
>>
>> If you see any HW driver showing behavior like that please open up a bug 
>> report and ping the relevant maintainers immediately.
> If there are no error codes with syncobj yet, then to userspace, the
> latter behavior is exactly what we get, isn't it?

No, from the userspace side you just see a signaled fence. To see the error
code you need to export the timeline point of the syncobj to a syncfile and
then call the QUERY IOCTL on the syncfile.
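
An untested sketch of that export-and-query sequence using the kernel UAPI
headers. The helper name is invented; drm_fd is assumed to be an open DRM
device and timeline a timeline syncobj handle on it. The point is first
transferred into a temporary binary syncobj, which is then exported as a
syncfile and queried with SYNC_IOC_FILE_INFO. The transfer is expected to fail
if no fence has materialized for the point yet.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <drm/drm.h>
#include <linux/sync_file.h>

/*
 * Hypothetical helper: returns 1 if the fence at `point` signaled without
 * error, 0 if it is still pending, or a negative errno (either the fence
 * error or an IOCTL failure).
 */
static int query_point_status(int drm_fd, uint32_t timeline, uint64_t point)
{
    struct drm_syncobj_create create = { 0 };
    struct drm_syncobj_destroy destroy = { 0 };
    struct drm_syncobj_transfer transfer = { 0 };
    struct drm_syncobj_handle to_fd = { 0 };
    struct sync_file_info info = { 0 };
    int ret;

    /* Temporary binary syncobj to hold the fence of the timeline point. */
    if (ioctl(drm_fd, DRM_IOCTL_SYNCOBJ_CREATE, &create) < 0)
        return -errno;

    transfer.src_handle = timeline;
    transfer.src_point = point;
    transfer.dst_handle = create.handle;
    transfer.dst_point = 0;     /* 0 = binary syncobj */
    if (ioctl(drm_fd, DRM_IOCTL_SYNCOBJ_TRANSFER, &transfer) < 0) {
        ret = -errno;
        goto out_destroy;
    }

    /* Export the fence as a syncfile fd. */
    to_fd.handle = create.handle;
    to_fd.flags = DRM_SYNCOBJ_HANDLE_TO_FD_FLAGS_EXPORT_SYNC_FILE;
    to_fd.fd = -1;
    if (ioctl(drm_fd, DRM_IOCTL_SYNCOBJ_HANDLE_TO_FD, &to_fd) < 0) {
        ret = -errno;
        goto out_destroy;
    }

    /* With num_fences == 0 only the overall status is filled in. */
    if (ioctl(to_fd.fd, SYNC_IOC_FILE_INFO, &info) < 0)
        ret = -errno;
    else
        ret = info.status;      /* 1 signaled, 0 active, <0 error */
    close(to_fd.fd);

out_destroy:
    destroy.handle = create.handle;
    ioctl(drm_fd, DRM_IOCTL_SYNCOBJ_DESTROY, &destroy);
    return ret;
}
```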

>> When a hotplug happens all operations of the device should return an -ENODEV 
>> error, even when exposed to other devices/application through syncobj or 
>> syncfile.
> Okay, that at least gives us a way to fail imports somewhat
> gracefully. Normally, failing to import a syncobj is a fatal error in
> the Wayland protocol.

So the task at hand would be to avoid importing the syncobj into a driver. That 
should be relatively trivial.

The only real problem I see is if you want to create a syncobj without having 
any device whatsoever.

>> One problem is that only syncfile allows for querying such error codes at 
>> the moment, we have patches pending to add that to syncobj as well but we 
>> lack a compositor with support for that as userspace client.
> As long as the error case can be detected with an eventfd,

Yeah, that's the problem. The eventfd only tells you if the operation has
completed (or at least has materialized).

To query the error you would need to ask the underlying syncobj or syncfile 
directly.

> implementing that in KWin shouldn't be a challenge.
> 
>> Well the question here is if the device the compositor is using or the 
>> client is using is gone?
>>
>> If the client device is hot removed the compositor should be perfectly 
>> capable to import the syncobj.
>>
>> If the compositor device is gone then you don't have a device to display 
>> anything any more, so generating the next frame doesn't seem to make sense 
>> either.
>>
>> What could be is that you want the compositor to be kept alive even when the 
>> display device is gone to switch over to vkms or whatever so that a VNC 
>> session or other remote desktop still works.
> There are two GPUs in the example I gave. The compositor can use both
> for rendering (in cosmic-comp's case) or switch between them (what I'm
> trying to do with KWin), or use one device for rendering, and another
> for importing the syncobj.

Ah! I think I got the problem now. You basically want to avoid importing the 
syncobj because when the wrong device goes away you are busted.

The reason we didn't consider having the IOCTLs on the FD is that if you don't
import them and instead keep them around, you can run out of file descriptors
quite quickly.

When you have a use case where you receive an FD from the client and do a
one-shot conversion to an eventfd, that will probably work, but for keeping
them in the long run you need some kind of container for the syncobjs, don't
you?

>>>>>>> 3. It removes the need to translate between syncobjs fds and handles.
>>>>>>
>>>>>> That's a pretty big no-go as well. The differentiation between FDs and 
>>>>>> handles is completely intentional.
>>>>> Could you expand on why it's needed? For compositors, the handle is
>>>>> just an intermediary thing when translating between file descriptors.
>>>>
>>>> Well what we could do is to add an IOCTL to directly attach a syncobj
>>>> file descriptor to an eventfd.
>>> That would be nice.
>>
>> Take a look at drm_syncobj_file_fops and how drm_syncobj_add_eventfd() is 
>> used. Adding that functionality shouldn't be more than a typing exercise.
> Yeah, this patchset already adds that functionality (on the new device).
> 
>> Do I see it right that this would already solve most problems on the
>> compositor side?
> Skipping the syncobj handle step would only reduce the number of
> ioctls the compositor does, but afaict it wouldn't solve any
> compositor problems. At least not as long as it's still tied to a drm
> device.

Yeah, you need something like a syncobj container or dummy DRM device.

> For device hotplugs, the only new thing we need for correctly handling
> syncobj is a way to receive errors on the eventfd.

I need to look into the eventfd code. It could be that this is somehow
possible, but it's clearly not something I have used before.

> A device-independent way to create and use syncobj would still be
> useful to us though, both to simplify the compositor and to improve
> the software rendering use cases.

Yeah not sure how to cleanly do that. We could have a dummy /dev/dri/rendersync 
or something like that, but that would be quite a hack.

At least I understand the requirement now.

Thanks,
Christian.

> 
> - Xaver
