https://bugs.kde.org/show_bug.cgi?id=510734

--- Comment #6 from nyanpasu64 <[email protected]> ---
Created attachment 185908
  --> https://bugs.kde.org/attachment.cgi?id=185908&action=edit
Output of `WAYLAND_DEBUG=1 kcmshell6 kcm_kscreen` showing crashing and
non-crashing runs

I decided to log client communications by running `bash -c 'WAYLAND_DEBUG=1
kcmshell6 kcm_kscreen & echo $!' 2>&1 | tee kcmshell6.log`.

If I rapidly switch monitors, I see many events of the form `[2341899.277]
discarded [unknown]#4278190117.[event 3](0 fd, 8 byte)`, followed by the error
`not a valid new object id (4278190164), message mode(n)` and the window
closing. The funny thing is that there actually *is* a preceding ID 4278190163,
except it only appears in discarded messages:

[2352757.126] discarded [unknown]#4278190163.[event 0](0 fd, 16 byte)
[2352757.128] discarded [unknown]#4278190163.[event 1](0 fd, 12 byte)

If I switch monitors slowly (kcmshell6-normal.log), I *do* see discarded
messages, but with 8-byte payloads, and not directly following a global/bind of
"kde_output_device_v2". Oddly, if I slowly connect, disconnect, and reconnect
the CRT, the modes never reuse an ID; the server keeps sending continually
increasing ones to kcmshell6.
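
To compare the crashing and non-crashing logs, a small script can tally the discarded events per object ID. This is only a rough sketch: the regex is written against the `discarded` lines quoted above, and the constant `0xFF000000` as the start of the server-allocated ID range is my assumption about libwayland (it matches the IDs in this log, which all sit just above 4278190080).

```python
import re
from collections import Counter

# Assumption: IDs at or above this value are allocated by the server.
SERVER_ID_START = 0xFF000000  # 4278190080

# Matches lines such as:
# [2352757.126] discarded [unknown]#4278190163.[event 0](0 fd, 16 byte)
DISCARDED = re.compile(
    r"\[\s*(?P<time>\d+\.\d+)\]\s+discarded \[unknown\]#(?P<obj>\d+)"
    r"\.\[event (?P<opcode>\d+)\]\((?P<fds>\d+) fd, (?P<size>\d+) byte\)"
)


def parse_discarded(lines):
    """Return one dict per discarded event found in the given log lines."""
    events = []
    for line in lines:
        match = DISCARDED.search(line)
        if match:
            events.append({
                "time": match.group("time"),
                "obj": int(match.group("obj")),
                "opcode": int(match.group("opcode")),
                "size": int(match.group("size")),
            })
    return events


# Lines copied from the log excerpts in this comment.
sample = [
    "[2341899.277] discarded [unknown]#4278190117.[event 3](0 fd, 8 byte)",
    "[2352757.126] discarded [unknown]#4278190163.[event 0](0 fd, 16 byte)",
    "[2352757.128] discarded [unknown]#4278190163.[event 1](0 fd, 12 byte)",
]

events = parse_discarded(sample)
per_object = Counter(event["obj"] for event in events)
for obj, count in sorted(per_object.items()):
    side = "server" if obj >= SERVER_ID_START else "client"
    print(f"object {obj} ({side}-allocated): {count} discarded event(s)")
```

To run it on a real capture, pass `open("kcmshell6.log")` to `parse_discarded` instead of `sample`.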

## Analysis

By comparing kcmshell6*-fmt.log, I think I have a lead on what's going on. Of
course it's a race condition...
I suggest opening the two formatted files in separate editor panes (e.g. VS
Code) and enabling synchronized/locked scrolling. I'll be commenting on the
messages received.

// plugged in a new display!
- The server's registries inform the client we have a new kde_output_device_v2
global, and the client binds it to a new ID. (!!!)
- The server's kde_output_order_v1 tells the client we now have named outputs
DP-1 and DP-2. (This is unrelated to the crash.)
- The server's registries inform the client we have a new wl_output global, the
client binds it to a new ID, and asks the zxdg_output_manager_v1 to create a
new zxdg_output_v1 for the wl_output. (This is unrelated to the crash.)
- Why do we have two wl_registry objects, and why are the kde_output_device_v2
and wl_output globals bound through different ones? I don't know.

// remove global kde_output_device_v2 89, global wl_output 90
// destroy id zxdg_output_v1#61, id wl_output#65
// don't destroy id kde_output_device_v2#30.
[2352754.786] {Default Queue} wl_registry#71.global_remove(89)
[2352754.794] {Default Queue} wl_registry#2.global_remove(89)
...
If we unplug the display quickly, we have an unscheduled interruption:
- The server's wl_registries remove the global kde_output_device_v2 and
wl_output. (!!!)
- The server's kde_output_order_v1 tells the client we have named output DP-1
only. (This is unrelated to the crash.)
- The server destroys the object IDs for the zxdg_output_v1 and wl_output, but
*not* the kde_output_device_v2 (even though the corresponding global is gone!).

// the global kde_output_device_v2 is deleted, but we keep receiving events
// from its binding, and the client doesn't recognize them.
In the normal display connection, the server proceeds to send the client the
kde_output_device_v2's resolutions/properties. If we've unplugged the display,
the server removes the kde_output_device_v2 global but keeps sending the client
messages from the binding. WAYLAND_DEBUG can't understand them (and
libwayland/wl_map crashes when reserving IDs after the new mode IDs).
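
To get a feel for how tight the race is, the gap between the first global_remove and the first discarded event can be read straight off the timestamps. A minimal sketch, assuming the bracketed WAYLAND_DEBUG timestamps are in milliseconds; the helper names are made up for illustration, and the three sample lines are copied from the crashing log.

```python
import re
from decimal import Decimal

TIMESTAMP = re.compile(r"^\[\s*(\d+\.\d+)\]")


def timestamp(line):
    """Return the leading [time] of a log line as a Decimal, or None."""
    match = TIMESTAMP.match(line)
    return Decimal(match.group(1)) if match else None


def delay_after_global_remove(lines):
    """Time between the first global_remove and the next discarded event."""
    removed_at = None
    for line in lines:
        t = timestamp(line)
        if t is None:
            continue
        if removed_at is None and ".global_remove(" in line:
            removed_at = t
        elif removed_at is not None and " discarded " in line:
            return t - removed_at
    return None


log = [
    "[2352754.786] {Default Queue} wl_registry#71.global_remove(89)",
    "[2352754.794] {Default Queue} wl_registry#2.global_remove(89)",
    "[2352757.103] discarded [unknown]#30.[event 2](0 fd, 12 byte)",
]

delay = delay_after_global_remove(log)
print(delay)  # 2.317
```

If the millisecond assumption holds, the first unparseable event arrives roughly 2.3 ms after the global is removed.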

// this is supposed to create new id kde_output_device_mode_v2#4278190159, but
// we don't understand it.
[2352757.103] discarded [unknown]#30.[event 2](0 fd, 12 byte)

// now we receive events from an ID we've never seen before.
// but this isn't an error. *creating* invalid ids is.
- We get unknown messages from kde_output_device_v2 attempting (and failing) to
create kde_output_device_mode_v2#4278190159 through 4278190163, along with
unknown messages from the modes.

// set current mode
- For the disconnected display, the client receives a flurry of events with
distinct event IDs from a kde_output_device_v2 that shouldn't exist.

// delete_id zxdg_output_v1, wl_output (both previously destroyed/released)
[2352757.178] {Display Queue} wl_display#1.delete_id(61)
[2352757.184] {Display Queue} wl_display#1.delete_id(65)
- It's strange that even when *normally* unplugging a monitor, we fail to
delete the kde_output_device_v2 and kde_output_device_mode_v2, and can never
reuse their IDs. If you open `kcmshell6-normal.log` and search for 4278190082,
it's allocated for the LCD's (1920, 1080) mode, never destroyed, and the ID is
never reused but instead keeps incrementing. I suspect this ID leak bug could
cause issues for long-lived processes like plasmashell.

// a second passes, plug the monitor back in.
not a valid new object id (4278190164), message mode(n)
The Wayland connection experienced a fatal error: Invalid argument

The server created modes with IDs up to 4278190163 and never deleted them. The
client never saw those modes. The next time the Wayland server needs to send an
object to the client (e.g. interactions, monitor changes, clipboard events), it
thinks the next unused ID is 4278190164, but the client thinks this ID is
invalid because of the gap from the last ID it saw.
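
The gap can be illustrated with a toy model of the client's object map. This is not libwayland's code, just a sketch of the rule I believe it enforces: a new server-allocated ID is accepted only if it fills a free slot or directly extends the table. The class and function names are hypothetical, the `0xFF000000` base is my assumption, and so is the premise that the client had seen every server ID up to 4278190158 before the mode events were discarded.

```python
# Assumption: server-allocated IDs start here (4278190080).
SERVER_ID_START = 0xFF000000


class ToyObjectMap:
    """Toy model of the server-allocated half of a client's object map."""

    def __init__(self):
        # Slot i stands for object ID SERVER_ID_START + i; True means in use.
        self.server_slots = []

    def reserve_new(self, object_id):
        """Accept a new server ID only if it leaves no gap in the table."""
        index = object_id - SERVER_ID_START
        if index < 0:
            raise ValueError("not a server-allocated ID")
        if index > len(self.server_slots):
            return False  # would skip IDs the client has never seen
        if index == len(self.server_slots):
            self.server_slots.append(True)
            return True
        if not self.server_slots[index]:
            self.server_slots[index] = True  # reuse a freed slot
            return True
        return False  # slot is still occupied


def simulate(modes_were_parsed):
    """Replay the ID sequence from the log; return whether 4278190164 is accepted."""
    client = ToyObjectMap()
    # Assumed: the client successfully saw every server ID up to 4278190158.
    for object_id in range(SERVER_ID_START, 4278190159):
        client.reserve_new(object_id)
    # Modes 4278190159..4278190163 only register if their events were parsed.
    if modes_were_parsed:
        for object_id in range(4278190159, 4278190164):
            client.reserve_new(object_id)
    return client.reserve_new(4278190164)


print("events discarded:", simulate(modes_were_parsed=False))
print("events parsed:   ", simulate(modes_were_parsed=True))
```

Under these assumptions the first call reports a rejection, matching the `not a valid new object id (4278190164)` error, while the second accepts the ID.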

## Next Steps

- I'm guessing the bug is that kwin_wayland keeps sending messages from a
kde_output_device_v2 binding *after* it's issued a global_remove() to the
source object.
    - Why can't the client parse the messages? Did it see the global_remove()
and invalidate (forget the interface/type of) all object IDs based on those
globals, or were the messages sent from the server corrupted in some way?
    - Is this a bug, and how should the client handle it? The documentation at
https://wayland.freedesktop.org/docs/html/apa.html#protocol-spec-wl_registry-event-global_remove
says the object IDs remain valid for the *client* to send messages, until the
client sees the global_remove and replies by destroying the object. It doesn't
say how the server should act.
- I think you should find a way to destroy the IDs allocated to displays and
modes so they don't leak indefinitely in the client (right now it happens even
if you don't crash).
