https://bugs.kde.org/show_bug.cgi?id=510734
--- Comment #6 from nyanpasu64 ---
Created attachment 185908
  --> https://bugs.kde.org/attachment.cgi?id=185908&action=edit
Output of `WAYLAND_DEBUG=1 kcmshell6 kcm_kscreen` showing crashing and non-crashing runs

I decided to log client communications by running `bash -c 'WAYLAND_DEBUG=1 kcmshell6 kcm_kscreen & echo $!' 2>&1 | tee kcmshell6.log`. If I rapidly switch monitors, I see many events of the form `[2341899.277] discarded [unknown]#4278190117.[event 3](0 fd, 8 byte)`, followed by the error `not a valid new object id (4278190164), message mode(n)` and the window closing.

The funny thing is that there actually *is* a preceding ID, 4278190163, but it only appears in discarded messages:

    [2352757.126] discarded [unknown]#4278190163.[event 0](0 fd, 16 byte)
    [2352757.128] discarded [unknown]#4278190163.[event 1](0 fd, 12 byte)

If I switch monitors slowly (kcmshell6-normal.log), I *do* see discarded messages, but they have 8-byte payloads and do not directly follow a global/bind of "kde_output_device_v2".

Oddly, if I slowly connect, disconnect, and reconnect the CRT, the modes never reuse an ID; instead the server keeps sending continually increasing IDs to kcmshell6.

## Analysis

By comparing the kcmshell6*-fmt.log files, I think I have a lead on what's going on. Of course it's a race condition...

I suggest opening the two formatted files in separate editor panes (e.g. VS Code) and enabling synchronized/locked scrolling. I'll comment on the messages as they are received.

// plugged in a new display!

- The server's registries inform the client that there is a new kde_output_device_v2 global, and the client binds it to a new ID. (!!!)
- The server's kde_output_order_v1 tells the client we now have named outputs DP-1 and DP-2. (This is unrelated to the crash.)
- The server's registries inform the client that there is a new wl_output global; the client binds it to a new ID and asks the zxdg_output_manager_v1 to create a new zxdg_output_v1 for the wl_output. (This is unrelated to the crash.)
- Why are there two wl_registry objects, and why are the kde_output_device_v2 and wl_output globals bound through a different one? I don't know.

// remove global kde_output_device_v2 89, global wl_output 90
// destroy id zxdg_output_v1#61, id wl_output#65
// don't destroy id kde_output_device_v2#30.

    [2352754.786] {Default Queue} wl_registry#71.global_remove(89)
    [2352754.794] {Default Queue} wl_registry#2.global_remove(89)
    ...

If we unplug the display quickly, we get an unscheduled interruption:

- The server's wl_registries remove the kde_output_device_v2 and wl_output globals. (!!!)
- The server's kde_output_order_v1 tells the client we have named output DP-1 only. (This is unrelated to the crash.)
- The server destroys the object IDs for the zxdg_output_v1 and wl_output, but *not* the kde_output_device_v2 (even though the corresponding global is gone!).

// the global kde_output_device_v2 is deleted, but we keep receiving events from its binding,
// and the client doesn't recognize them.

In a normal display connection, the server proceeds to send the client the kde_output_device_v2's resolutions/properties. If we've unplugged the display, the server deletes the kde_output_device_v2 global but keeps sending the client messages from the binding. WAYLAND_DEBUG can't decode them (and libwayland/wl_map crashes when reserving IDs after the new mode IDs).

// this is supposed to create new id kde_output_device_mode_v2#4278190159, but we don't understand it.

    [2352757.103] discarded [unknown]#30.[event 2](0 fd, 12 byte)

// now we receive events from an ID we've never seen before.
// but this isn't an error. *creating* invalid ids is.

- We get unknown messages from kde_output_device_v2 attempting (and failing) to create kde_output_device_mode_v2#4278190159 through 4278190163, along with unknown messages from those modes.

// set current mode

- For the disconnected display, the client receives a flurry of distinct events from the kde_output_device_v2, which shouldn't exist anymore.
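To cross-check the two logs mechanically rather than by synchronized scrolling, one could tabulate which server-allocated IDs only ever appear in `discarded` lines (like 4278190163) and whether any server-allocated ID is ever announced twice. This is a hypothetical sketch: `scan()` is an illustrative helper, not part of any Wayland tooling, and the regexes assume the line shapes quoted in this report plus an assumed `new id interface#N` shape for new_id arguments.

```python
import re
from collections import Counter

# First ID of the server-allocated range (0xff000000 == 4278190080).
SERVER_ID_START = 0xFF000000

# Assumed WAYLAND_DEBUG line shapes (based on excerpts in this report):
#   "[2352757.126] discarded [unknown]#4278190163.[event 0](0 fd, 16 byte)"
#   "... (new id kde_output_device_mode_v2#N)"  <- assumed shape of new_id args
DISCARDED_RE = re.compile(r"discarded \[[^\]]*\]#(\d+)\.")
NEW_ID_RE = re.compile(r"new id [\w\[\]]+#(\d+)")


def scan(lines):
    """Hypothetical helper: summarize server-allocated IDs in a log.

    Returns (ghost_ids, reused_ids, highest_server_id):
      ghost_ids  - server IDs seen only in "discarded" lines, never in "new id"
      reused_ids - server IDs announced via "new id" more than once
      highest_server_id - largest server ID announced via "new id", or None
    """
    created = Counter()  # object ID -> number of "new id" announcements
    discarded = set()    # IDs that were the target of a discarded event
    for line in lines:
        for oid in NEW_ID_RE.findall(line):
            created[int(oid)] += 1
        match = DISCARDED_RE.search(line)
        if match:
            discarded.add(int(match.group(1)))

    # Keep only IDs in the server-allocated range.
    server_created = {i: n for i, n in created.items() if i >= SERVER_ID_START}
    ghost_ids = sorted(i for i in discarded
                       if i >= SERVER_ID_START and i not in server_created)
    reused_ids = sorted(i for i, n in server_created.items() if n > 1)
    highest = max(server_created, default=None)
    return ghost_ids, reused_ids, highest
```

Passing an open log file (e.g. kcmshell6.log or kcmshell6-normal.log) to `scan()` and comparing the results would show whether discarded-only IDs occur only in the crashing run, and whether `reused_ids` stays empty while `highest_server_id` keeps climbing.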
// delete_id zxdg_output_v1, wl_output (both previously destroyed/released)

    [2352757.178] {Display Queue} wl_display#1.delete_id(61)
    [2352757.184] {Display Queue} wl_display#1.delete_id(65)

- It's strange that even when *normally* unplugging a monitor, we fail to delete the kde_output_device_v2 and kde_output_device_mode_v2 objects, and can never reuse their IDs. If you open `kcmshell6-normal.log` and search for 4278190082, you'll see it's allocated for the LCD's (1920, 1080) mode and never destroyed; the ID is never reused, and new IDs just keep incrementing. I suspect this ID leak could cause issues for long-lived processes like plasmashell.

// a second passes, plug the monitor back in.

    not a valid new object id (4278190164), message mode(n)
    The Wayland connection experienced a fatal error: Invalid argument

The server created modes with IDs up to 4278190163 and never deleted them, but the client never saw those modes. The next time the Wayland server needs to send a new object to the client (e.g. for interactions, monitor changes, or clipboard events), it thinks the next unused ID is 4278190164, but the client rejects that ID as invalid because of the gap after the last ID it saw.

## Next Steps

- I'm guessing the bug is that kwin_wayland keeps sending messages from a kde_output_device_v2 binding *after* it has issued a global_remove() for the source object.
- Why can't the client parse the messages? Did it see the global_remove() and invalidate (forget the interface/type of) all object IDs based on those globals, or were the messages sent by the server corrupted in some way?
- Is this a bug, and how should the client handle it? https://wayland.freedesktop.org/docs/html/apa.html#protocol-spec-wl_registry-event-global_remove says the object IDs remain valid for the *client* to send messages until the client sees the global_remove and replies by destroying the object. It doesn't say how the server should behave.
- I think you should find a way to destroy the IDs allocated to displays and modes so they don't leak infinitely in the client (right now it happens even if you don't crash).
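The ID-gap failure can be illustrated with a simplified model of the client's table of server-allocated IDs. This is a hypothetical sketch loosely modeled on libwayland's `wl_map_reserve_new()` (the real code differs in detail): server IDs start at 0xff000000 (4278190080), a new ID may either fill a free existing slot or extend the table by exactly one, and anything further out fails with EINVAL. The class name `ClientServerIdMap` and the assumption that the client's last reserved ID was 4278190158 are illustrative, not taken from the logs.

```python
import errno

# Server-allocated Wayland IDs start here (0xff000000 == 4278190080).
SERVER_ID_START = 0xFF000000


class ClientServerIdMap:
    """Hypothetical, simplified model of a client's table of
    server-allocated object IDs (not the real libwayland code)."""

    def __init__(self):
        self.entries = []  # slot k tracks server ID SERVER_ID_START + k

    def reserve_new(self, object_id):
        if object_id < SERVER_ID_START:
            # The client never accepts server-created objects in its own range.
            raise OSError(errno.EINVAL, "ID is not in the server range")
        index = object_id - SERVER_ID_START
        if index > len(self.entries):
            # The server skipped IDs this client never reserved: a gap.
            raise OSError(errno.EINVAL,
                          f"not a valid new object id ({object_id})")
        if index == len(self.entries):
            self.entries.append(object_id)  # next sequential ID: grow table
        elif self.entries[index] is not None:
            raise OSError(errno.EINVAL, f"object id {object_id} in use")
        else:
            self.entries[index] = object_id  # reuse a previously freed slot

    def remove(self, object_id):
        # Free the slot so the server may hand this ID out again.
        self.entries[object_id - SERVER_ID_START] = None
```

Under this model, if the events that should have reserved 4278190159 through 4278190163 are discarded, the table never grows past 4278190158, so `reserve_new(4278190164)` raises EINVAL, which would be consistent with the `Invalid argument` fatal error above. It also shows why freeing mode objects matters: a slot is only reusable after `remove()`.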
