On Sun, May 6, 2018 at 1:51 PM, Tobias Klausmann
<tobias.johannes.klausm...@mni.thm.de> wrote:
> Hi,
>
> fyi: there is another bugreport #106372 [1], where i bisected the problem in
> the xserver and found a problematic commit, with code which can easily be
> reverted (patch in the bugreport), maybe you could check if that fixes the
> issue as well!

Hi Tobias,

thanks for the info. Yes, that's consistent with the Mesa bug, and it
explains why the hang apparently happens only with the 1.20
modesetting-ddx, or infrequently enough on other ddx'en that nobody
made the connection.

1. Mesa feeds way too large targetMsc values (far in the future,
   >> 2^32) into the PresentPixmap request, due to the Mesa bug.

2. Other ddx'en truncate the way too large targetMsc back to < 2^32
   when using the old drmWaitVblank ioctl to queue a vblank event, and
   due to the magic of 32-bit integer truncation, most or all of the
   damage is undone. Maybe no glitch, or only a hang of a few frames
   duration, or only very infrequent long hangs, depending on the exact
   timing of client vs. server execution, what and how much drawing
   plasmashell does, etc.

3. modesetting-ddx directly queues the too large targetMsc via the new
   drmCrtcQueueSequence ioctl if running on Linux 4.15 or later, and
   the kernel dutifully waits forever -> Hang.

I think in Michel's debug patch, only applying the #if 0 for the
ms_queue_vblank() function should be enough for the ddx to work around
the Mesa bug. Fixing client bugs in the server is probably not a good
idea though, given that we know it is a Mesa bug.

I think I found - and hopefully fixed - three other bugs in the
modesetting-ddx vblank handling, but they would only help for other
issues, not this specific one.

thanks,
-mario

>
> PS: I looked into bugzilla last weekend where i bisected this issue and did
> not recheck when opening the actual bugreport (sorry for that)
>
> [1] https://bugs.freedesktop.org/show_bug.cgi?id=106372
>
> Greetings,
>
> Tobias
>
>
>
> On 5/4/18 3:45 PM, Mario Kleiner wrote:
>>
>> Two patches, solving the same problem in two different ways, the 1st
>> one ready to go, the 2nd one would need the debug statements removed.
>>
>> Only apply one of those for testing, the 2nd one will be useless with
>> the 1st one applied, but demonstrates the problem.
>>
>> So X-Server 1.20 RC + modesetting-ddx with DRI3/Present hangs at least
>> KDE-5's plasmashell and makes KDE-5 unusable with that setup.
>>
>> As KDE's plasmashell uses QT-5's QtQuick OpenGL based rendering api's
>> to render scene-graphs, this bug might affect other QT applications
>> as well.
>>
>> This fix works, but it points to some problems in modesetting-ddx's
>> current vblank handling, because other ddx'en seem to be mostly
>> unaffected by this Mesa bug.
>>
>> The problem is that neither of these two fixes is a proper final
>> solution, but better than nothing. It leaves the OML_sync_control
>> extensions glXWaitForSbcOML(), glXWaitForMscOML() calls and the
>> SGI_video_sync glXWaitVideoSyncSGI() functions broken for some
>> use patterns.
>>
>> The real problem, if i understand it correctly, is the way the life-time
>> of dri3_drawables and loader_dri3_drawables is managed atm. by Mesa's
>> bindContext() functions. Whenever glXMakeCurrent() etc. are called to
>> assign new/different GLXDrawables to the same context (ie. one context
>> reused for drawing into many different drawables, as opposed to using
>> one dedicated context for each drawable), we destroy the underlying
>> DRIDrawables/dri3_drawables_loader_dri3_drawables and they lose all
>> state wrt. pending bufferswaps, msc, sbc, ust.
>>
>> Nothing in the specs says that clients should expect to lose such
>> state on a GLXDrawable d1 whenever they reassign drawables other than
>> d1 to a GL context. A sequence like...
>>
>> 1.glXMakeCurrent(context, drawable1);
>> 2.draw draw draw
>> 3.glXSwapbuffers(context, drawable1);
>> 4.glXMakeCurrent(context, drawable2); // drawable 1 loses all state!
>> 5.glXWaitForSbcOML(dpy, drawable1, ...);
>>
>> ... would probably cause a hang of the client in glXWaitForSbcOML, as
>> the function requires information stored in the "original" drawable1
>> up to step 3, but lost in step 4 due to dri3_drawable destruction.
>>
>> Patch 1 has a potentially large performance impact when switching
>> drawables on a given context, due to the enforced wait on swap completion,
>> but might save OML clients which do waits for sbc,msc on a separate thread,
>> whereas patch 2 doesn't have a performance impact, but doesn't even
>> partially solve trouble with OML_sync_control.
>>
>> However, i'm totally out of time atm. and probably not the right person
>> to think about a better solution, and by dumb luck, my own application
>> doesn't recycle the same context for different drawables, but uses a
>> dedicated context for each drawable, so it dodges this bullet.
>>
>> Therefore one of these patches is either a good enough fix for the KDE
>> hang problems atm. or a diagnosis of the problem as a starting point for
>> brighter people to deal with the root cause ;-)
>>
>> Thanks,
>> -mario
>>
>> _______________________________________________
>> mesa-dev mailing list
>> mesa-dev@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev