Carsten Haitzler <ras...@rasterman.com> wrote (on Thu, 6 Jan 2022, 21:58):
> On Thu, 6 Jan 2022 20:50:43 +0100 "ezerot...@gmail.com" <ezerot...@gmail.com> said:
> > Carsten Haitzler <ras...@rasterman.com> wrote (on Wed, 5 Jan 2022, 20:51):
> > > On Wed, 5 Jan 2022 17:21:46 +0100 "ezerot...@gmail.com" <ezerot...@gmail.com> said:
> > > > Carsten Haitzler <ras...@rasterman.com> wrote (on Wed, 5 Jan 2022, 14:50):
> > > > > On Wed, 5 Jan 2022 13:57:39 +0100 "ezerot...@gmail.com" <ezerot...@gmail.com> said:
> > > > > > Carsten Haitzler <ras...@rasterman.com> wrote (on Wed, 5 Jan 2022, 11:54):
> > > > > > > On Wed, 5 Jan 2022 08:41:05 +0100 "ezerot...@gmail.com" <ezerot...@gmail.com> said:
> > > > > > > > Carsten Haitzler <ras...@rasterman.com> wrote (on Wed, 5 Jan 2022, 0:37):
> > > > > > > > > On Tue, 4 Jan 2022 22:31:26 +0100 "ezerot...@gmail.com" <ezerot...@gmail.com> said:
> > > > > > > > > > Carsten Haitzler <ras...@rasterman.com> wrote (on Tue, 4 Jan 2022, 15:21):
> > > > > > > > > > > On Tue, 4 Jan 2022 11:56:00 +0100 "ezerot...@gmail.com" <ezerot...@gmail.com> said:
> > > > > > > > > > > > Carsten Haitzler <ras...@rasterman.com> wrote (on Mon, 3 Jan 2022, 22:49):
> > > > > > > > > > > > > On Mon, 3 Jan 2022 22:28:19 +0100 "ezerot...@gmail.com" <ezerot...@gmail.com> said:
> > > > > > > > > > > > > > Carsten Haitzler <ras...@rasterman.com> wrote (on Mon, 3 Jan 2022, 21:36):
> > > > > > > > > > > > > > > On Mon, 3 Jan 2022 19:34:41 +0100 "ezerot...@gmail.com" <ezerot...@gmail.com> said:
> > > > > > > > > > > > > > > > Carsten Haitzler <ras...@rasterman.com> wrote (on Mon, 3 Jan 2022, 19:13):
> > > > > > > > > > > > > > > > > On Mon, 3 Jan 2022 17:07:43 +0100 "ezerot...@gmail.com" <ezerot...@gmail.com> said:
> > > > > > > > > > > > > > > > > > Hi,
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > I've a brand new amd laptop with an nvidia mobile GPU. It arrived with TuxedoOS (ubuntu 20.04 + budgie wm) preinstalled. That setup works fine out of the box, but I want to replace budgie with enlightenment, because that's what I always use on linux.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > I've compiled E 0.25 from git (using https://github.com/batden/esteem), and it seemed to work fine. Unfortunately, when I tested suspend+resume, I had a problem. The desktop resumes, but only with minimal brightness, and then it seems to freeze (no keyboard/mouse). I can ssh into the laptop, and killing enlightenment sends me back to the lightdm login prompt.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > dmesg has this:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > [11814.110778] PM: suspend exit
> > > > > > > > > > > > > > > > > > [11814.630838] NVRM: GPU at PCI:0000:01:00: GPU-589fde69-1161-f26b-1773-e5bcda70d601
> > > > > > > > > > > > > > > > > > [11814.630845] NVRM: Xid (PCI:0000:01:00): 13, pid=5525, Graphics Exception: Shader Program Header 11 Error
> > > > > > > > > > > > > > > > > > [11814.630855] NVRM: Xid (PCI:0000:01:00): 13, pid=5525, Graphics Exception: Shader Program Header 18 Error
> > > > > > > > > > > > > > > > > > [11814.630865] NVRM: Xid (PCI:0000:01:00): 13, pid=5525, Graphics Exception: ESR 0x405840=0xa2040800
> > > > > > > > > > > > > > > > > > [11814.630877] NVRM: Xid (PCI:0000:01:00): 13, pid=5525, Graphics Exception: ESR 0x405848=0x80000000
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > The problem happens with both the sw and the opengl compositors.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > When I suspend from the lightdm prompt or from the budgie desktop, resuming works fine. So it seems something is happening/not happening with the nvidia card when the suspend is started from E.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Does anyone have any idea how to debug this?
> > > > > > > > > > > > > > > > > i suspect it may have to do with vblank interrupts. the nvidia driver doesn't produce them anymore? a quick way to test this:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >   touch ~/.ecore-no-vsync
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > restart e then do your suspend/resume
> > > > > > > > > > > > > > > > Thanks for your reply. Unfortunately the problem seems to be somewhere else, as resuming still fails the same way. Anything else to try? Could rebuilding E in debugging mode help?
> > > > > > > > > > > > > > > probably not - btw - those shader exceptions might have to do with it. evas caches binaries for shaders. rm -rf ~/.cache/evas_gl_common_caches/ - but beyond that the only thing left is your driver. those are its shaders it compiled.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > google for it: "Graphics Exception: Shader Program Header 11 Error"
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > seems to actually be OS independent and happen on windows too.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > https://forums.developer.nvidia.com/t/screen-system-is-dead-on-resume-unable-to-resume-with-all-current-drivers/29872/57?page=3
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > this has been there for a long time... and it seems it doesn't get resolved.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > https://github.com/Bumblebee-Project/Bumblebee/issues/739
> > > > > > > > > > > > > > Yeah, I've tried googling for this too, but found no solutions either.
> > > > > > > > > > > > > > > it could be that evas uses egl+gles and the nvidia driver implementation for egl+gles is buggy - you can rebuild efl to use full desktop opengl+glx (-Dopengl=full).
> > > > > > > > > > > > > > I've deleted the evas cache, and set the compositor to SW to make sure that it's not an evas egl problem. The exceptions are still there. Actually there are 3 exceptions for the kernel thread "[irq/92-nvidia]", and 1 for Xorg. When the compositor was set to opengl there were more exceptions, and one of them was for the enlightenment process.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > So my guess is that this may not be a problem in E, but maybe a missing/extra step during suspend/resume. I'll look into this tomorrow.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks for your help, Laszlo
> > > > > > > > > > > > > hmm i wonder why the nvidia driver is complaining - something is using a shader program of some sort and it's not happy at all. there is something deeper going on here.
> > > > > > > > > > > > > but yes - with e using opengl for compositing it'll be driving the gpu (via opengl) and thus more chance of something going wrong.
> > > > > > > > > > > > I've found another strange thing. In my original configuration I used amdgpu+nvidia X drivers. Now I switched to modesetting+nvidia. Resuming fails again, but there is a different new problem. After starting E from lightdm as usual, I press ctrl+alt+end to restart E, it fades to black as usual, then it switches to something that looks like a console (empty black screen with a cursor line) and stays there. I cannot restore the desktop until I kill E. No exceptions from nvidia in the dmesg this time. Any idea for this?
> > > > > > > > > > > so this is an optimus setup of some sort but now with amd + nvidia... i might imagine something goes wrong setting up randr maybe? simotek found his optimus setup required a forced refresh of randr info ... and e has that in it (otherwise edid info would not be populated right). check ~/.e-log.log - it will tell you what e is doing randr-wise and what it sees, but you should end up with some kind of screen. perhaps go back away from modesetting to amdgpu + nvidia?
> > > > > > > > > > I've switched off the optimus stuff, and checked what happens with the nvidia only setup. Unfortunately it failed with the usual GPU error.
> > > > > > > > > >
> > > > > > > > > > Then I switched back to amdgpu+nvidia again, and saved the log file. Maybe you can see something in it:
> > > > > > > > > >
> > > > > > > > > > https://drive.google.com/file/d/1r69Bw43uMS8xWM2wemqxUvIAr0xH76pp/view?usp=sharing
> > > > > > > > > resume has nothing odd to do with randr.. but this smells a bit weird:
> > > > > > > > >
> > > > > > > > > ERROR: ecore_animator thread - epoll_wait(..., 200) at 3870,51700 should have slept ~ 0,01667s but took 1,65593s!
> > > > > > > > >
> > > > > > > > > that smells very wrong - the animator thread asked to sleep for 16.67ms but slept 1650ms instead ... and this is measuring monotonic time - not wall clock. monotonic stops ticking when suspended. this thread is dedicated to ticking for animation so will not be blocked by the mainloop... this is kernel not sleeping for anywhere near the time it should.
> > > > > > > > >
> > > > > > > > > so with amdgpu+nvidia it works? i'm not sure from your mail.
> > > > > > > > None of amdgpu+nvidia, modesetting+nvidia, and nvidia alone work - GPU shader error when resuming. Desktop is at minimal brightness, no inputs accepted.
> > > > > > > Well it could be E is hung - you will only know if you send a SEGV signal (kill -SEGV `pidof enlightenment`) then collect a backtrace with gdb and see where it's at.
> > > > > > Actually it seems that it is not E that is hung, but rather the X server. When I kill E, it gets restarted (new PID) but the desktop remains frozen. I have to kill enlightenment_start to get back to the lightdm login prompt.
> > > > > wow.. well then... maybe e hit on an xorg/nvidia driver bug? some people have reported bad things with sddm - somehow it has caused e to launch in wayland .. or xwayland (i dont know how it could do the latter so i assume it launched in wl mode).
> > > > > > > > With modesetting+nvidia there is a new problem: restarting E with ctrl+alt+end does not work (switches to console mode). Suspend/resume is not involved in this, and there is no GPU error.
> > > > > > > I can't help a lot with nvidia - I gave up on them years ago because they didn't want to play ball with Wayland like everyone else and frankly having their kernel driver keep breaking on kernel upgrades (kernel changes api/abi - nvidia driver can't build anymore and i'm forced to manually downgrade my kernel). I can say that all of my machines run arch linux (except some of my arm devices - they are special and mostly used as testbeds and not stable systems) and they all use either amd or intel graphics and suspend/resume works.
> > > > > > Well, I originally wanted to buy an amd CPU+amd GPU laptop, but none that I found ticked all the boxes. Now I have amd CPU+nvidia GPU and an ugly shader error... :-/
> > > > > well this is personal - but i'd just veto any choices that involve an nvidia gpu. if nvidia drivers were all oss like amd - i wouldn't have as much of an issue. i know it doesn't help you now, but maybe in future choices.
> > > > I agree with you, full amd would have been better.
> > > > But unfortunately I was in a hurry, because on my old laptop the power selector chip died, and now the laptop cannot be used from battery any more. So it became a desktop, and I needed mobility.
> > > Something to keep in mind for the future. :) At least nvidia now are beginning to play nice Wayland-wise by supporting gbm, but I made my decision already years ago and have been happy ever since. :)
> > > > > > After some googling, I found that it's possible to disable the nvidia GPU in nvidia-settings, and use amdgpu exclusively. I've tried this, and E+resume works as it should! Unfortunately I have no external monitor outputs in this mode, because only nvidia is wired to the hdmi/DP ports. Oh well.
> > > > > well wow.. so something to do with nvidia maybe optimus ... but... hmmm. but at least see if you can get a backtrace from e to see where it is stuck - if it is. that will tell me some information at least.
> > > > I changed back nvidia-settings to use nvidia optimus mode (to generate a backtrace for you), but guess what, resuming works now!!! There is no shader error in dmesg. After looking around more closely, it seems I've changed the "Prime profiles" from "Intel (Power Saving Mode)" - [this was actually the amdgpu only mode, where E worked] to "NVIDIA On Demand" mode. There is a third option here which is "NVIDIA (Performance mode)" - this is the mode I was using before. So in Performance Mode, nvidia-smi shows that E has some parts which run on the nvidia GPU, but in "On Demand" mode E is run on amdgpu. And resuming works this way.
> > > >
> > > > The only problem I see is that after resuming the external monitor stays black, but xrandr thinks it is connected. I'm looking at this now.
> > > Well - you're getting somewhere. You seem to have stumbled on some nvidia/optimus related driver bug. it seems maybe e just happens to trigger it by luck (or un-luck). This does happen - things only get tested with specific workloads. When a new workload appears, then it sometimes triggers different code paths that SHOULD work but have a bug and now the bug is exposed. It requires people to then test, reproduce and then fix it. It may be deep in the nvidia blob. Maybe in the glue binding it to the amdgpu driver with optimus. I don't know. I haven't seen this issue, but I have known to keep away from anything optimus related as while there is, in theory, some cool stuff here tech-wise, it's problematic and has a history of problems.
> > Just an update on the resume+external monitor stays black issue. It seems I can make the monitor work correctly by unplugging then re-plugging the hdmi cable. Unfortunately using xrandr only to try to fix the problem without touching the cable causes the usual nvidia shader exception, which randomly triggers a sigsegv in the X server. It's unpredictable as hell.
> oh... this seems like you definitely have deeper problems. e is just seemingly good at finding/exposing them.
> > On the other hand the budgie wm (which seems to be based on mutter) has no problems with correctly resuming the external monitor. Looking at the source code of mutter I see some nvidia specific quirks, like NV_robustness_video_memory_purge. I'm going to try to hack this out of mutter and see whether it would fail.
> indeed efl (evas) has nothing like this, but if xorg is crashing.. then you have deeper issues that e can't really solve like with the above. :)

Finally, I was able to get rid of the shader errors. By adding the "NVreg_EnableS0ixPowerManagement=1" parameter to nvidia.ko, E can finally suspend+resume without deadly problems. The external monitor is still not detected after resume, but reapplying the screen setup fixes that without segfaulting the X server. That's good enough for me.

Thanks for your help+ideas,
Laszlo

_______________________________________________
enlightenment-users mailing list
enlightenment-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/enlightenment-users
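
For anyone finding this thread later: the message above does not say how the parameter was passed to nvidia.ko. One common way to set a kernel module parameter persistently is a modprobe.d options line; the sketch below assumes that approach. The file name nvidia-s0ix.conf is made up for illustration, and the module is assumed to be loaded under the name "nvidia". On a system that loads the driver from the initramfs, the initramfs probably has to be regenerated as well (on Ubuntu, likely with update-initramfs -u).

```shell
# Assumed contents of /etc/modprobe.d/nvidia-s0ix.conf (hypothetical file name;
# any *.conf file in /etc/modprobe.d should be picked up):
#
#   options nvidia NVreg_EnableS0ixPowerManagement=1
#
# After a reboot, ask the loaded driver which value it actually uses.
# The proprietary driver lists its registry keys here without the NVreg_ prefix.
grep EnableS0ixPowerManagement /proc/driver/nvidia/params
```

If the grep prints a value of 1, the option reached the driver; if it still prints 0, the options line was most likely not read at module load time.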