On Fri, Sep 10, 2021 at 12:36:24AM +0100, Carsten Haitzler wrote:
> On Fri, 10 Sep 2021 08:28:30 +0900 Florian Schaefer <list...@netego.de> said:
>
> > On Thu, Sep 09, 2021 at 08:32:47AM +0100, Carsten Haitzler wrote:
> > > On Thu, 9 Sep 2021 09:20:28 +0900 Florian Schaefer <list...@netego.de> said:
> > >
> > > > On Wed, Sep 08, 2021 at 11:08:00AM +0100, Carsten Haitzler wrote:
> > > > > On Wed, 8 Sep 2021 17:35:12 +0900 Florian Schaefer <list...@netego.de> said:
> > > > >
> > > > > > Seems to me to have been good last words this time. ;) So I am
> > > > > > running this all day now and I think I did not have a segfault
> > > > > > due to procstat so far. Thanks for the fixes and I like the new
> > > > > > indicator icon. :)
> > > > > >
> > > > > > That being said, I still had some crashes today and I am
> > > > > > thinking that perhaps finally I might have something true to
> > > > > > the topic of this thread. At least it crashes within libnvidia
> > > > > > and I do not get an ASAN trace.
> > > > > >
> > > > > > For what it's worth, I tried to record a trace as well as I can.
> > > > > >
> > > > > > https://pastebin.com/p41b7GKW
> > > > > >
> > > > > > This happens reproducibly when I change from X running E to the
> > > > > > text console and then back to the graphics screen. (I did quite
> > > > > > a lot of these switches lately for running gdb while E is still
> > > > > > crashed.) When I have an "empty" E running it is fine. However,
> > > > > > as soon as some window is open it reliably segfaults upon
> > > > > > returning to X. Any ideas?
> > > > >
> > > > > time to stop asan and use valgrind. that can at least say if the
> > > > > memory nvidia is accessing is beyond some array e provided - the
> > > > > shader flush basically has e provide a block of mem containing
> > > > > vertexes etc. for the gpu to draw.
> > > > > this array is expanded as new triangles are added then flushed
> > > > > to the gpu at some point during rendering. that might be the
> > > > > only thing i can think of that might be an efl bug - we use a
> > > > > dud pointer? but then you could figure this out from valgrind +
> > > > > gdb... maybe. valgrind would see the errant pointer and perhaps
> > > > > if it's just beyond some other block of mem or if that block was
> > > > > freed recently etc.
> > > >
> > > > So there are things that valgrind can do that asan cannot. More
> > > > stuff to learn. :)
> > >
> > > Yeah. Valgrind is actually a cpu interpreter. it literally
> > > interprets every instruction and while doing that tracks memory
> > > state. it also traps malloc/free and so on too and tracks what
> > > memory has been allocated, freed down to the byte, if it has been
> > > written to or not etc. - doing all of this it can see every issue.
> > > it may have no debug info to tell you more than "code in this
> > > library causes problem X", or with full gdb debug it can use that
> > > memory address to tell you the file, line number, function name and
> > > so on too. This is why valgrind is slow. it's literally
> > > interpreting everything a process under valgrind does.
> > >
> > > Asan has the compiler do the above instead. So when the compiler
> > > generates the binary code for an application or library, it ADDS
> > > code that runs natively that does tracking. This means that simple
> > > instructions that just do add/sub/compare etc. just get generated
> > > as normal. instructions that access memory get tracking code added
> > > like valgrind. this means only the code that the compiler generates
> > > will get tracked (e.g. efl and enlightenment), and other code that
> > > efl calls (stuff in libc, libjpeg, opengl libs etc.) will not be.
> > > this is a major difference in design and makes asan massively
> > > faster. it's actually usable day to day on a decently fast machine.
> > > it does mean e uses a lot more memory as asan needs extra memory
> > > in the process to do the tracking of every byte and its history
> > > and it does need to execute more instructions whenever
> > > reading/writing to some memory etc. ... but not all the code your
> > > cpu runs will have this extra work because it's only these actions
> > > and any libraries called that do not have an asan build will also
> > > not do this extra work. thus - asan can't find anything in a
> > > library you did not build with asan support. thus sometimes you
> > > still have to pull out ye-olde valgrind. valgrind is an amazing
> > > tool. it's just slow. if you seem to have issues in e/efl the
> > > first port of call is to try asan. it's fast enough to run day to
> > > day and not very intrusive in that you can rebuild efl+e and then
> > > just ctrl+alt+end to restart e and presto - asan is on. as long as
> > > you have pre set-up a proper ASAN_OPTIONS env var ... also i
> > > suggest you:
> > >
> > > export EINA_FREEQ_TOTAL_MAX=0
> > > export EINA_FREEQ_MEM_MAX=0
> > > export EINA_FREEQ_FILL_MAX=0
> > >
> > > as well. this may make e/efl a little more crashy and will also
> > > remove a minor optimization (freeq is a ... free queue - it takes
> > > things that need to be freed and adds them to a queue to free some
> > > time later = freeq will collect things to free up until some
> > > limit. it will, when items are added to the queue, fill their
> > > memory with some pattern like 0x555555 or 0x777777 etc. - or well
> > > up to the first N bytes of that memory object, and then when it
> > > actually does the free later will check that that pattern still is
> > > there. if it's not, something wrote to that memory that SHOULD
> > > have been left alone as the object was queued to be freed - it can
> > > give you an indication that something is wrong but not exactly
> > > where).
> > > as freeq waits until the app is idle (has nothing to do but wait
> > > for input or things to happen) it runs through the queue then
> > > freeing objects so avoiding the work of the free until then. it's
> > > an efl self-check mechanism put in to hunt down bugs and get a
> > > little optimization in return for the extra work it has to do. by
> > > setting the above to zero you basically disable freeq and force it
> > > to free immediately which is what you want for both valgrind and
> > > asan so they detect the problems right. note efl knows when it
> > > runs under valgrind and auto disables freeq on its own. but with
> > > asan, it does not.
> > >
> > > i hope that helps explain the above (roughly - i glossed over a
> > > lot of details to make it easier to explain in a short amount of
> > > time)
> >
> > Ahm, yeah, thanks for the explanations. I wasn't expecting such a
> > ... verbose ... reply. But it is appreciated. Even though I did
> > probably not fully understand everything I now see that valgrind is
> > more than meets the eye and that the same is true for eina. ;)
> >
> > > > Anyway, I tried to follow the debugging instructions on E.org as
> > > > well as I can (after having finally recompiled everything without
> > > > asan, but leaving the debugging symbols in place).
> > > >
> > > > Three observations:
> > > >
> > > > 1. The valgrind option --db-attach seems to be deprecated since
> > > > 2015 and is not available any more. So I just omitted this. I
> > > > hope that's fine.
> > >
> > > i know. :( you now need a separate shell running gdb to attach gdb
> > > to the process then tell it to run. painful. :(
> > >
> > > > 2. Then I tried to use the ".xinitrc-debug" method. Upon starting
> > > > E the startup apparently went into an infinite loop, generating
> > > > pages and pages of valgrind and E startup messages (a few
> > > > valgrind messages with something-something exiting 0) and
> > > > generating many 120MB core dumps.
> > > > So I never got to the point where I would actually get anything
> > > > but a black screen from X.
> > >
> > > aaah with valgrind you want to probably bypass enlightenment_start
> > > - this means any issue will drop you out of your login session but
> > > you will have a chance to debug it. to avoid enlightenment_start
> > > do:
> > >
> > > export E_START=1
> > > valgrind --tool=memcheck ... enlightenment
> > >
> > > FYI when i valgrind i do:
> > >
> > > valgrind --suppressions=$HOME/.zsh/vgd.supp --tool=memcheck \
> > >   --num-callers=64 --show-reachable=no --read-var-info=yes \
> > >   --leak-check=yes --leak-resolution=high \
> > >   --undef-value-errors=yes --track-origins=yes --vgdb-error=0 \
> > >   --vgdb=full --redzone-size=512 --freelist-vol=100000000
> > >
> > > :) the suppressions file is a file i keep to tell valgrind to
> > > ignore known non-issues - e.g. a common optimization in libc or
> > > freetype or something that it should just pretend is not an issue.
> > > you can drop that option because you won't maintain that file and
> > > that file is highly system specific.
> >
> > Hmm, this valgrind stuff is more difficult than I expected. First I
> > was struggling to get the X server and enlightenment to start
> > properly. I finally settled on just creating the .xinitrc and let
> > the rest be sorted out with startx.
> >
> > But then, again, if I just start enlightenment without valgrind it
> > works. With valgrind enabled everything stops at a black screen and
> > the only way to get a responsive interface again is to reboot the
> > machine.
> >
> > So here's what I do: https://pastebin.com/yzhy4gj1
> >
> > The first part shows my .xinitrc. At the end you see two alternative
> > exec commands. The one with valgrind causes everything to hang. The
> > one without works just fine.
> >
> > Even though with valgrind enabled I cannot really do anything at
> > least there is still heaps of stuff in the logfile, so that output
> > is also included.
> > Many "lost bytes" (not really dangerous, right?) and an unhandled
> > instruction in e_comp_x_randr.c. Hmmm.
>
> unhandled instruction. that means your compiler is outputting
> instructions valgrind does not know how to interpret. e.g. it is
> optimizing with newer x86 instructions. you might want to compile
> with -march=x86-64 in CFLAGS or something similarly conservative. you
> also might want to avoid --trace-children=yes if you are running
> enlightenment directly (avoiding enlightenment_start).
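
Put together, the suggestions quoted above might look roughly like the sketch below. It is a dry run that only prints the command line. The -march=x86-64 baseline, the log file path and the variable names VG_OPTS, VG_LOG and VG_CMD are assumptions for illustration, not something from the thread. The valgrind options are the ones from the list quoted earlier, minus the system-specific suppressions file and without --trace-children.

```shell
#!/bin/sh
# Dry-run sketch: builds and prints the command instead of running it.

# Conservative code generation so valgrind is less likely to meet
# instructions it cannot decode. -march=x86-64 is an assumed baseline
# for a 64-bit build; export this before rebuilding efl and enlightenment.
export CFLAGS="-O2 -g -march=x86-64"

# Bypass the enlightenment_start wrapper, as suggested in the thread.
export E_START=1

# Options from the list quoted above, without the suppressions file.
# --trace-children is deliberately left out.
VG_OPTS="--tool=memcheck --num-callers=64 --show-reachable=no \
--read-var-info=yes --leak-check=yes --leak-resolution=high \
--undef-value-errors=yes --track-origins=yes --vgdb-error=0 --vgdb=full \
--redzone-size=512 --freelist-vol=100000000"

# Hypothetical log location; any writable path will do.
VG_LOG="${HOME:-/tmp}/e-valgrind.log"

VG_CMD="valgrind $VG_OPTS --log-file=$VG_LOG enlightenment"

# Print the command; in an .xinitrc this line would become: exec $VG_CMD
echo "$VG_CMD"
```

Because of --vgdb-error=0, valgrind waits for a debugger before it runs the program at all. From a second shell (a text console or ssh) one would start `gdb enlightenment`, enter `target remote | vgdb` and then `continue`. Until that happens the screen stays black, which may or may not be related to the hang described above.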

OK, thanks for the additional suggestions. And another recompile... ;)
Let's see whether this and omitting the --trace-children makes a
difference.

I don't know whether I will manage to do this today but I will let you
know the results when I have something.

Cheers
Florian

_______________________________________________
enlightenment-users mailing list
enlightenment-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/enlightenment-users