On Fri, 05 Aug 2005 17:01:01 -0400 Robert May <[EMAIL PROTECTED]>
babbled:
> Christoph,
>
> Naturally I compiled e17 and then restarted e. I went into tty2 logged
> in and typed:
> gdb
> file enlightenment
> it says it is loading symbols and ....done
> I then attach to the pid
> naturally a lot of data spills across the screen but the information
> doesn't change even if there is a segfault (I typed bt during operation
> and at the segfault) I really didn't start having any segfaults until I
> added the second monitor (Dual head not Xinerama) so I feel I may be
> "jumping the gun" so to speak in sending debug reports. I just have no
> idea on the correct way to set everything up and submit a noteworthy bug
> report (Dual monitors or not). I have searched for hours and have been
> unable to find the right way to set this up. I have never really
> programmed before (other than hello world) so my abilities aren't up to
> par with everyone else.
ok. here is what you need to do.
1. make sure you compile everything again - ecore, eet, evas, edje, embryo and e
itself.
before you do this though do:
export CFLAGS=-g
this will add -g to the compile flags to add debugging symbols.
now for each of the libraries and e do:
make clean distclean
./configure
make
make install
(or whatever variation on this you have - but remember make clean distclean
before re-running configure)
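if you don't want to retype this for every library, the steps above can be sketched as a small shell function. this is only a sketch under assumptions: the name rebuild_with_debug is made up, it assumes every source tree sits in its own directory under one parent directory, and the default build order (eet, evas, ecore, embryo, edje, e) is my guess at a dependency order that works - pass your own list if yours differs. make install may need root depending on your prefix.

```shell
# rebuild_with_debug SRC_ROOT [DIR...]
# Rebuild each source tree under SRC_ROOT with debugging symbols.
# Assumption: each library lives in SRC_ROOT/<name>; adjust to your checkout.
rebuild_with_debug() {
    src_root=$1
    shift
    # assumed default order: libraries first, e itself last
    [ $# -gt 0 ] || set -- eet evas ecore embryo edje e

    export CFLAGS=-g            # add debugging symbols to every compile
    for dir in "$@"; do
        echo "== rebuilding $dir"
        (
            cd "$src_root/$dir" || exit 1
            # wipe old objects first; a tree that was never configured
            # has nothing to clean, so a failure here is not fatal
            make clean distclean || true
            ./configure && make && make install
        ) || { echo "build failed in $dir" >&2; return 1; }
    done
}
```

usage would be something like: rebuild_with_debug ~/src/e17 (the path is just an example).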
NOW everything has debugging symbols. good. NOW just run E. nothing special
here. WHEN/IF you get the "white box of death" that says E segfaulted go over to
a text console (ctrl+alt+f1) and log in. now you need to attach gdb to e. find
out the process ID of e.
ps -auwx | grep enlightenment
now type:
gdb enlightenment PID
where PID is the process id you found.
it will stream along for a bit then give you a prompt. you can NOW debug.
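the "find the pid, then attach" step above can also be done in one go. a rough sketch, assuming the process name is exactly "enlightenment" and that pgrep is installed; the function names e_pid and attach_gdb are made up for this example.

```shell
# e_pid: print the process ID of the running enlightenment, if any.
# pgrep -x matches the process name exactly, so unlike "ps | grep"
# the search itself never shows up in the results.
e_pid() {
    # if several processes match, take the first one listed
    pgrep -x enlightenment | head -n 1
}

# attach_gdb: attach gdb to the running enlightenment.
attach_gdb() {
    pid=$(e_pid)
    if [ -z "$pid" ]; then
        echo "enlightenment is not running" >&2
        return 1
    fi
    gdb enlightenment "$pid"
}
```

run attach_gdb from the text console after the white box has appeared.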
you should first use gdb's backtrace command:
bt
you will see something like:
(gdb) bt
#0 0xb7d539f8 in select () from /lib/tls/libc.so.6
#1 0xb7dff66a in _XEnq () from /usr/X11R6/lib/libX11.so.6
#2 0xb7dffa7e in _XRead () from /usr/X11R6/lib/libX11.so.6
#3 0xb7e01795 in _XReadEvents () from /usr/X11R6/lib/libX11.so.6
#4 0xb7defa88 in XNextEvent () from /usr/X11R6/lib/libX11.so.6
#5 0x0809b698 in e_alert_show (
text=0x80a34f0 "This is very bad. Enlightenment has segfaulted.\nThis is not
meant to happen and is likely a sign of a\nbug in Enlightenment or the libraries
it relies on.\n\nYou can gdb attach to this process now to try"...)
at e_alert.c:136
#6 0x0808f706 in e_sigseg_act (x=11, info=0x80a9fb0, data=0x80aa030)
at e_signals.c:54
#7 <signal handler called>
#8 0xb7d539f8 in select () from /lib/tls/libc.so.6
#9 0xb7f814ee in _ecore_main_select (timeout=0) at ecore_main.c:338
#10 0xb7f819ba in _ecore_main_loop_iterate_internal (once_only=0)
at ecore_main.c:575
#11 0xb7f81a2b in ecore_main_loop_begin () at ecore_main.c:79
#12 0x08059bb3 in main (argc=1, argv=0xbffff144) at e_main.c:551
this is a stack trace. it basically means the main() function called
ecore_main_loop_begin(), it called _ecore_main_loop_iterate_internal(), and this
function called _ecore_main_select(), and that in turn called select() etc.
the important bit here is that E has its own segfault handler - it traps its own
problems and tries to let you recover (that's what the white box of death is).
NOW see the function that was called:
#6 0x0808f706 in e_sigseg_act (x=11, info=0x80a9fb0, data=0x80aa030)
the e_sigseg_act() function is called when the program segfaults (it is called
directly by the kernel interrupting anything e was doing just before it was
called - the thing it was doing would have caused the segfault). so that means
in this example E segfaulted inside the select() function (frame 7 is an
intermediate frame that calls the signal handler).
now we need to get some more info on it. you want to GO to the stack frame just
before the segfault. in this case it's stack frame 8. you want a listing of the
code there and some info (so we can double check your code there is what we have
here too). the gdb commands you then want are:
fr 8
l
fr 8 = go to frame 8
l = list the source code here.
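the same three commands (bt, fr, l) can be run without typing them at the prompt, using gdb's batch mode, and saved to a file at the same time. this is a sketch, not a replacement for poking around by hand: the function name gdb_snapshot and the default log file name are made up, and frame 8 is only the default because it is the frame in the example trace above - look at your own backtrace to find the frame just below "<signal handler called>".

```shell
# gdb_snapshot PID [FRAME] [LOGFILE]
# Attach to PID, print the backtrace, go to FRAME and list the source
# there, then exit gdb (which detaches from the process again).
# Everything gdb prints is shown and also saved to LOGFILE.
gdb_snapshot() {
    pid=$1
    frame=${2:-8}                 # 8 is just the frame from the example trace
    log=${3:-e-gdb-$pid.log}      # hypothetical default file name
    gdb -batch \
        -ex "bt" \
        -ex "fr $frame" \
        -ex "l" \
        enlightenment "$pid" 2>&1 | tee "$log"
}
```

for example: gdb_snapshot 1234 8 segv.log, where 1234 stands for the pid you found.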
NOW if you want to get adventurous you should start dumping variable values for
us. in this example i can't debug select because it's in libc (i faked this segv
just for this mail). but i will try a different stack frame - the principle is
the same. i will look at frame 9.
(gdb) fr 9
#9 0xb7f814ee in _ecore_main_select (timeout=0) at ecore_main.c:338
338 ret = select(max_fd + 1, &rfds, &wfds, &exfds, t);
(gdb) l
333 }
334 }
335 #ifndef WIN32
336 if (_ecore_signal_count_get()) return -1;
337 #endif
338 ret = select(max_fd + 1, &rfds, &wfds, &exfds, t);
339 if (ret < 0)
340 {
341 if (errno == EINTR) return -1;
342 }
now i can see some variables there and function calls - often variables like
pointers may be garbage or NULL and thus cause a segv. let's check to see what
they are using the print (p) command:
(gdb) p ret
$1 = -4
(gdb) p rfds
$2 = {__fds_bits = {1280, 0 <repeats 31 times>}}
(gdb) p wfds
$3 = {__fds_bits = {0 <repeats 32 times>}}
(gdb) p exfds
$4 = {__fds_bits = {0 <repeats 32 times>}}
as you can see - it's pretty easy.
one thing to note - IF the variable is a pointer to something, printing it will
print the pointer value, not what it points TO. what it points TO is important,
so to print that i suggest:
p *pointer
example:
(gdb) fr 5
#5 0x0809b698 in e_alert_show (
text=0x80a34f0 "This is very bad. Enlightenment has segfaulted.\nThis is not
meant to happen and is likely a sign of a\nbug in Enlightenment or the libraries
it relies on.\n\nYou can gdb attach to this process now to try"...)
at e_alert.c:136
136 XNextEvent(dd, &ev);
(gdb) l
131 XSync(dd, False);
132
133 button = 0;
134 for (; button == 0;)
135 {
136 XNextEvent(dd, &ev);
137 switch (ev.type)
138 {
139 case KeyPress:
140 key = XKeysymToKeycode(dd, XStringToKeysym("F1"));
(gdb) p dd
$5 = (Display *) 0x80d1018
aha! we know it's a pointer (Display *) - the * means it's a pointer to a Display
struct/type... so...
(gdb) p *dd
$6 = <incomplete type>
well ok - not today. that's xlib's display struct. it's private and we don't
know what's inside - BUT all the types e uses inside that it defines will allow
you to do this generally.
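if you want to dump a whole frame in one shot instead of printing variables one at a time, something like the following might do. it is a sketch under assumptions: the function name gdb_dump_frame is made up, and it relies on gdb's "info args" and "info locals" commands plus a temporary command file passed with -x. gdb stops reading a command file at the first command that fails, so put the expressions you are least sure about last.

```shell
# gdb_dump_frame PID FRAME [EXPR...]
# Attach to PID, go to FRAME, list the source, dump the function's
# arguments and local variables, then print each extra expression.
gdb_dump_frame() {
    pid=$1
    frame=$2
    shift 2
    cmds=$(mktemp) || return 1
    {
        echo "fr $frame"
        echo "l"
        echo "info args"      # arguments of the function in this frame
        echo "info locals"    # local variables in this frame
        for expr in "$@"; do
            echo "p $expr"    # e.g. ret, or *some_pointer to follow a pointer
        done
    } > "$cmds"
    gdb -batch -x "$cmds" enlightenment "$pid" 2>&1
    status=$?
    rm -f "$cmds"
    return $status
}
```

for the frame 9 example above that would be: gdb_dump_frame 1234 9 ret rfds wfds exfds (1234 again stands for the real pid).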
anyway - spend some quality time with gdb and do all this - mail back all the
output of gdb during one of these "debugging sessions" and then we can sift
through it. it may not mean a lot to you, but it means a world to us
(generally). sometimes the stack is screwed and well - nothing you can do. often
this means you need to resort to valgrind to catch things before the stack gets
screwed. this gets a bit more intense, BUT you will need to run E under valgrind
- allowing gdb to attach.
valgrind --tool=memcheck --db-attach=yes enlightenment
this will mean you need a console to run it from and an xserver running for it
to display on. the console will need to be usable even if the wm is screwed (so
another machine sshing in, a text console etc.). valgrind will make things
VEEEEERRRRY SLOW. but it is thorough and finds shit. when you get a problem
valgrind will spew and then ask if you want to attach gdb. often you get a
harmless one of these once when you start e - about reading uninitialized memory
inside XPutImage - ignore this. it's harmless. it looks like this:
==7072== Syscall param writev(vector[...]) points to uninitialised byte(s)
==7072== at 0x1BC255E8: (within /lib/tls/libc-2.3.2.so)
==7072== by 0x1BAC66D6: (within /usr/X11R6/lib/libX11.so.6.2)
==7072== by 0x1BAC6986: _X11TransWritev (in /usr/X11R6/lib/libX11.so.6.2)
==7072== by 0x1BAAB03C: _XSend (in /usr/X11R6/lib/libX11.so.6.2)
==7072== by 0x1BA9EA6B: (within /usr/X11R6/lib/libX11.so.6.2)
==7072== by 0x1BA9F1D2: XPutImage (in /usr/X11R6/lib/libX11.so.6.2)
==7072== by 0x1B957459: evas_software_x11_x_output_buffer_paste
(evas_x_buffer.c:173)
==7072== by 0x1B955FEA: evas_software_x11_outbuf_flush (evas_outbuf.c:327)
==7072== by 0x1B953EB4: evas_engine_software_x11_output_flush
(evas_engine.c:417)
==7072== by 0x1B93A6A4: evas_render_updates (evas_render.c:298)
==7072== by 0x1B9A0960: _ecore_evas_x_render (ecore_evas_x.c:173)
==7072== by 0x1B9A1EF8: _ecore_evas_x_idle_enter (ecore_evas_x.c:825)
==7072== Address 0x1ED603FC is 596 bytes inside a block of size 38912 alloc'd
==7072== at 0x1B90459D: malloc (vg_replace_malloc.c:130)
==7072== by 0x1B957200: evas_software_x11_x_output_buffer_new
(evas_x_buffer.c:132)
==7072== by 0x1B955224: evas_software_x11_outbuf_new_region_for_update
(evas_outbuf.c:256)
==7072== by 0x1B953DDA:
evas_engine_software_x11_output_redraws_next_update_get (evas_engine.c:394)
==7072== by 0x1B93A355: evas_render_updates (evas_render.c:210)
==7072== by 0x1B9A0960: _ecore_evas_x_render (ecore_evas_x.c:173)
==7072== by 0x1B9A1EF8: _ecore_evas_x_idle_enter (ecore_evas_x.c:825)
==7072== by 0x1B9725E3: _ecore_idle_enterer_call (ecore_idle_enterer.c:78)
==7072== by 0x1B9746AE: _ecore_main_loop_iterate_internal (ecore_main.c:477)
==7072== by 0x1B974A2A: ecore_main_loop_begin (ecore_main.c:79)
==7072== by 0x8059BB2: main (e_main.c:551)
==7072==
==7072== ---- Attach to debugger ? --- [Return/N/n/Y/y/C/c] ----
just say no (n) and ignore that one - you may even get it 2 times for multihead.
but that one is harmless. anything else though is a likely candidate for a
problem - when it complains - say yes to attach and get us the valgrind AND gdb
info (debug in gdb as above). my valgrind complains a lot when e shuts down
about problems inside exit() - ignore them. they look like this:
==7072==
==7072== Invalid read of size 4
==7072== at 0x1BB6B16C: (within /lib/tls/libc-2.3.2.so)
==7072== by 0x1BB6B58C: (within /lib/tls/libc-2.3.2.so)
==7072== by 0x1BBE6FF6: (within /lib/tls/libc-2.3.2.so)
==7072== by 0x1BC61422: (within /lib/tls/libc-2.3.2.so)
==7072== by 0x1BC61337: (within /lib/tls/libc-2.3.2.so)
==7072== by 0x1BC616C4: __libc_freeres (in /lib/tls/libc-2.3.2.so)
==7072== by 0x1B8FEA08: _vgw(float, long double,...)(...)(long
double,...)(short) (vg_intercept.c:55)
==7072== by 0x1BB7F1C5: exit (in /lib/tls/libc-2.3.2.so)
==7072== by 0x1BB6997D: __libc_start_main (in /lib/tls/libc-2.3.2.so)
==7072== by 0x8058AE0: ??? (start.S:102)
==7072== Address 0x1C7BFD98 is 8 bytes inside a block of size 60 free'd
==7072== at 0x1B904B04: free (vg_replace_malloc.c:152)
==7072== by 0x1BB6BD37: (within /lib/tls/libc-2.3.2.so)
==7072== by 0x1BC2A902: (within /lib/tls/libc-2.3.2.so)
==7072== by 0x1BC2A7A6: tdestroy (in /lib/tls/libc-2.3.2.so)
==7072== by 0x1BC611C1: (within /lib/tls/libc-2.3.2.so)
==7072== by 0x1BC616C4: __libc_freeres (in /lib/tls/libc-2.3.2.so)
==7072== by 0x1B8FEA08: _vgw(float, long double,...)(...)(long
double,...)(short) (vg_intercept.c:55)
==7072== by 0x1BB7F1C5: exit (in /lib/tls/libc-2.3.2.so)
==7072== by 0x1BB6997D: __libc_start_main (in /lib/tls/libc-2.3.2.so)
==7072== by 0x8058AE0: ??? (start.S:102)
==7072==
==7072== ---- Attach to debugger ? --- [Return/N/n/Y/y/C/c] ----
they are valgrind's own internal debugging hooks causing problems.
as a last note. you MAY need to run valgrind from a console - many people ask
how they can do this and debug a wm. well here is one way.
1. make sure you have root access (i will assume sudo allows you root access for
now)
2. do:
sudo X -ac :1 &
<this will run an empty xserver on :1 and flip to it - flip back to your console
with ctrl+alt+f1 or wherever the console was>
export DISPLAY=:1
valgrind --tool=memcheck --db-attach=yes enlightenment
<now flip back to the new xserver - might be ctrl+alt+f8>
e will be running (very slowly) under valgrind. do whatever it is you do to make
the bug happen. when e "locks up" and doesn't seem to move (but the mouse does),
flip back to the text console where you ran valgrind from and see if it is
complaining (as per above).
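the steps above, rolled into one function you can run from the text console. treat it as a sketch: the function name e_under_valgrind and the VG_ATTACH variable are made up, and the 3 second sleep is a crude guess at how long the xserver needs to come up. it calls sudo -v first so the password prompt happens in the foreground, before X is pushed into the background. as far as i know newer valgrind releases no longer accept --db-attach=yes; there you would set VG_ATTACH to something like --vgdb-error=1 and connect from a separate gdb with "target remote | vgdb" instead.

```shell
# e_under_valgrind [DISPLAY_NUM]
# Start a bare X server on the given display and run e on it under valgrind.
# Run this from a text console that stays usable if the wm hangs.
e_under_valgrind() {
    disp=:${1:-1}
    # flag from this mail by default; override VG_ATTACH for a newer valgrind
    attach=${VG_ATTACH:-"--db-attach=yes"}

    sudo -v || return 1           # ask for the password up front
    sudo X -ac "$disp" &          # empty X server, access control off
    sleep 3                       # crude wait for the server to come up
    DISPLAY=$disp valgrind --tool=memcheck "$attach" enlightenment
}
```

the xserver keeps running after e exits, so you will have to kill it yourself when you are done.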
anyway - this is a short guide to debugging for e - in fact for many debugging
things the same principles apply. if you do this stuff when you see problems
like segv's you can help us clear them up with the output of this work.
if someone wants to turn this into a "quick guide to debugging e" that'd be
great :)
> Thanks for your assistance,
>
> Robert
>
> On Fri, 2005-08-05 at 09:53 +0200, Christoph Gysin wrote:
> > Robert May wrote:
> > > I am wanting to gdb enlightenment but how would a newbie SuSE 9.3 user
> > > go about doing so? I have already recompiled with CFLAGS="-g" but
> > > running gdb from another tty and attaching to the process doesn't seem
> > > to be working right. Any help would be greatly appreciated...
> >
> > "not working right"? What did you do? What did gdb say?
> >
> > Please provide more information.
> >
> > Christoph
>
>
>
> -------------------------------------------------------
> SF.Net email is Sponsored by the Better Software Conference & EXPO
> September 19-22, 2005 * San Francisco, CA * Development Lifecycle Practices
> Agile & Plan-Driven Development * Managing Projects & Teams * Testing & QA
> Security * Process Improvement & Measurement * http://www.sqe.com/bsce5sf
> _______________________________________________
> enlightenment-users mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/enlightenment-users
>
--
------------- Codito, ergo sum - "I code, therefore I am" --------------
The Rasterman (Carsten Haitzler) [EMAIL PROTECTED]
裸好多 [EMAIL PROTECTED]
Tokyo, Japan (東京 日本)