On Wed, 26 Oct 2005, Tobias Ulmer wrote:

> Hi
> 
> I'm running 3.7 (all patches applied, everything else default) on an
> old box (dmesg at the end). It fetches mail for me with the following
> script:
> 
> ---8<---
> #! /bin/sh
> 
> LOCK="$HOME/.getmail.lock"
> 
> if ! [ -f $LOCK ]
> then
>         touch $LOCK
>         getmail 2>&1 > /dev/null
>         rm $LOCK
> fi
> ---8<---
> 
> This script is run from crontab every minute. Sometimes ksh segfaults
> and dumps core. It only happens once every day or two, so this is not a big
> problem for me. I was however curious and compiled ksh with -g to get
> more information.
> 
> [EMAIL PROTECTED]:~# gdb /bin/sh /home/tobiasu/core/sh.core
> GNU gdb 6.3
> [...]
> This GDB was configured as "i386-unknown-openbsd3.7"...
> Core was generated by `sh'.
> Program terminated with signal 11, Segmentation fault.
> #0  0x1c027ed6 in _weak__thread_fd_unlock ()
> (gdb) backtrace full
> #0  0x1c027ed6 in _weak__thread_fd_unlock ()
> No symbol table info available.
> #1  0x1c028025 in _weak__thread_fd_unlock ()
> No symbol table info available.
> #2  0x1c027b48 in _weak__thread_fd_unlock ()
> No symbol table info available.
> #3  0x1c028095 in _weak__thread_fd_unlock ()
> No symbol table info available.
> #4  0x1c028395 in malloc ()
> No symbol table info available.
> #5  0x1c03c90e in atexit ()
> No symbol table info available.
> #6  0x1c0002e9 in __register_frame_info ()
> No symbol table info available.
> #7  0x1c000155 in __init ()
> No symbol table info available.
> #8  0x1c0001ee in ___start ()
> No symbol table info available.
> #9  0x1c00016f in _start ()
> No symbol table info available.
> (gdb) quit
> [EMAIL PROTECTED]:~# gdb /bin/sh /home/tobiasu/core/sh2.core
> GNU gdb 6.3
> [...]
> This GDB was configured as "i386-unknown-openbsd3.7"...
> Core was generated by `sh'.
> Program terminated with signal 11, Segmentation fault.
> #0  0x1c027ed6 in _weak__thread_fd_unlock ()
> (gdb) backtrace full
> #0  0x1c027ed6 in _weak__thread_fd_unlock ()
> No symbol table info available.
> #1  0x1c028025 in _weak__thread_fd_unlock ()
> No symbol table info available.
> #2  0x1c027b48 in _weak__thread_fd_unlock ()
> No symbol table info available.
> #3  0x1c028095 in _weak__thread_fd_unlock ()
> No symbol table info available.
> #4  0x1c028395 in malloc ()
> No symbol table info available.
> #5  0x1c03c90e in atexit ()
> No symbol table info available.
> #6  0x1c0002e9 in __register_frame_info ()
> No symbol table info available.
> #7  0x1c000155 in __init ()
> No symbol table info available.
> #8  0x1c0001ee in ___start ()
> No symbol table info available.
> #9  0x1c00016f in _start ()
> No symbol table info available.
> (gdb) info registers
> eax            0x0      0
> ecx            0x5      5
> edx            0x0      0
> ebx            0x0      0
> esp            0xcfbf3fd4       0xcfbf3fd4
> ebp            0xcfbf3fec       0xcfbf3fec
> esi            0x0      0
> edi            0xcfbf4034       -809549772
> eip            0x1c027ed6       0x1c027ed6
> eflags         0x10202  66050
> cs             0x1f     31
> ss             0x27     39
> ds             0x27     39
> es             0x27     39
> fs             0x27     39
> gs             0x27     39
> 
> 
> 
> My _guess_ is that it has something to do with the test condition: the
> lock file still exists when it is tested and is then deleted shortly
> after (this is called a race condition, right?). I tried to grep
> /usr/src, but it takes hours (PIO4, no DMA...), and I didn't find out
> where this thread_fd_unlock function is or what it does.

This is strange. From the trace, it looks like you are crashing in code
that is executed before sh itself is running. What is even stranger is
that the trace shows thread-specific code, which isn't supposed to
run in a single-threaded program like sh.
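
On the race condition question: yes, testing for the lock file and then
creating it in a separate step is a race, since two cron runs can both
pass the test before either one runs touch. That alone should not make
sh crash this early, though. Below is a minimal sketch of a lock that
tests and creates in a single step, assuming a mkdir-based lock is
acceptable here. The run_locked helper and the .getmail.lock.d directory
name are hypothetical and not part of the original script.

```shell
#!/bin/sh
# Hypothetical lock directory; override by setting LOCKDIR beforehand.
LOCKDIR="${LOCKDIR:-$HOME/.getmail.lock.d}"

# run_locked CMD [ARGS...]
# Runs CMD only if the lock could be taken; otherwise does nothing.
run_locked() {
    # mkdir fails if the directory already exists, so the existence
    # test and the creation happen as one atomic operation.
    if mkdir "$LOCKDIR" 2>/dev/null
    then
        "$@"
        status=$?
        # Release the lock and pass on the command's exit status.
        rmdir "$LOCKDIR"
        return $status
    fi
    # Another instance holds the lock; skip this run quietly.
    return 0
}
```

From the cron script this would be called as
"run_locked getmail > /dev/null 2>&1". As with the original lock file, a
run that is killed before rmdir leaves the lock behind, so it would have
to be removed by hand.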


> I might also be completely wrong. Can someone shed some light on this
> and give me a clue why it happens? Maybe it can even be fixed :)

No clues so far...

        -Otto
