I have a user whose ksh is crashing, with "Write error: No space left on device" messages appearing in /var/log/messages.
After some debugging, I managed to reproduce the problem by creating a chroot on a file-backed disk image, adding a test user, and slowly filling that filesystem, e.g.

    dd if=/dev/zero of=/mnt/tmp/zerosN bs=1M count=1024
    dd if=/dev/zero of=/mnt/tmp/zerosN bs=1K count=2

until only around 12K was left free. I could then debug it with valgrind and vgdb. Debugging under these conditions is tricky, as I cannot tell valgrind to spawn gdb, because gdb itself would then fail to start.

After following the code far enough, I learned that in the places where SH_JMPEXIT is handled, there was almost no handling of SH_JMPERREXIT. ksh would eventually crash because the struct subshell allocated on the stack in sh/subshell.c:sh_subshell stayed assigned to the global subshell_data after the siglongjmp back up the stack, since the out-of-disk-space errors were not fully handled. It would print a message every time a pipe was created, e.g.:

    /etc/profile: line 28: write to 3 failed [No space left on device]

until eventually crashing due to corrupted memory, i.e. the references from the global subshell_data to stack data of sh_subshell.

One thing that looked strange to me in the core dump analysis was that the prev field of subshell_data was pointing to itself when it eventually crashed; this was later understood and is expected.

The attached patch handles SH_JMPERREXIT in the code paths where SH_JMPEXIT is handled. With it, the failed login on a full disk ends in a pause() call:

---terminal 1---
$ valgrind -q --leak-check=full --free-fill=0x5a --vgdb=full --vgdb-error=0 /bin/ksh -l
==17730== (action at startup) vgdb me ...
==17730==
==17730== TO DEBUG THIS PROCESS USING GDB: start GDB like this
==17730==   /path/to/gdb /bin/ksh
==17730== and then give GDB the following command
==17730==   target remote | /usr/lib64/valgrind/../../bin/vgdb --pid=17730
==17730== --pid is optional if only one valgrind process is running
==17730==
==17730== Syscall param mount(type) points to unaddressable byte(s)
==17730==    at 0x563377A: mount (in /usr/lib64/libc-2.17.so)
==17730==    by 0x493E58: fs3d_mount (fs3d.c:115)
==17730==    by 0x493C8B: fs3d (fs3d.c:57)
==17730==    by 0x423E41: sh_init (init.c:1302)
==17730==    by 0x405CD3: sh_main (main.c:141)
==17730==    by 0x405B84: main (pmain.c:45)
==17730==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
==17730==
==17730== (action on error) vgdb me ...
==17730== Continuing ...
/etc/profile: line 28: write to 3 failed [No space left on device]
---8<---

---terminal 2---
(gdb) c
Continuing.
^C
Program received signal SIGTRAP, Trace/breakpoint trap.
0x00000000055fa470 in __pause_nocancel () from /lib64/libc.so.6
(gdb) bt
#0  0x00000000055fa470 in __pause_nocancel () from /lib64/libc.so.6
#1  0x000000000041e73d in sh_done (ptr=0x793360 <sh>, sig=255) at /home/pcpa/rhel/ksh/ksh-20120801/src/cmd/ksh93/sh/fault.c:665
#2  0x0000000000407407 in exfile (shp=0x4542, iop=0xff, fno=0) at /home/pcpa/rhel/ksh/ksh-20120801/src/cmd/ksh93/sh/main.c:604
#3  0x0000000000405c43 in sh_source (shp=0x793360 <sh>, iop=0x0, file=0x524804 <e_sysprofile> "/etc/profile") at /home/pcpa/rhel/ksh/ksh-20120801/src/cmd/ksh93/sh/main.c:109
#4  0x00000000004060e4 in sh_main (ac=2, av=0xfff000498, userinit=0x0) at /home/pcpa/rhel/ksh/ksh-20120801/src/cmd/ksh93/sh/main.c:202
#5  0x0000000000405b85 in main (argc=2, argv=0xfff000498) at /home/pcpa/rhel/ksh/ksh-20120801/src/cmd/ksh93/sh/pmain.c:45
(gdb)
---8<---

Thanks,
Paulo

diff -up ksh-20120801/src/cmd/ksh93/sh/main.c.orig ksh-20120801/src/cmd/ksh93/sh/main.c
--- ksh-20120801/src/cmd/ksh93/sh/main.c.orig	2015-04-17 16:55:57.802048900 -0300
+++ ksh-20120801/src/cmd/ksh93/sh/main.c	2015-04-17 17:10:45.276129709 -0300
@@ -423,7 +423,7 @@ static void exfile(register Shell_t *shp
 	sfsync(shp->outpool);
 	shp->st.execbrk = shp->st.breakcnt = 0;
 	/* check for return from profile or env file */
-	if(sh_isstate(SH_PROFILE) && (jmpval==SH_JMPFUN || jmpval==SH_JMPEXIT))
+	if(sh_isstate(SH_PROFILE) && (jmpval==SH_JMPFUN || jmpval==SH_JMPEXIT || jmpval==SH_JMPERREXIT))
 	{
 		sh_setstate(states);
 		goto done;
@@ -600,6 +600,8 @@ done:
 		siglongjmp(*shp->jmplist,jmpval);
 	else if(jmpval == SH_JMPEXIT)
 		sh_done(shp,0);
+	else if(jmpval == SH_JMPERREXIT)
+		sh_done(shp,-1);
 	if(fno>0)
 		sh_close(fno);
 	if(shp->st.filename)
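
For readers not familiar with the ksh internals, the hazard described above can be sketched in isolation. This is not ksh code: struct frame, current_frame, jump_target, run_in_frame, and the JMP_* values are hypothetical stand-ins for struct subshell, subshell_data, shp->jmplist, sh_subshell, and the SH_JMP* codes. The sketch only illustrates the general rule, assuming a global that points at a stack-allocated struct: the global must be restored for every jump value that can unwind the frame, including the error one.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* Hypothetical jump codes, named after the ksh ones for illustration only. */
enum { JMP_EXIT = 1, JMP_ERREXIT = 2 };

/* Hypothetical stand-in for a per-call bookkeeping struct. */
struct frame { struct frame *prev; };

/* Global pointing at a stack-allocated frame while that frame is live. */
static struct frame *current_frame = NULL;
/* Innermost jump buffer, playing the role of a jump list head. */
static sigjmp_buf *jump_target = NULL;

/* Simulates a failure deep in the call chain, e.g. a write hitting ENOSPC. */
static void fail_with(int code)
{
    siglongjmp(*jump_target, code);
}

/*
 * Runs with a stack-allocated frame published through the global.
 * Returns 0 on normal completion, or the jump value that unwound it.
 */
int run_in_frame(int code)
{
    struct frame local;
    sigjmp_buf here;
    sigjmp_buf *saved_target = jump_target;
    int result;

    local.prev = current_frame;
    current_frame = &local;
    jump_target = &here;

    switch (sigsetjmp(here, 0)) {
    case 0:
        if (code != 0)
            fail_with(code);    /* does not return */
        result = 0;
        break;
    case JMP_EXIT:
        result = JMP_EXIT;
        break;
    case JMP_ERREXIT:           /* the case that is easy to forget */
        result = JMP_ERREXIT;
        break;
    default:
        result = -1;
        break;
    }

    /* Restore on every path, so the global never outlives 'local'. */
    current_frame = local.prev;
    jump_target = saved_target;
    return result;
}
```

If the restore were done only for JMP_EXIT, current_frame would keep pointing into a dead stack frame after a JMP_ERREXIT unwind, which is the kind of dangling reference that later shows up as memory corruption.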
_______________________________________________
ast-users mailing list
ast-users@lists.research.att.com
http://lists.research.att.com/mailman/listinfo/ast-users