I have a user with a ksh crashing problem that also leaves
some "Write error: No space left on device" messages
in /var/log/messages.

After some debugging, I created a chroot on a file-backed
disk image, added a test user, and slowly filled that
file-backed filesystem, e.g.

dd if=/dev/zero of=/mnt/tmp/zerosN bs=1M count=1024
dd if=/dev/zero of=/mnt/tmp/zerosN bs=1K count=2

until only around 12K was left. With that I managed to reproduce
the problem and debug it with valgrind and vgdb.
Debugging under these conditions is tricky, as I cannot tell
valgrind to spawn gdb, because gdb itself would then fail
to start.

After following the code far enough, I learned that in the places
where it handles SH_JMPEXIT there was almost no
handling of SH_JMPERREXIT.

ksh would eventually crash because the struct subshell
allocated on the stack in sh/subshell.c:sh_subshell
stayed assigned to the global subshell_data after the code did a
siglongjmp back up the stack, as the out-of-disk-space
errors were not fully handled. It would print a message every time
a pipe was created, e.g.:

/etc/profile: line 28: write to 3 failed [No space left on device]

until eventually crashing due to corrupted memory, e.g. the
references to stack data from sh_subshell in the global
subshell_data. One thing that looked strange to me in the coredump
analysis was that the subshell_data prev field was pointing to itself
when it eventually crashed; this was later understood and is expected...
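
To make the failure mode concrete, here is a minimal standalone sketch
of the pattern. It is not ksh code: struct frame, current, linked,
run_frame, leaked_after and the JMP_* values are hypothetical stand-ins
for struct subshell, subshell_data and the SH_JMP* codes. It only shows
how a jump value that is not handled can leave a global pointing at a
stack frame that is gone.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* Hypothetical stand-ins for SH_JMPEXIT / SH_JMPERREXIT. */
enum { JMP_EXIT = 1, JMP_ERREXIT = 2 };

/* Hypothetical stand-in for struct subshell. */
struct frame { struct frame *prev; };

static struct frame *current;  /* global list head, like subshell_data */
static int linked;             /* number of frames chained into current */
static sigjmp_buf outer;       /* jump target in the caller */

/* Link a stack-allocated frame into the global list, then simulate an
 * error that jumps back with jmpval. The frame is unlinked only for
 * the jump values this function handles. */
static void run_frame(int jmpval, int handle_errexit)
{
    struct frame f;   /* lives on this function's stack */
    sigjmp_buf here;

    assert(jmpval == JMP_EXIT || jmpval == JMP_ERREXIT);
    f.prev = current;
    current = &f;
    linked++;

    switch (sigsetjmp(here, 0)) {
    case 0:
        siglongjmp(here, jmpval);   /* e.g. a write failing with ENOSPC */
    case JMP_EXIT:
        current = f.prev;           /* unlink before the frame dies */
        linked--;
        return;
    case JMP_ERREXIT:
        if (handle_errexit) {
            current = f.prev;
            linked--;
            return;
        }
        break;
    }
    /* Not handled: unwind further without unlinking, so current still
     * holds the address of f after this frame is gone. */
    siglongjmp(outer, jmpval);
}

/* Returns 1 if a frame is still linked after run_frame has been left.
 * The stale pointer itself is never dereferenced here. */
static int leaked_after(int jmpval, int handle_errexit)
{
    current = NULL;
    linked = 0;
    if (sigsetjmp(outer, 0) == 0)
        run_frame(jmpval, handle_errexit);
    return linked != 0;
}
```

In this sketch leaked_after(JMP_EXIT, 0) returns 0, while
leaked_after(JMP_ERREXIT, 0) returns 1; with handle_errexit set, the
second case also returns 0. A later call that happens to reuse the same
stack address would then copy the stale global into its own prev field,
which may be how a self-pointing prev arises; that is my inference from
the sketch, not something verified against the ksh source.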

The attached patch handles SH_JMPERREXIT in the code
paths where SH_JMPEXIT is handled. With it, the failed login
on a full disk ends in a pause() call:

---terminal 1---
$ valgrind -q --leak-check=full --free-fill=0x5a --vgdb=full --vgdb-error=0 /bin/ksh -l
==17730== (action at startup) vgdb me ...
==17730==
==17730== TO DEBUG THIS PROCESS USING GDB: start GDB like this
==17730==   /path/to/gdb /bin/ksh
==17730== and then give GDB the following command
==17730==   target remote | /usr/lib64/valgrind/../../bin/vgdb --pid=17730
==17730== --pid is optional if only one valgrind process is running
==17730==
==17730== Syscall param mount(type) points to unaddressable byte(s)
==17730==    at 0x563377A: mount (in /usr/lib64/libc-2.17.so)
==17730==    by 0x493E58: fs3d_mount (fs3d.c:115)
==17730==    by 0x493C8B: fs3d (fs3d.c:57)
==17730==    by 0x423E41: sh_init (init.c:1302)
==17730==    by 0x405CD3: sh_main (main.c:141)
==17730==    by 0x405B84: main (pmain.c:45)
==17730==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
==17730==
==17730== (action on error) vgdb me ...
==17730== Continuing ...
/etc/profile: line 28: write to 3 failed [No space left on device]
---8<---

---terminal 2---
(gdb) c
Continuing.
^C
Program received signal SIGTRAP, Trace/breakpoint trap.
0x00000000055fa470 in __pause_nocancel () from /lib64/libc.so.6
(gdb) bt
#0  0x00000000055fa470 in __pause_nocancel () from /lib64/libc.so.6
#1  0x000000000041e73d in sh_done (ptr=0x793360 <sh>, sig=255) at /home/pcpa/rhel/ksh/ksh-20120801/src/cmd/ksh93/sh/fault.c:665
#2  0x0000000000407407 in exfile (shp=0x4542, iop=0xff, fno=0) at /home/pcpa/rhel/ksh/ksh-20120801/src/cmd/ksh93/sh/main.c:604
#3  0x0000000000405c43 in sh_source (shp=0x793360 <sh>, iop=0x0, file=0x524804 <e_sysprofile> "/etc/profile") at /home/pcpa/rhel/ksh/ksh-20120801/src/cmd/ksh93/sh/main.c:109
#4  0x00000000004060e4 in sh_main (ac=2, av=0xfff000498, userinit=0x0) at /home/pcpa/rhel/ksh/ksh-20120801/src/cmd/ksh93/sh/main.c:202
#5  0x0000000000405b85 in main (argc=2, argv=0xfff000498) at /home/pcpa/rhel/ksh/ksh-20120801/src/cmd/ksh93/sh/pmain.c:45
(gdb)
---8<---

Thanks,
Paulo
diff -up ksh-20120801/src/cmd/ksh93/sh/main.c.orig ksh-20120801/src/cmd/ksh93/sh/main.c
--- ksh-20120801/src/cmd/ksh93/sh/main.c.orig	2015-04-17 16:55:57.802048900 -0300
+++ ksh-20120801/src/cmd/ksh93/sh/main.c	2015-04-17 17:10:45.276129709 -0300
@@ -423,7 +423,7 @@ static void	exfile(register Shell_t *shp
 		sfsync(shp->outpool);
 		shp->st.execbrk = shp->st.breakcnt = 0;
 		/* check for return from profile or env file */
-		if(sh_isstate(SH_PROFILE) && (jmpval==SH_JMPFUN || jmpval==SH_JMPEXIT))
+		if(sh_isstate(SH_PROFILE) && (jmpval==SH_JMPFUN || jmpval==SH_JMPEXIT || jmpval==SH_JMPERREXIT))
 		{
 			sh_setstate(states);
 			goto done;
@@ -600,6 +600,8 @@ done:
 		siglongjmp(*shp->jmplist,jmpval);
 	else if(jmpval == SH_JMPEXIT)
 		sh_done(shp,0);
+	else if(jmpval == SH_JMPERREXIT)
+		sh_done(shp,-1);
 	if(fno>0)
 		sh_close(fno);
 	if(shp->st.filename)