We have been hitting a segfault while running a monitoring script. The script
does not hit the segfault consistently: on some systems it took a couple of
days, on others more than a month. We are running ksh Version JM 93t+
2010-06-21 with the following 2 patches:
================cut here=======================
*** old/sh/xec.c Tue Jun 15 16:00:32 2010
--- new/sh/xec.c Tue Feb 28 13:46:45 2012
***************
*** 1406,1416 ****
job_lock();
while((parent = vfork()) < 0)
_sh_fork(parent, 0, (int*)0);
!
job_fork(parent);
if(parent)
{
job_clear();
job_post(parent,0);
job_wait(parent);
sh_iorestore(shp,topfd,SH_JMPCMD);
sh_done(shp,(shp->exitval&SH_EXITSIG)?(shp->exitval&SH_EXITMASK):0);
--- 1406,1418 ----
job_lock();
while((parent = vfork()) < 0)
_sh_fork(parent, 0, (int*)0);
! if(parent<=0)
! job_fork(parent);
if(parent)
{
job_clear();
job_post(parent,0);
+ job_fork(parent);
job_wait(parent);
sh_iorestore(shp,topfd,SH_JMPCMD);
sh_done(shp,(shp->exitval&SH_EXITSIG)?(shp->exitval&SH_EXITMASK):0);
--- src/cmd/ksh93/bltins/misc.c
+++ src/cmd/ksh93/bltins/misc.c 2011-02-22 13:03:35.783936889 +0000
@@ -273,7 +273,6 @@ int b_dot_cmd(register int n,char *ar
shp->st.self = &savst;
shp->topscope = (Shscope_t*)shp->st.self;
prevscope->save_tree = shp->var_tree;
- shp->st.cmdname = argv[0];
if(np)
shp->st.filename = np->nvalue.rp->fname;
nv_putval(SH_PATHNAMENOD, shp->st.filename ,NV_NOFREE);
===============
This is the only message in /var/log/messages:
Jan 21 08:04:20 ls21n35 kernel: [4198311.337110] mmnfsmonitor[8845]: segfault
at 7 ip 000000000041ffb7 sp 00007fff4bcacce0 error 4 in mmksh[400000+128000]
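
For anyone reading the kernel line above, here is a minimal sketch of how its
fields can be decoded. It assumes the usual x86 page-fault error-code layout
(bit 0 = protection violation, bit 1 = write, bit 2 = user mode); the function
names decode_fault and image_offset are made up for illustration and are not
part of ksh or any tool.

```sh
#!/bin/sh
# Decode the low three bits of the x86 page-fault error code ("error N").
# decode_fault is a hypothetical helper name used only in this sketch.
decode_fault() {
    code=$1
    if [ $((code & 1)) -ne 0 ]; then cause="protection violation"; else cause="page not present"; fi
    if [ $((code & 2)) -ne 0 ]; then access="write"; else access="read"; fi
    if [ $((code & 4)) -ne 0 ]; then mode="user"; else mode="kernel"; fi
    echo "$mode-mode $access, $cause"
}

# Offset of the faulting instruction pointer from the start of the mapping,
# i.e. ip minus the base shown in "mmksh[400000+128000]".
# image_offset is likewise a hypothetical helper name.
image_offset() {
    printf '0x%x\n' $(( $1 - $2 ))
}

decode_fault 4                   # values taken from the log line above
image_offset 0x41ffb7 0x400000
```

Under those assumptions, "error 4" reads as a user-mode read of a page that is
not mapped, and the fault address of 7 would be consistent with a read through
a near-NULL pointer, though the log alone does not prove that. The ip matches
the job_chksave frame reported by gdb.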
# gdb /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/core
GNU gdb (GDB) SUSE (7.0-0.4.16)
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-suse-linux".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/lpp/mmfs/bin/mmksh...done.
warning: core file may not match specified executable file.
Missing separate debuginfo for /lib64/libdl.so.2
Try: zypper install -C
"debuginfo(build-id)=d70e9482ac22a826c1cf7d04bdbb1bf06f2e707b"
Missing separate debuginfo for /lib64/libm.so.6
Try: zypper install -C
"debuginfo(build-id)=365e4d2c812908177265c8223f222a1665fe1035"
Missing separate debuginfo for /lib64/libc.so.6
Try: zypper install -C
"debuginfo(build-id)=a41ac0b0b7cd60bd57473303c2c3de08856d2e06"
Missing separate debuginfo for /lib64/ld-linux-x86-64.so.2
Try: zypper install -C
"debuginfo(build-id)=17c088070352d83e7afc43d83756b00899fd37f0"
Reading symbols from /lib64/libdl.so.2...Missing separate debuginfo for
/lib64/libdl.so.2
Try: zypper install -C
"debuginfo(build-id)=d70e9482ac22a826c1cf7d04bdbb1bf06f2e707b"
(no debugging symbols found)...done.
Loaded symbols for /lib64/libdl.so.2
Reading symbols from /lib64/libm.so.6...Missing separate debuginfo for
/lib64/libm.so.6
Try: zypper install -C
"debuginfo(build-id)=365e4d2c812908177265c8223f222a1665fe1035"
(no debugging symbols found)...done.
Loaded symbols for /lib64/libm.so.6
Reading symbols from /lib64/libc.so.6...Missing separate debuginfo for
/lib64/libc.so.6
Try: zypper install -C
"debuginfo(build-id)=a41ac0b0b7cd60bd57473303c2c3de08856d2e06"
(no debugging symbols found)...done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...Missing separate debuginfo
for /lib64/ld-linux-x86-64.so.2
Try: zypper install -C
"debuginfo(build-id)=17c088070352d83e7afc43d83756b00899fd37f0"
(no debugging symbols found)...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Core was generated by `/usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmnfsmonitor
-s'.
Program terminated with signal 11, Segmentation fault.
#0 0x000000000041ffb7 in job_chksave ()
(gdb) where
#0 0x000000000041ffb7 in job_chksave ()
#1 0x0000000000a18460 in ?? ()
#2 0x000000000042006b in jobsave_create ()
#3 0x0000000000000000 in ?? ()
(gdb) bt
#0 0x000000000041ffb7 in job_chksave ()
#1 0x0000000000a18460 in ?? ()
#2 0x000000000042006b in jobsave_create ()
#3 0x0000000000000000 in ?? ()
(gdb) l
1 init.c: No such file or directory.
in init.c
# uname -a
Linux ls21n35 2.6.32.36-0.5-default #1 SMP 2011-04-14 10:12:31 +0200 x86_64
x86_64 x86_64 GNU/Linux
Is this a known problem? Please let me know if more info is needed.
Thanks,
Tru.