Jen,

I was unable to reproduce the segmentation fault in Maui patch 13 with Gold. However, I did see at least one problem that could be related, and it has been fixed in the more recent snapshots. We have rolled these fixes into a new Maui release, patch 14, which I have tested without seeing the problem.
Would you be able to download the latest Maui (patch 14) and see if it resolves your seg fault issue?

Thank you,
Scott

On Fri, 2005-12-16 at 15:35 -0500, Aquarijen wrote:
> Hi Dave,
> Ok.
> Not 100% sure I did this right and not sure where you want a break, etc, but:
>
> [EMAIL PROTECTED] sbin]# gdb ./maui
> GNU gdb Red Hat Linux (6.3.0.0-0.31rh)
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB. Type "show warranty" for details.
> This GDB was configured as "x86_64-redhat-linux-gnu"...Using host
> libthread_db library "/lib64/tls/libthread_db.so.1".
>
> (gdb) r
> Starting program: /opt/maui/sbin/maui
> Detaching after fork from child process 13618.
>
> Program received signal SIGSEGV, Segmentation fault.
> 0x00000000004eec0d in __MSecSHA1Transform (state=0x9429bfd8bc2ffff8,
> buffer=0x4e7c2621cf7c26fd <Address 0x4e7c2621cf7c26fd out of bounds>)
> at MSec.c:831
> 831 state[0] += a;
> (gdb)
>
>
> Thanks!!
> Jen
>
> Unix Admin, ORNL Institutional Cluster
> Oak Ridge National Lab
>
>
> On 12/16/05, Dave Jackson <[EMAIL PROTECTED]> wrote:
> > Jen,
> >
> > Can you run Maui under gdb? (See section 14.1.4 of the online docs.)
> >
> > When the failure occurs, please issue 'where' and send us the output.
> > We will also attempt to reproduce this locally.
> >
> > Dave
> >
> > On Fri, 2005-12-16 at 14:51 -0500, Aquarijen wrote:
> > > Hi All,
> > >
> > > I am not sure if this is a gold question or a maui question - so I am
> > > posting to both - I hope that is ok...
> > > Sorry for so many questions lately! So, I made sure that no users on
> > > the test cluster have usernames beginning with a number. I have gold
> > > running and I have accounts, projects, machines and users set up with
> > > 100000000 deposited to each gold account.
> > > If I configure maui to use gold as its AM, maui pretty much instantly
> > > dies. I am using maui 3.2.6p13 and gold version 2.0.0.4. I cleared
> > > out the checkpoint file. I shut everything down and cleared the
> > > queue. I then started gold, then maui, then pbs_server and then the
> > > pbs_moms. Maui dies. I've tried this in different orders, too. Maui
> > > dies if I have the AMCFG line included.
> > >
> > > Here is my simple maui.cfg:
> > >
> > > # maui.cfg 3.2
> > > SERVERHOST b05l02
> > > ADMIN1 root tippensjl
> > > RMCFG[base] TYPE=PBS
> > > JOBAGGREGATIONTIME 00:00:10
> > > RMPOLLINTERVAL 00:00:30
> > > DOWNNODEDELAYTIME 72:00:00
> > > SERVERPORT 42559
> > > SERVERMODE NORMAL
> > > LOGFILE maui.log
> > > LOGFILEMAXSIZE 100000000
> > > LOGLEVEL 9
> > > QUEUETIMEWEIGHT[0] 10
> > > FSPOLICY DEDICATEDPS
> > > FSDEPTH 7
> > > FSINTERVAL 24:00:00
> > > FSWEIGHT 1
> > > FSDECAY 0.80
> > > BACKFILLPOLICY ON
> > > BACKFILLTYPE BESTFIT
> > > RESERVATIONPOLICY CURRENTHIGHEST
> > > NODEACCESSPOLICY SHARED
> > > JOBMAXSTARTTIME 2:00:00
> > > JOBMAXOVERRUN 0:30:00
> > > AMCFG[bank] TYPE=GOLD HOST=b05l02 PORT=7112 SOCKETPROTOCOL=HTTP
> > > WIRE-PROTOCOL=XML CHARGEPOLICY=DEBITALLWC JOBFAILUREACTION=NONE
> > > FLUSHINTERVAL=12:00:00 TIMEOUT=15
> > >
> > > And here is my maui-private.cfg:
> > > CLIENTCFG[AM:bank] CSKEY=sss CSALGO=HMAC
> > >
> > > And here is the last little bit of my maui.log. I have loglevel turned
> > > up to 9.
> > >
> > > 12/16 14:32:42 MUserAdd(UName,UP)
> > > 12/16 14:32:42 MUGetHash(tippensjl)
> > > 12/16 14:32:42 INFO: hash 'tippensjl' --> 550228005
> > > 12/16 14:32:42 MUGetHash(tippensjl)
> > > 12/16 14:32:42 INFO: hash 'tippensjl' --> 550228005
> > > 12/16 14:32:42 MCPRestore(USER,tippensjl,Optr)
> > > 12/16 14:32:42 INFO: no checkpoint entry for object 'USER
> > > tippensjl '
> > > 12/16 14:32:42 INFO: user tippensjl added
> > > 12/16 14:32:42 INFO: PBS attribute 'job_state' value: 'Q' (r: NULL)
> > > 12/16 14:32:42 INFO: PBS attribute 'queue' value: 'workq' (r: NULL)
> > > 12/16 14:32:42 MReqSetAttr(44,RQ,ReqClass,Value,1,2)
> > > 12/16 14:32:42 INFO: job flags for job 44: 0
> > > 12/16 14:32:42 MJobSetAttr(44,GAttr,Value,1,5)
> > > 12/16 14:32:42 MUMAGetBM(JFeature,PREEMPTEE,3)
> > > 12/16 14:32:42 INFO: attribute 'PREEMPTEE' cleared for job 44
> > > 12/16 14:32:42 MJobGetPAL(44,RPAL,PAL,NULL)
> > > 12/16 14:32:42 INFO: PBS attribute 'server' value: 'b05l02' (r:
> > > NULL)
> > > 12/16 14:32:42 INFO: PBS attribute 'Checkpoint' value: 'u' (r: NULL)
> > > 12/16 14:32:42 INFO: PBS attribute 'ctime' value: '1134761206' (r:
> > > NULL)
> > > 12/16 14:32:42 INFO: PBS attribute 'Error_Path' value:
> > > 'b05l02:/home/2vt/jenstests/schulth/Science/dms/GaN/Cube2x2x2_1Mn_NCC/sic_11111/jen-b5.e44'
> > > (r: NULL)
> > > 12/16 14:32:42 INFO: PBS attribute 'Hold_Types' value: 'n' (r: NULL)
> > > 12/16 14:32:42 INFO: PBS attribute 'Join_Path' value: 'n' (r: NULL)
> > > 12/16 14:32:42 INFO: PBS attribute 'Keep_Files' value: 'n' (r: NULL)
> > > 12/16 14:32:42 INFO: PBS attribute 'Mail_Points' value: 'ae' (r:
> > > NULL)
> > > 12/16 14:32:42 INFO: PBS attribute 'Mail_Users' value:
> > > '[EMAIL PROTECTED]' (r: NULL)
> > > 12/16 14:32:42 INFO: PBS attribute 'mtime' value: '1134761206' (r:
> > > NULL)
> > > 12/16 14:32:42 INFO: PBS attribute 'Output_Path' value:
> > > 'b05l02:/home/2vt/jenstests/schulth/Science/dms/GaN/Cube2x2x2_1Mn_NCC/sic_11111/jen-b5.o44'
> > > (r: NULL)
> > > 12/16 14:32:42 INFO: PBS attribute 'Priority' value: '0' (r: NULL)
> > > 12/16 14:32:42 INFO: PBS attribute 'qtime' value: '1134761206' (r:
> > > NULL)
> > > 12/16 14:32:42 INFO: PBS attribute 'Rerunable' value: 'True' (r:
> > > NULL)
> > > 12/16 14:32:42 INFO: PBS attribute 'Resource_List' value:
> > > '10000:00:00' (r: cput)
> > > 12/16 14:32:42 INFO: PBS attribute 'Resource_List' value: '1' (r:
> > > ncpus)
> > > 12/16 14:32:42 INFO: PBS attribute 'Resource_List' value:
> > > '30:ppn=2' (r: neednodes)
> > > 12/16 14:32:42 __MPBSGetTaskList(44,30:ppn=2,NULL,0)
> > > 12/16 14:32:42 MReqSetAttr(44,RQ,ReqNodeFeature,Value,1,2)
> > > 12/16 14:32:42 INFO: 0 host task(s) located for job
> > > 12/16 14:32:42 INFO: PBS attribute 'Resource_List' value: '30'
> > > (r: nodect)12/16 14:32:42 INFO: PBS attribute 'Resource_List'
> > > value: '30:ppn=2' (r: nodes)
> > > 12/16 14:32:42 INFO: processing node request line '30:ppn=2'
> > > 12/16 14:32:42 __MPBSGetTaskList(44,30:ppn=2,NULL,0)
> > > 12/16 14:32:42 MReqSetAttr(44,RQ,ReqNodeFeature,Value,1,2)
> > > 12/16 14:32:42 INFO: 0 host task(s) located for job
> > > 12/16 14:32:42 INFO: PBS attribute 'Resource_List' value:
> > > '10000:00:00' (r: walltime)
> > > 12/16 14:32:42 INFO: PBS attribute 'Shell_Path_List' value:
> > > '/bin/bash' (r: NULL)
> > > 12/16 14:32:42 INFO: PBS attribute 'substate' value: '10' (r: NULL)
> > > 12/16 14:32:42 INFO: PBS attribute 'Variable_List' value:
> > >
'PBS_O_HOME=/home/2vt,PBS_O_LANG=en_US.UTF-8,PBS_O_LOGNAME=tippensjl,PBS_O_PATH=/opt/intel/cce/9.0/bin:/opt/intel/fce/9.0/bin:/usr/kerberos/bin:/opt/mpich-ch_p4-icc-1.2.7/bin:/usr/local/bin:/bin:/usr/bin:/usr/X11R6/bin:/opt/pbs/bin:/opt/pbs/lib/xpbs/bin:/opt/env-switcher/bin:/opt/kernel_picker/bin:/opt/pvm3/lib:/opt/pvm3/lib/LINUX:/opt/pvm3/bin/LINUX:/usr/local/apitest:/opt/c3-4/:/home/2vt/bin,PBS_O_MAIL=/var/spool/mail/tippensjl,PBS_O_SHELL=/bin/bash,PBS_O_HOST=b05l02,PBS_O_WORKDIR=/home/2vt/jenstests/schulth/Science/dms/GaN/Cube2x2x2_1Mn_NCC/sic_11111,MODULE_VERSION_STACK=3.1.6,MANPATH=/opt/intel/cce/9.0/man:/opt/intel/fce/9.0/man:/opt/mpich-ch_p4-icc-1.2.7/man:/opt/modules/default/man:/usr/share/man:/usr/man:/usr/local/share/man:/usr/local/man:/usr/X11R6/man:/opt/pbs/man:/opt/env-switcher/man:/opt/kernel_picker/man:/opt/pvm3/man,HOSTNAME=b05l02,PVM_RSH=ssh,_MODULESBEGINENV_=/home/2vt/.modulesbeginenv,SHELL=/bin/bash,TERM=xterm,HISTSIZE=1000,TMPDIR=/home/2vt/.tmpdir,MODULE_SHELL=sh,OLDPWD=/home/2vt,MODULE_OSCAR_USER=tippensjl,USER=tippensjl,LD_LIBRARY_PATH=/opt/intel/mkl72/lib/em64t:/opt/intel/cce/9.0/lib:/opt/intel/fce/9.0/lib,LS_COLORS=no=00:fi=00:di=00;34:ln=00;36:pi=40;33:so=00;35:bd=40;33;01:cd=40;33;01:or=01;05;37;41:mi=01;05;37;41:ex=00;32:*.cmd=00;32:*.exe=00;32:*.com=00;32:*.btm=00;32:*.bat=00;32:*.sh=00;32:*.csh=00;32:*.tar=00;31:*.tgz=00;31:*.arj=00;31:*.taz=00;31:*.lzh=00;31:*.zip=00;31:*.z=00;31:*.Z=00;31:*.gz=00;31:*.bz2=00;31:*.bz=00;31:*.tz=00;31:*.rpm=00;31:*.cpio=00;31:*.jpg=00;35:*.gif=00;35:*.bmp=00;35:*.xbm=00;35:*.xpm=00;35:*.png=00;35:*.tif=00;35:,ENV=/home/2vt/.bashrc,OSCAR_HOME=/opt/oscar,PVM_ROOT=/opt/pvm3,PVM_ARCH=LINUX,MODULE_VERSION=3.1.6,MAIL=/var/spool/mail/tippensjl,PATH=/opt/intel/cce/9.0/bin:/opt/intel/fce/9.0/bin:/usr/kerberos/bin:/opt/mpich-ch_p4-icc-1.2.7/bin:/usr/local/bin:/bin:/usr/bin:/usr/X11R6/bin:/opt/pbs/bin:/opt/pbs/lib/xpbs/bin:/opt/env-switcher/bin:/opt/kernel_picker/bin:/opt/pvm3/lib:/opt/pvm3/lib/LINUX:/opt/pvm3/bin
/LINUX:/usr/local/apitest:/opt/c3-4/:/home/2vt/bin,INPUTRC=/etc/inputrc,PWD=/home/2vt/jenstests/schulth/Science/dms/GaN/Cube2x2x2_1Mn_NCC/sic_11111,_LMFILES_=/opt/modules/oscar-modulefiles/default-manpath/1.0.1:/opt/modules/oscar-modulefiles/torque/1.2.0p5:/opt/env-switcher/share/env-switcher/mpi/mpich-ch_p4-icc-1.2.7:/opt/modules/oscar-modulefiles/switcher/1.0.13:/opt/modules/oscar-modulefiles/kernel_picker/1.4.1.3:/opt/modules/oscar-modulefiles/pvm/3.4.5+4:/opt/modules/modulefiles/oscar-modules/1.0.5:/opt/modules/modulefiles/iforte/9.0:/opt/modules/modulefiles/icce/9.0:/opt/modules/modulefiles/mkl-em64t/7.2,LANG=en_US.UTF-8,MODULEPATH=/opt/env-switcher/share/env-switcher:/opt/modules/oscar-modulefiles:/opt/modules/version:/opt/modules/$MODULE_VERSION/modulefiles:/opt/modules/modulefiles:,LOADEDMODULES=default-manpath/1.0.1:torque/1.2.0p5:mpi/mpich-ch_p4-icc-1.2.7:switcher/1.0.13:kernel_picker/1.4.1.3:pvm/3.4.5+4:oscar-modules/1.0.5:iforte/9.0:icce/9.0:mkl-em64t/7.2,SSH_ASKPASS=/usr/libexec/openssh/gnome-ssh-askpass,SHLVL=1,HOME=/home/2vt,LOGNAME=tippensjl,MODULESHOME=/opt/modules/3.1.6,LESSOPEN=|/usr/bin/lesspipe.sh > > > %s,G_BROKEN_FILENAMES=1,_=/opt/pbs/bin/qsub,PBS_O_QUEUE=workq' (r: > > > NULL) > > > 12/16 14:32:42 INFO: PBS attribute 'euser' value: 'tippensjl' (r: > > > NULL) > > > 12/16 14:32:42 MUserAdd(UName,UP) > > > 12/16 14:32:42 MUGetHash(tippensjl) > > > 12/16 14:32:42 INFO: hash 'tippensjl' --> 550228005 > > > 12/16 14:32:42 MUGetHash(tippensjl) > > > 12/16 14:32:42 INFO: hash 'tippensjl' --> 550228005 > > > 12/16 14:32:42 INFO: PBS attribute 'egroup' value: 'tippensjl' (r: > > > NULL) > > > 12/16 14:32:42 MGroupAdd(GName,GP) > > > 12/16 14:32:42 MUGetHash(tippensjl) > > > 12/16 14:32:42 INFO: hash 'tippensjl' --> 550228005 > > > 12/16 14:32:42 MUGetHash(tippensjl) > > > 12/16 14:32:42 INFO: hash 'tippensjl' --> 550228005 > > > 12/16 14:32:42 MCPRestore(GROUP,tippensjl,Optr) > > > 12/16 14:32:42 INFO: no checkpoint entry for object 'GROUP > > > 
tippensjl '
> > > 12/16 14:32:42 INFO: group tippensjl added
> > > 12/16 14:32:42 INFO: PBS attribute 'queue_rank' value: '41' (r:
> > > NULL)
> > > 12/16 14:32:42 INFO: PBS attribute 'queue_type' value: 'E' (r: NULL)
> > > 12/16 14:32:42 INFO: PBS attribute 'etime' value: '1134761206' (r:
> > > NULL)
> > > 12/16 14:32:42 MJobSetCreds(44,tippensjl,tippensjl,)
> > > 12/16 14:32:42 MUserAdd(UName,UP)
> > > 12/16 14:32:42 MUGetHash(tippensjl)
> > > 12/16 14:32:42 INFO: hash 'tippensjl' --> 550228005
> > > 12/16 14:32:42 MUGetHash(tippensjl)
> > > 12/16 14:32:42 INFO: hash 'tippensjl' --> 550228005
> > > 12/16 14:32:42 MGroupAdd(GName,GP)
> > > 12/16 14:32:42 MUGetHash(tippensjl)
> > > 12/16 14:32:42 INFO: hash 'tippensjl' --> 550228005
> > > 12/16 14:32:42 MUGetHash(tippensjl)
> > > 12/16 14:32:42 INFO: hash 'tippensjl' --> 550228005
> > > 12/16 14:32:42 MJobGetAccount(44,A)
> > > 12/16 14:32:42 MAMAccountGetDefault(tippensjl,AName,RIndex)
> > > 12/16 14:32:42 MSSSDoCommand(allocation-manager,NULL,OBuf,ODE,SC,EMsg)
> > > 12/16 14:32:42
> > > MSysEMSubmit(EM,scheduler,comcom,scheduler,allocation-manager;)
> > > 12/16 14:32:42 INFO: EM disabled
> > > 12/16 14:32:42 MSUConnect(S,TRUE,EMsg)
> > > 12/16 14:32:42 INFO: trying to connect to 192.168.79.231 (Port: 7112)
> > > 12/16 14:32:42 INFO: successful connect to TCP server (sd: 10)
> > > 12/16 14:32:42 MSUSendData(S,15000000,FALSE,FALSE)
> > > 12/16 14:32:42 MSecGetChecksum(Buf,185,Checksum,HMAC64,CSKey)
> > > 12/16 14:32:42 MSecHMACGetDigest(sss,3,<Body actor="root"><Request
> > > action="Query" actor="root"><Object>User</Object><Where
> > > name="Special">False</Where><Get name="Name"></Get><Get
> > > name="DefaultProject"></Get></Request></Body>,185,CSString,20,DigestString,TRUE,TRUE)
> > > 12/16 14:32:42 __MSecSHA1Init(context)
> > > 12/16 14:32:42 __MSecSHA1Transform(context)
> > >
> > > And that's it - it just dies. I have the feeling that this is
> > > something fairly easy that I didn't set up correctly... Just can't
> > > seem to find what it is - I'm pretty new at this... Oh, yeah, I am
> > > using torque 2.0.0p2 if that makes a difference.
> > >
> > > Thank you for any help you can give - I'm pulling my hair out. :-O :)
> > >
> > > -Jen
> > >
> > > Jennifer Tippens
> > > Unix Admin, ORNL Institutional Cluster
> > > Oak Ridge National Lab
> > > _______________________________________________
> > > mauiusers mailing list
> > > [email protected]
> > > http://www.supercluster.org/mailman/listinfo/mauiusers
> > >
> >
> ---
> You are currently subscribed to gold-users as: [EMAIL PROTECTED]
> To unsubscribe send a blank email to [EMAIL PROTECTED]
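
A note for anyone reading this thread later: the last calls in the maui.log excerpt before the crash, MSecGetChecksum and MSecHMACGetDigest, appear to be computing a keyed checksum over the 185-byte XML request using the CSKEY from maui-private.cfg, and the digest length of 20 in that log line is consistent with SHA-1. The sketch below is not Maui or Gold code. It only illustrates, using Python's standard hmac and hashlib modules, what an HMAC-SHA1 over that request body looks like. The helper name hmac_sha1_digest is made up for illustration, and how Maui or Gold actually derive the key or encode the digest on the wire is an assumption that this thread does not confirm.

```python
import hashlib
import hmac

# Request body copied from the MSecHMACGetDigest log line in this thread.
BODY = (
    '<Body actor="root"><Request action="Query" actor="root">'
    '<Object>User</Object><Where name="Special">False</Where>'
    '<Get name="Name"></Get><Get name="DefaultProject"></Get>'
    '</Request></Body>'
)


def hmac_sha1_digest(key: str, message: str) -> bytes:
    """Return the raw HMAC-SHA1 of message under key.

    Hypothetical helper for illustration; not part of Maui or Gold.
    """
    return hmac.new(key.encode("ascii"), message.encode("ascii"), hashlib.sha1).digest()


if __name__ == "__main__":
    # "sss" is the CSKEY value shown in the maui-private.cfg above.
    digest = hmac_sha1_digest("sss", BODY)
    print("body length:", len(BODY))      # the log reports 185
    print("digest length:", len(digest))  # SHA-1 digests are 20 bytes
    print("digest (hex):", digest.hex())
```

Running this on any platform gives a 20-byte digest without error, which is one way to see that the request and key in the log are unremarkable inputs; the fault reported by gdb is inside the scheduler's own SHA-1 routine rather than anything about this particular message.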
