Jen,

We will try to reproduce this in-house.

Scott

On Fri, 2005-12-16 at 15:35 -0500, Aquarijen wrote:
> Hi Dave,
> Ok.
> Not 100% sure I did this right and not sure where you want a break, etc, but:
>
> [EMAIL PROTECTED] sbin]# gdb ./maui
> GNU gdb Red Hat Linux (6.3.0.0-0.31rh)
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB. Type "show warranty" for details.
> This GDB was configured as "x86_64-redhat-linux-gnu"...Using host
> libthread_db library "/lib64/tls/libthread_db.so.1".
>
> (gdb) r
> Starting program: /opt/maui/sbin/maui
> Detaching after fork from child process 13618.
>
> Program received signal SIGSEGV, Segmentation fault.
> 0x00000000004eec0d in __MSecSHA1Transform (state=0x9429bfd8bc2ffff8,
> buffer=0x4e7c2621cf7c26fd <Address 0x4e7c2621cf7c26fd out of bounds>)
> at MSec.c:831
> 831 state[0] += a;
> (gdb)
>
>
> Thanks!!
> Jen
>
> Unix Admin, ORNL Institutional Cluster
> Oak Ridge National Lab
>
>
> On 12/16/05, Dave Jackson <[EMAIL PROTECTED]> wrote:
> > Jen,
> >
> > Can you run Maui under gdb? (See section 14.1.4 of the online docs)
> >
> > When the failure occurs, please issue 'where' and send us the output.
> > We will also attempt to reproduce this locally.
> >
> > Dave
> >
> > On Fri, 2005-12-16 at 14:51 -0500, Aquarijen wrote:
> > > Hi All,
> > >
> > > I am not sure if this is a gold question or a maui question - so I am
> > > posting to both - I hope that is ok...
> > > Sorry for so many questions lately! So, I made sure that no users on
> > > the test cluster have usernames beginning with a number. I have gold
> > > running and I have accounts, projects, machines and users set up with
> > > 100000000 deposited to each gold account.
> > > If I configure maui to use gold as its AM, maui pretty much instantly
> > > dies.
> > > I am using maui 3.2.6p13 and gold version 2.0.0.4. I cleared
> > > out the checkpoint file. I shut everything down and cleared the
> > > queue. I then started gold, then maui, then pbs_server and then the
> > > pbs_moms. Maui dies. I've tried this in different orders, too. Maui
> > > dies if I have the AMCFG line included.
> > >
> > > Here is my simple maui.cfg:
> > >
> > > # maui.cfg 3.2
> > > SERVERHOST b05l02
> > > ADMIN1 root tippensjl
> > > RMCFG[base] TYPE=PBS
> > > JOBAGGREGATIONTIME 00:00:10
> > > RMPOLLINTERVAL 00:00:30
> > > DOWNNODEDELAYTIME 72:00:00
> > > SERVERPORT 42559
> > > SERVERMODE NORMAL
> > > LOGFILE maui.log
> > > LOGFILEMAXSIZE 100000000
> > > LOGLEVEL 9
> > > QUEUETIMEWEIGHT[0] 10
> > > FSPOLICY DEDICATEDPS
> > > FSDEPTH 7
> > > FSINTERVAL 24:00:00
> > > FSWEIGHT 1
> > > FSDECAY 0.80
> > > BACKFILLPOLICY ON
> > > BACKFILLTYPE BESTFIT
> > > RESERVATIONPOLICY CURRENTHIGHEST
> > > NODEACCESSPOLICY SHARED
> > > JOBMAXSTARTTIME 2:00:00
> > > JOBMAXOVERRUN 0:30:00
> > > AMCFG[bank] TYPE=GOLD HOST=b05l02 PORT=7112 SOCKETPROTOCOL=HTTP
> > > WIRE-PROTOCOL=XML CHARGEPOLICY=DEBITALLWC JOBFAILUREACTION=NONE
> > > FLUSHINTERVAL=12:00:00 TIMEOUT=15
> > >
> > > And here is my maui-private.cfg:
> > > CLIENTCFG[AM:bank] CSKEY=sss CSALGO=HMAC
> > >
> > > And here is the last little bit of my maui.log. I have loglevel turned
> > > up to 9.
> > >
> > > 12/16 14:32:42 MUserAdd(UName,UP)
> > > 12/16 14:32:42 MUGetHash(tippensjl)
> > > 12/16 14:32:42 INFO: hash 'tippensjl' --> 550228005
> > > 12/16 14:32:42 MUGetHash(tippensjl)
> > > 12/16 14:32:42 INFO: hash 'tippensjl' --> 550228005
> > > 12/16 14:32:42 MCPRestore(USER,tippensjl,Optr)
> > > 12/16 14:32:42 INFO: no checkpoint entry for object 'USER
> > > tippensjl '
> > > 12/16 14:32:42 INFO: user tippensjl added
> > > 12/16 14:32:42 INFO: PBS attribute 'job_state' value: 'Q' (r: NULL)
> > > 12/16 14:32:42 INFO: PBS attribute 'queue' value: 'workq' (r: NULL)
> > > 12/16 14:32:42 MReqSetAttr(44,RQ,ReqClass,Value,1,2)
> > > 12/16 14:32:42 INFO: job flags for job 44: 0
> > > 12/16 14:32:42 MJobSetAttr(44,GAttr,Value,1,5)
> > > 12/16 14:32:42 MUMAGetBM(JFeature,PREEMPTEE,3)
> > > 12/16 14:32:42 INFO: attribute 'PREEMPTEE' cleared for job 44
> > > 12/16 14:32:42 MJobGetPAL(44,RPAL,PAL,NULL)
> > > 12/16 14:32:42 INFO: PBS attribute 'server' value: 'b05l02' (r:
> > > NULL)
> > > 12/16 14:32:42 INFO: PBS attribute 'Checkpoint' value: 'u' (r: NULL)
> > > 12/16 14:32:42 INFO: PBS attribute 'ctime' value: '1134761206' (r:
> > > NULL)
> > > 12/16 14:32:42 INFO: PBS attribute 'Error_Path' value:
> > > 'b05l02:/home/2vt/jenstests/schulth/Science/dms/GaN/Cube2x2x2_1Mn_NCC/sic_11111/jen-b5.e44'
> > > (r: NULL)
> > > 12/16 14:32:42 INFO: PBS attribute 'Hold_Types' value: 'n' (r: NULL)
> > > 12/16 14:32:42 INFO: PBS attribute 'Join_Path' value: 'n' (r: NULL)
> > > 12/16 14:32:42 INFO: PBS attribute 'Keep_Files' value: 'n' (r: NULL)
> > > 12/16 14:32:42 INFO: PBS attribute 'Mail_Points' value: 'ae' (r:
> > > NULL)
> > > 12/16 14:32:42 INFO: PBS attribute 'Mail_Users' value:
> > > '[EMAIL PROTECTED]' (r: NULL)
> > > 12/16 14:32:42 INFO: PBS attribute 'mtime' value: '1134761206' (r:
> > > NULL)
> > > 12/16 14:32:42 INFO: PBS attribute 'Output_Path' value:
> > > 'b05l02:/home/2vt/jenstests/schulth/Science/dms/GaN/Cube2x2x2_1Mn_NCC/sic_11111/jen-b5.o44'
> > > (r: NULL)
> > > 12/16 14:32:42 INFO: PBS attribute 'Priority' value: '0' (r: NULL)
> > > 12/16 14:32:42 INFO: PBS attribute 'qtime' value: '1134761206' (r:
> > > NULL)
> > > 12/16 14:32:42 INFO: PBS attribute 'Rerunable' value: 'True' (r:
> > > NULL)
> > > 12/16 14:32:42 INFO: PBS attribute 'Resource_List' value:
> > > '10000:00:00' (r: cput)
> > > 12/16 14:32:42 INFO: PBS attribute 'Resource_List' value: '1' (r:
> > > ncpus)
> > > 12/16 14:32:42 INFO: PBS attribute 'Resource_List' value:
> > > '30:ppn=2' (r: neednodes)
> > > 12/16 14:32:42 __MPBSGetTaskList(44,30:ppn=2,NULL,0)
> > > 12/16 14:32:42 MReqSetAttr(44,RQ,ReqNodeFeature,Value,1,2)
> > > 12/16 14:32:42 INFO: 0 host task(s) located for job
> > > 12/16 14:32:42 INFO: PBS attribute 'Resource_List' value: '30'
> > > (r: nodect)12/16 14:32:42 INFO: PBS attribute 'Resource_List'
> > > value: '30:ppn=2' (r: nodes)
> > > 12/16 14:32:42 INFO: processing node request line '30:ppn=2'
> > > 12/16 14:32:42 __MPBSGetTaskList(44,30:ppn=2,NULL,0)
> > > 12/16 14:32:42 MReqSetAttr(44,RQ,ReqNodeFeature,Value,1,2)
> > > 12/16 14:32:42 INFO: 0 host task(s) located for job
> > > 12/16 14:32:42 INFO: PBS attribute 'Resource_List' value:
> > > '10000:00:00' (r: walltime)
> > > 12/16 14:32:42 INFO: PBS attribute 'Shell_Path_List' value:
> > > '/bin/bash' (r: NULL)
> > > 12/16 14:32:42 INFO: PBS attribute 'substate' value: '10' (r: NULL)
> > > 12/16 14:32:42 INFO: PBS attribute 'Variable_List' value:
'PBS_O_HOME=/home/2vt,PBS_O_LANG=en_US.UTF-8,PBS_O_LOGNAME=tippensjl,PBS_O_PATH=/opt/intel/cce/9.0/bin:/opt/intel/fce/9.0/bin:/usr/kerberos/bin:/opt/mpich-ch_p4-icc-1.2.7/bin:/usr/local/bin:/bin:/usr/bin:/usr/X11R6/bin:/opt/pbs/bin:/opt/pbs/lib/xpbs/bin:/opt/env-switcher/bin:/opt/kernel_picker/bin:/opt/pvm3/lib:/opt/pvm3/lib/LINUX:/opt/pvm3/bin/LINUX:/usr/local/apitest:/opt/c3-4/:/home/2vt/bin,PBS_O_MAIL=/var/spool/mail/tippensjl,PBS_O_SHELL=/bin/bash,PBS_O_HOST=b05l02,PBS_O_WORKDIR=/home/2vt/jenstests/schulth/Science/dms/GaN/Cube2x2x2_1Mn_NCC/sic_11111,MODULE_VERSION_STACK=3.1.6,MANPATH=/opt/intel/cce/9.0/man:/opt/intel/fce/9.0/man:/opt/mpich-ch_p4-icc-1.2.7/man:/opt/modules/default/man:/usr/share/man:/usr/man:/usr/local/share/man:/usr/local/man:/usr/X11R6/man:/opt/pbs/man:/opt/env-switcher/man:/opt/kernel_picker/man:/opt/pvm3/man,HOSTNAME=b05l02,PVM_RSH=ssh,_MODULESBEGINENV_=/home/2vt/.modulesbeginenv,SHELL=/bin/bash,TERM=xterm,HISTSIZE=1000,TMPDIR=/home/2vt/.tmpdir,MODULE_SHELL=sh,OLDPWD=/home/2vt,MODULE_OSCAR_USER=tippensjl,USER=tippensjl,LD_LIBRARY_PATH=/opt/intel/mkl72/lib/em64t:/opt/intel/cce/9.0/lib:/opt/intel/fce/9.0/lib,LS_COLORS=no=00:fi=00:di=00;34:ln=00;36:pi=40;33:so=00;35:bd=40;33;01:cd=40;33;01:or=01;05;37;41:mi=01;05;37;41:ex=00;32:*.cmd=00;32:*.exe=00;32:*.com=00;32:*.btm=00;32:*.bat=00;32:*.sh=00;32:*.csh=00;32:*.tar=00;31:*.tgz=00;31:*.arj=00;31:*.taz=00;31:*.lzh=00;31:*.zip=00;31:*.z=00;31:*.Z=00;31:*.gz=00;31:*.bz2=00;31:*.bz=00;31:*.tz=00;31:*.rpm=00;31:*.cpio=00;31:*.jpg=00;35:*.gif=00;35:*.bmp=00;35:*.xbm=00;35:*.xpm=00;35:*.png=00;35:*.tif=00;35:,ENV=/home/2vt/.bashrc,OSCAR_HOME=/opt/oscar,PVM_ROOT=/opt/pvm3,PVM_ARCH=LINUX,MODULE_VERSION=3.1.6,MAIL=/var/spool/mail/tippensjl,PATH=/opt/intel/cce/9.0/bin:/opt/intel/fce/9.0/bin:/usr/kerberos/bin:/opt/mpich-ch_p4-icc-1.2.7/bin:/usr/local/bin:/bin:/usr/bin:/usr/X11R6/bin:/opt/pbs/bin:/opt/pbs/lib/xpbs/bin:/opt/env-switcher/bin:/opt/kernel_picker/bin:/opt/pvm3/lib:/opt/pvm3/lib/LINUX:/opt/pvm3/bin
/LINUX:/usr/local/apitest:/opt/c3-4/:/home/2vt/bin,INPUTRC=/etc/inputrc,PWD=/home/2vt/jenstests/schulth/Science/dms/GaN/Cube2x2x2_1Mn_NCC/sic_11111,_LMFILES_=/opt/modules/oscar-modulefiles/default-manpath/1.0.1:/opt/modules/oscar-modulefiles/torque/1.2.0p5:/opt/env-switcher/share/env-switcher/mpi/mpich-ch_p4-icc-1.2.7:/opt/modules/oscar-modulefiles/switcher/1.0.13:/opt/modules/oscar-modulefiles/kernel_picker/1.4.1.3:/opt/modules/oscar-modulefiles/pvm/3.4.5+4:/opt/modules/modulefiles/oscar-modules/1.0.5:/opt/modules/modulefiles/iforte/9.0:/opt/modules/modulefiles/icce/9.0:/opt/modules/modulefiles/mkl-em64t/7.2,LANG=en_US.UTF-8,MODULEPATH=/opt/env-switcher/share/env-switcher:/opt/modules/oscar-modulefiles:/opt/modules/version:/opt/modules/$MODULE_VERSION/modulefiles:/opt/modules/modulefiles:,LOADEDMODULES=default-manpath/1.0.1:torque/1.2.0p5:mpi/mpich-ch_p4-icc-1.2.7:switcher/1.0.13:kernel_picker/1.4.1.3:pvm/3.4.5+4:oscar-modules/1.0.5:iforte/9.0:icce/9.0:mkl-em64t/7.2,SSH_ASKPASS=/usr/libexec/openssh/gnome-ssh-askpass,SHLVL=1,HOME=/home/2vt,LOGNAME=tippensjl,MODULESHOME=/opt/modules/3.1.6,LESSOPEN=|/usr/bin/lesspipe.sh > > > %s,G_BROKEN_FILENAMES=1,_=/opt/pbs/bin/qsub,PBS_O_QUEUE=workq' (r: > > > NULL) > > > 12/16 14:32:42 INFO: PBS attribute 'euser' value: 'tippensjl' (r: > > > NULL) > > > 12/16 14:32:42 MUserAdd(UName,UP) > > > 12/16 14:32:42 MUGetHash(tippensjl) > > > 12/16 14:32:42 INFO: hash 'tippensjl' --> 550228005 > > > 12/16 14:32:42 MUGetHash(tippensjl) > > > 12/16 14:32:42 INFO: hash 'tippensjl' --> 550228005 > > > 12/16 14:32:42 INFO: PBS attribute 'egroup' value: 'tippensjl' (r: > > > NULL) > > > 12/16 14:32:42 MGroupAdd(GName,GP) > > > 12/16 14:32:42 MUGetHash(tippensjl) > > > 12/16 14:32:42 INFO: hash 'tippensjl' --> 550228005 > > > 12/16 14:32:42 MUGetHash(tippensjl) > > > 12/16 14:32:42 INFO: hash 'tippensjl' --> 550228005 > > > 12/16 14:32:42 MCPRestore(GROUP,tippensjl,Optr) > > > 12/16 14:32:42 INFO: no checkpoint entry for object 'GROUP > > > 
tippensjl '
> > > 12/16 14:32:42 INFO: group tippensjl added
> > > 12/16 14:32:42 INFO: PBS attribute 'queue_rank' value: '41' (r:
> > > NULL)
> > > 12/16 14:32:42 INFO: PBS attribute 'queue_type' value: 'E' (r: NULL)
> > > 12/16 14:32:42 INFO: PBS attribute 'etime' value: '1134761206' (r:
> > > NULL)
> > > 12/16 14:32:42 MJobSetCreds(44,tippensjl,tippensjl,)
> > > 12/16 14:32:42 MUserAdd(UName,UP)
> > > 12/16 14:32:42 MUGetHash(tippensjl)
> > > 12/16 14:32:42 INFO: hash 'tippensjl' --> 550228005
> > > 12/16 14:32:42 MUGetHash(tippensjl)
> > > 12/16 14:32:42 INFO: hash 'tippensjl' --> 550228005
> > > 12/16 14:32:42 MGroupAdd(GName,GP)
> > > 12/16 14:32:42 MUGetHash(tippensjl)
> > > 12/16 14:32:42 INFO: hash 'tippensjl' --> 550228005
> > > 12/16 14:32:42 MUGetHash(tippensjl)
> > > 12/16 14:32:42 INFO: hash 'tippensjl' --> 550228005
> > > 12/16 14:32:42 MJobGetAccount(44,A)
> > > 12/16 14:32:42 MAMAccountGetDefault(tippensjl,AName,RIndex)
> > > 12/16 14:32:42 MSSSDoCommand(allocation-manager,NULL,OBuf,ODE,SC,EMsg)
> > > 12/16 14:32:42
> > > MSysEMSubmit(EM,scheduler,comcom,scheduler,allocation-manager;)
> > > 12/16 14:32:42 INFO: EM disabled
> > > 12/16 14:32:42 MSUConnect(S,TRUE,EMsg)
> > > 12/16 14:32:42 INFO: trying to connect to 192.168.79.231 (Port: 7112)
> > > 12/16 14:32:42 INFO: successful connect to TCP server (sd: 10)
> > > 12/16 14:32:42 MSUSendData(S,15000000,FALSE,FALSE)
> > > 12/16 14:32:42 MSecGetChecksum(Buf,185,Checksum,HMAC64,CSKey)
> > > 12/16 14:32:42 MSecHMACGetDigest(sss,3,<Body actor="root"><Request
> > > action="Query" actor="root"><Object>User</Object><Where
> > > name="Special">False</Where><Get name="Name"></Get><Get
> > > name="DefaultProject"></Get></Request></Body>,185,CSString,20,DigestString,TRUE,TRUE)
> > > 12/16 14:32:42 __MSecSHA1Init(context)
> > > 12/16 14:32:42 __MSecSHA1Transform(context)
> > >
> > > And that's it - it just dies. I have the feeling that this is
> > > something fairly easy that I didn't set up correctly... Just can't
> > > seem to find what it is - I'm pretty new at this... Oh, yeah, I am
> > > using torque 2.0.0p2 if that makes a difference.
> > >
> > > Thank you for any help you can give - I'm pulling my hair out. :-O :)
> > >
> > > -Jen
> > >
> > > Jennifer Tippens
> > > Unix Admin, ORNL Institutional Cluster
> > > Oak Ridge National Lab
> > > _______________________________________________
> > > mauiusers mailing list
> > > [email protected]
> > > http://www.supercluster.org/mailman/listinfo/mauiusers
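
For anyone trying to reproduce this outside Maui: the last log entries show MSecHMACGetDigest being handed a 185-byte XML request and the key "sss", with a 20-byte digest expected, and the crash follows in __MSecSHA1Transform. The sketch below computes a reference digest for that same input in Python. It assumes Maui's CSALGO=HMAC is the standard RFC 2104 HMAC over SHA-1, which the function names and the 20-byte length suggest but the thread does not confirm. The variable names are only for illustration.

```python
import hashlib
import hmac

# Request body copied from the MSecHMACGetDigest log entry, with the
# e-mail line wraps joined back into single spaces.
body = (
    b'<Body actor="root"><Request action="Query" actor="root">'
    b'<Object>User</Object><Where name="Special">False</Where>'
    b'<Get name="Name"></Get><Get name="DefaultProject"></Get>'
    b'</Request></Body>'
)

# CSKEY value from the maui-private.cfg quoted in the thread.
key = b"sss"

# Assumption: standard RFC 2104 HMAC with SHA-1; not confirmed for Maui.
digest = hmac.new(key, body, hashlib.sha1).digest()

print(len(body))     # 185, matching the length in the log
print(len(digest))   # 20, the SHA-1 digest size
print(digest.hex())  # reference value to compare against a working build
```

The body length matching the 185 in the log suggests the request itself was assembled correctly and that the failure is inside the SHA-1 code path, though that is only an inference from the log.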
