Hullo,

I'm running Maui 3.2.6p19-snap.1169758944 and I'm having trouble getting it to allow resource overruns for a short time.

Current settings:

whiteout:/var/spool/maui # /apps/maui/bin/showconfig -v | grep RESOURCELIMITPOLICY
RESOURCELIMITPOLICY[0] PROC:EXTENDEDVIOLATION:CANCEL:00:15:00 MEM:ALWAYS:CANCEL
whiteout:/var/spool/maui #
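
To make the intended 15-minute window explicit, here is a minimal Python sketch that splits the RESOURCELIMITPOLICY value into its parts. It assumes the colon-separated fields are resource, policy, action, and an optional HH:MM:SS violation time; the function name and dictionary keys are my own labels for illustration, not anything defined by Maui.

```python
def parse_limit_policy(entry):
    """Split one RESOURCELIMITPOLICY entry into labelled fields (labels are hypothetical)."""
    # Split on the first three colons only, so an HH:MM:SS tail stays in one piece.
    parts = entry.split(":", 3)
    resource, policy, action = parts[:3]
    grace_seconds = None
    if len(parts) == 4:
        hours, minutes, seconds = (int(x) for x in parts[3].split(":"))
        grace_seconds = hours * 3600 + minutes * 60 + seconds
    return {
        "resource": resource,
        "policy": policy,
        "action": action,
        "grace_seconds": grace_seconds,
    }


# The value reported by showconfig, copied from the output above.
value = "PROC:EXTENDEDVIOLATION:CANCEL:00:15:00 MEM:ALWAYS:CANCEL"
policies = [parse_limit_policy(entry) for entry in value.split()]
for p in policies:
    print(p)
```

Read this way, the PROC entry carries a 900-second window, while the MEM entry has none.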


However, looking at the logs today, I saw:

whiteout:/var/spool/maui/log # grep -i 'violation' maui.log
03/07 11:36:25 MSysRegEvent(JOBRESVIOLATION: job '3648' in state 'Running' has exceeded PROC resource limit (141 > 100) (action CANCEL will be taken) job start time: Wed Mar 7 11:35:32
03/07 11:36:25 ALERT:    limit violation action CANCEL succeeded
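
The violation line carries the job id, the resource name, the observed usage and the limit. A rough Python sketch for pulling those out of maui.log; the regular expression is written against the single line shown above, so it is an assumption about the log format rather than a specification of it.

```python
import re

# Pattern derived from the one JOBRESVIOLATION line in this post (an assumption,
# not a documented format).
VIOLATION_RE = re.compile(
    r"job '(?P<job>[^']+)' in state '(?P<state>[^']+)' has exceeded "
    r"(?P<resource>\w+) resource limit \((?P<used>\d+) > (?P<limit>\d+)\)"
)

line = (
    "03/07 11:36:25 MSysRegEvent(JOBRESVIOLATION: job '3648' in state 'Running' "
    "has exceeded PROC resource limit (141 > 100) (action CANCEL will be taken) "
    "job start time: Wed Mar 7 11:35:32"
)

match = VIOLATION_RE.search(line)
info = match.groupdict() if match else None
print(info)
```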

and

whiteout:/var/spool/maui/log # tracejob 3648

Job: 3648.whiteout.sf.utas.edu.au

03/07/2007 00:40:01  S    enqueuing into batch, state 1 hop 1
03/07/2007 00:40:01  S    Job Queued at request of
                          [EMAIL PROTECTED], owner =
                          [EMAIL PROTECTED], job name = Test2_4C,
                          queue = batch
03/07/2007 00:40:01  A    queue=batch
03/07/2007 11:35:32  S    Job Modified at request of
                          [EMAIL PROTECTED]
03/07/2007 11:35:32  S    Job Run at request of [EMAIL PROTECTED]
03/07/2007 11:35:33  M    Job Modified at request of
                          [EMAIL PROTECTED]
03/07/2007 11:35:33  S    Job Modified at request of
                          [EMAIL PROTECTED]
03/07/2007 11:35:33  A    user=prachab group=users jobname=Test2_4C queue=batch
                          ctime=1173188401 qtime=1173188401 etime=1173188401
                          start=1173227733 exec_host=whiteout
                          Resource_List.mem=2000mb Resource_List.ncpus=1
                          Resource_List.neednodes=whiteout
                          Resource_List.nodect=1
                          Resource_List.walltime=1000:00:00
03/07/2007 11:36:25  S    Job deleted at request of [EMAIL PROTECTED]
03/07/2007 11:36:25  S    Job sent signal SIGTERM on delete
03/07/2007 11:36:25  M    kill_task: killing pid 32547 task 1 with sig 15
03/07/2007 11:36:25  M    kill_task: killing pid 32569 task 1 with sig 15
03/07/2007 11:36:25  M    kill_task: killing pid 32574 task 1 with sig 15
03/07/2007 11:36:25  M    kill_task: killing pid 32615 task 1 with sig 15
03/07/2007 11:36:25  A    [EMAIL PROTECTED]
03/07/2007 11:36:28  S    Exit_status=143 resources_used.cput=00:00:46
                          resources_used.mem=300784kb
                          resources_used.vmem=341792kb
                          resources_used.walltime=00:00:52
03/07/2007 11:36:28  M    kill_task: killing pid 32615 task 1 with sig 9
03/07/2007 11:36:28  M    scan_for_terminated: job 3648.whiteout.sf.utas.edu.au
                          task 1 terminated, sid 32547
03/07/2007 11:36:28  M    job was terminated
03/07/2007 11:36:28  A    user=prachab group=users jobname=Test2_4C queue=batch
                          ctime=1173188401 qtime=1173188401 etime=1173188401
                          start=1173227733 exec_host=whiteout
                          Resource_List.mem=2000mb Resource_List.ncpus=1
                          Resource_List.neednodes=batch Resource_List.nodect=1
                          Resource_List.walltime=1000:00:00 session=32547
                          end=1173227788 Exit_status=143
                          resources_used.cput=00:00:46
                          resources_used.mem=300784kb
                          resources_used.vmem=341792kb
                          resources_used.walltime=00:00:52
03/07/2007 11:36:37  S    dequeuing from batch, state COMPLETE


It looks like Maui didn't wait the full 15 minutes before killing the job: it started at 11:35:32 and was cancelled at 11:36:25, less than a minute later. Is there something wrong with my config?
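
To put numbers on that, a small Python sketch using only values from the excerpts above: the start and cancel times from maui.log, and the start/end epochs from the accounting records. The variable names are mine; the 15-minute figure is the 00:15:00 field from the RESOURCELIMITPOLICY setting.

```python
from datetime import datetime, timedelta

# Times from maui.log: job start and the moment the CANCEL action was taken.
fmt = "%H:%M:%S"
started = datetime.strptime("11:35:32", fmt)
cancelled = datetime.strptime("11:36:25", fmt)
elapsed_log = cancelled - started

# start= and end= epochs from the tracejob accounting records.
start_epoch = 1173227733
end_epoch = 1173227788
elapsed_acct = timedelta(seconds=end_epoch - start_epoch)

# The violation window I expected from 00:15:00.
grace = timedelta(minutes=15)

print("elapsed (maui.log):   ", elapsed_log.total_seconds(), "s")
print("elapsed (accounting): ", elapsed_acct.total_seconds(), "s")
print("expected window:      ", grace.total_seconds(), "s")
print("cancelled early:      ", elapsed_log < grace)
```

Either way the job ran for under a minute, against an expected window of 900 seconds.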

- Nick

--
Nick Sonneveld  |  [EMAIL PROTECTED]
IT Resources, University of Tasmania, Private Bag 69, Hobart Tas 7001
(03) 6226 6377  |  0407 336 309  |  Fax (03) 6226 7171
_______________________________________________
mauiusers mailing list
[email protected]
http://www.supercluster.org/mailman/listinfo/mauiusers
