Hi Josh,

On Wed, Jul 05, 2006 at 11:24:00AM -0600, Josh Butikofer wrote:
>We are looking at having this fix in by July 19th. Would you be willing to
>test out the fix when it is in place?

sure! thanks for working on it :-)
but I'll likely be at conferences/workshops from about 18th to the 28th
so won't be able to do too much until after that.

BTW, current maui/torque seems to be able to suspend all threads of
parallel LAM jobs as well as just serial jobs, which is terrific.
haven't tried mpich.

cheers,
robin

>
>--
>Joshua Butikofer
>Cluster Resources, Inc.
>
>[EMAIL PROTECTED]
>Voice: (801) 717-3707
>Fax: (801) 717-3738
>--------------------------
>
>
>Robin Humble wrote:
>>Hi,
>>
>>On Thu, Apr 27, 2006 at 10:32:40AM -0600, Josh Butikofer wrote:
>>>We've confirmed that this behavior is happening in Maui. Moab Workload
>>>Manager currently has the desired behavior with suspended jobs accruing
>>>priority (and also correctly handles different classes involved). We
>>>hope that over the next few weeks we will be able to make these
>>>improvements in Maui as well. We will keep the list posted on our
>>>progress.
>>
>>any updates?
>>
>>in case you were looking for a simpler test case, the below 2 queue
>>system seems to have the same behaviour as the previous bug report -
>>ie. the suspended PREEMPTEE job has a hard time resuming.
>>
>>in other words after a PREEMPTOR job steams through (correctly) we end
>>up with a previously queued PREEMPTEE job then being chosen to run over
>>the top of the suspended PREEMPTEE job.
>>
>>I don't think this is correct behaviour as only PREEMPTOR jobs should
>>be able to run over the top of PREEMPTEE jobs.
>>
>>versions are:
>>torque 2.1.1-3 (rebuild on AS4 i686 from the fc5 .src.rpm), maui 3.2.6p16
>>
>>relevant part of maui.cfg:
>>
>>PREEMPTPOLICY SUSPEND
>>CLASSCFG[debug] QDEF=high
>>CLASSCFG[workq] QDEF=low
>>QOSCFG[high] PRIORITY=500 QFLAGS=PREEMPTOR
>>QOSCFG[low] PRIORITY=100 QFLAGS=PREEMPTEE
>>QOSWEIGHT 1
>>
>>cheers,
>>robin
>>
>>>--
>>>Joshua Butikofer
>>>Cluster Resources, Inc.
>>>
>>>[EMAIL PROTECTED]
>>>(801) 798-7488
>>>--------------------------
>>>
>>>
>>>David Corredor wrote:
>>>>The problem is not just that the suspended job gets once again preempted
>>>>by a job of its same class from the IDLE queue, this happens regardless
>>>>of the class of the new job.
>>>>
>>>> Ex. 3 queues (1 verylong, 1 long, 1 fast. Fast preempts long and
>>>>verylong, and long preempts verylong, verylong should not preempt).
>>>> - Submit 1 long job so that it takes all resources in cluster.
>>>> - Submit a verylong job so that it waits in the IDLE queue.
>>>> - Submit a fast job.
>>>>
>>>> The fast job preempts the long one, and once it finishes, instead of the
>>>>long one to resume execution, the verylong kicks in and preempts it once
>>>>again (and it shouldn't).
>>>>
>>>>
>>>><quote who="Ronny T. Lampert">
>>>>
>>>>>.....
>>>>>However I experience the very same problem as you do (I need the
>>>>>QUEUETIMEWEIGHT set to 1) - the preempted ones stay suspended and
>>>>>instead a NEW job from the batch queue is started :-(
>>>>>
>>>>>I think this is a bug: suspended jobs *should age*, too.
>>>>>Or automatically get a slightly higher priority than the highest in the
>>>>>same class to prevent it from staying suspended and interrupted by jobs
>>>>>from the same class.
>>>>>
>>>>>Could some developer shortly comment on that issue?
>>>>>
>>>>>Thanks!
>>>>>Ronny
>>>>>
>>>>
>>>>_______________________________________________
>>>>mauiusers mailing list
>>>>[email protected]
>>>>http://www.supercluster.org/mailman/listinfo/mauiusers
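
For anyone following the thread, the behaviour being asked for above can be
sketched in a few lines of Python. This is only an illustration of the rule
"only PREEMPTOR jobs should run over the top of PREEMPTEE jobs", not Maui's
actual scheduling code. The Job fields and the next_to_run() helper are
made-up names, and the whole cluster is assumed to be a single block of
nodes that one job fills. The priorities 500 and 100 are the ones from the
maui.cfg quoted above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Job:
    name: str
    preemptor: bool  # QoS carries QFLAGS=PREEMPTOR (hypothetical field)
    preemptee: bool  # QoS carries QFLAGS=PREEMPTEE (hypothetical field)
    state: str       # "idle" or "suspended"
    priority: int    # QoS PRIORITY value


def next_to_run(jobs: List[Job]) -> Optional[Job]:
    """Pick the job that should get the nodes once they become free.

    Expected behaviour as described in the thread (an assumption, not
    Maui's implementation): an idle PREEMPTOR job may go first, otherwise
    a suspended job resumes before any other idle job is started.
    """
    idle_preemptors = [j for j in jobs if j.state == "idle" and j.preemptor]
    if idle_preemptors:
        return max(idle_preemptors, key=lambda j: j.priority)

    suspended = [j for j in jobs if j.state == "suspended"]
    if suspended:
        return max(suspended, key=lambda j: j.priority)

    idle = [j for j in jobs if j.state == "idle"]
    return max(idle, key=lambda j: j.priority) if idle else None


# Two-queue case from the report: one suspended workq job and one queued
# workq job, both PREEMPTEE with priority 100.
jobs = [
    Job(name="workq-1", preemptor=False, preemptee=True,
        state="suspended", priority=100),
    Job(name="workq-2", preemptor=False, preemptee=True,
        state="idle", priority=100),
]
print(next_to_run(jobs).name)  # workq-1: the suspended job resumes

# Same situation with a queued debug (PREEMPTOR, priority 500) job as well.
with_debug = jobs + [
    Job(name="debug-1", preemptor=True, preemptee=False,
        state="idle", priority=500),
]
print(next_to_run(with_debug).name)  # debug-1: a preemptor may go first
```

The reported problem corresponds to workq-2 being started in the first case
instead of workq-1 being resumed.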
