Hi. For whatever it is worth, Maui has some serious bugs when it is in full use.
I had Maui running for a VERY long time, and it would behave differently when it was mostly idle than when it was under heavy use (we have thousands of cores). In my frustration I downloaded and enabled the "moab" eval, and as if by magic all of the weirdness we were seeing in Maui went away over a 2-month period. After two months of use, when I reverted back to Maui, all of the same weirdness came back.

We eventually dropped Maui and went with Son of Grid Engine, as Moab was cost-prohibitive for us. Grid Engine has been working very well, albeit with several home-grown custom modifications.

Joseph

On 12/09/2015 12:37 PM, Michel Béland wrote:
> Bas van der Vlies wrote:
>
>> Dear Michel,
>>
>> What I read from the code (it is a while back that I did that, but we are in
>> the process of patching some maui stuff):
>> * N->TaskCount is the number of jobs on the node
>> * and N->DRes.Procs: which resources are given to the consumer,
>>
>> So you can have a node with 16 cores and if you have a shared node:
>> * 4 nodes that consume all cores
> You mean 4 jobs, I guess?
>
>> With N->DRes.Procs you can determine if there are slots available for
>> other jobs, e.g.:
>> * 5 jobs with 2 cores each
>> * N->DRes.Procs will be 10
>> * and still 6 slots available
>>
>> That is what I read.
> Hmm... Looking at the code, I see for example that functions
> MPBSNodeLoad() and MPBSNodeUpdate() in file MPBSI.c both loop over the
> comma-separated tokens of the "jobs" attribute of a node, incrementing
> N->TaskCount for each token. This is a while loop using
>
>   ptr = MUStrTok(tmpBuffer,", \t",&TokPtr);
>
> to get the first token and
>
>   ptr = MUStrTok(NULL,", \t",&TokPtr);
>
> to get the next one, pretty similar to the C function strtok() and the
> POSIX function strtok_r().
>
> This means that with a jobs attribute looking like this:
>
>   0/48.server, 1/48.server, 2/49.server, 3/49.server
>
> N->TaskCount will end up taking the value 4.
> It means that it counts the
> number of processors, not the number of jobs on the node.
>
> Lines 3188 to 3209 of MPBSI.c are where MPBSNodeUpdate() treats one
> token by incrementing N->TaskCount and extracting the JobID. It is
> after that that it gets interesting. N->DRes.Procs is incremented this way:
>
> 3211      if (MJobFind(JobID,&J,0) == SUCCESS)
> 3212        {
> 3213        if (J->Req[0]->DRes.Procs == -1)
> 3214          {
> 3215          tmpProcs = N->CRes.Procs;
> 3216          }
> 3217        else
> 3218          {
> 3219          tmpProcs = MAX(1,J->Req[0]->DRes.Procs);
> 3220          }
> 3221
> 3222        N->DRes.Procs = MIN(N->DRes.Procs + tmpProcs,N->CRes.Procs);
> 3223
>
> If I understand correctly, J->Req[0]->DRes.Procs is the number of
> processors dedicated to the job. But wait, what tells us that these
> processors are on the current node? The only way I think this code might
> work is if J->Req[0]->DRes.Procs == 0, making tmpProcs equal to 1. Then
> N->DRes.Procs and N->TaskCount are the same...
>
> MPBSNodeLoad() has similar-looking code, but not exactly the same:
>
> 2514      if (MJobFind(JobID,&J,0) == SUCCESS)
> 2515        {
> 2516        N->DRes.Procs += MAX(1,J->Req[0]->DRes.Procs); /* FIXME */
>
> I will experiment with a debugger on an old cluster running an earlier
> version of Torque and see what value J->Req[0]->DRes.Procs has.
>
>>> On 8 Dec 2015, at 23:09, Michel Béland <michel.bel...@calculquebec.ca>
>>> wrote:
>>>
>>> Hello,
>>>
>>> I am trying to modify Maui to understand correctly the "exec_host" job
>>> attribute and the "jobs" node attribute. While reading the code, I
>>> wondered what was the meaning of parts of the data structure.
>>>
>>> So what is the difference between N->TaskCount and N->DRes.Procs, where
>>> N is a pointer of type mnode_t? The comments do not help a lot.
>>>
>>> --
>>> Michel Béland, scientific computing analyst
>>> michel.bel...@calculquebec.ca
>>> office S-250, Roger-Gaudry building (main), Université de Montréal
>>> phone: 514 343-6111 ext. 3892 fax: 514 343-2155
>>> Calcul Québec (www.calculquebec.ca)
>>> Calcul Canada (calculcanada.ca)
>>>
>>> _______________________________________________
>>> mauiusers mailing list
>>> mauiusers@supercluster.org
>>> http://www.supercluster.org/mailman/listinfo/mauiusers
>> --
>> Bas van der Vlies
>> | Operations, Support & Development | SURFsara | Science Park 140 | 1098 XG Amsterdam
>> | T +31 (0) 20 800 1300 | bas.vandervl...@surfsara.nl | www.surfsara.nl |
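
For anyone who wants to reproduce the counting discussed in the quoted thread without building Maui, here is a minimal standalone C sketch. It is not Maui code. It assumes POSIX strtok_r() as a stand-in for MUStrTok() (the thread only says the two are "pretty similar"), it assumes the job ID is the part of each token after the '/', and the names count_tokens() and add_job_procs() are hypothetical helpers invented for this example. The second function mirrors the quoted lines 3213 to 3222 with plain integers in place of the Maui structures.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_JOBS 64   /* arbitrary limits chosen for this sketch */
#define MAX_ID   64

/* Hypothetical helper (not a Maui function): walk a "jobs" attribute
 * with strtok_r() standing in for MUStrTok(). Returns the number of
 * tokens, which is what the thread says N->TaskCount ends up holding,
 * and stores the number of distinct job IDs in *distinct. */
static int count_tokens(const char *jobs, int *distinct)
{
    char buf[1024];
    char seen[MAX_JOBS][MAX_ID];
    char *tok_ptr = NULL;
    char *ptr;
    int tokens = 0;
    int nseen = 0;

    assert(jobs != NULL && distinct != NULL);

    /* strtok_r() writes into its input, so tokenize a private copy. */
    snprintf(buf, sizeof(buf), "%s", jobs);

    for (ptr = strtok_r(buf, ", \t", &tok_ptr);
         ptr != NULL;
         ptr = strtok_r(NULL, ", \t", &tok_ptr)) {
        /* Assumption: tokens look like "<index>/<jobid>". */
        const char *slash = strchr(ptr, '/');
        const char *id = (slash != NULL) ? slash + 1 : ptr;
        int i;

        tokens++;

        /* Linear search for an already recorded job ID. */
        for (i = 0; i < nseen; i++) {
            if (strcmp(seen[i], id) == 0)
                break;
        }
        if (i == nseen && nseen < MAX_JOBS) {
            snprintf(seen[nseen], MAX_ID, "%s", id);
            nseen++;
        }
    }

    *distinct = nseen;
    return tokens;
}

/* Mirrors the quoted lines 3213-3222 of MPBSNodeUpdate() with plain ints:
 * node_dres ~ N->DRes.Procs, node_cres ~ N->CRes.Procs,
 * job_dres  ~ J->Req[0]->DRes.Procs. Returns the new node_dres. */
static int add_job_procs(int node_dres, int node_cres, int job_dres)
{
    int tmp_procs;
    int sum;

    if (job_dres == -1)
        tmp_procs = node_cres;
    else
        tmp_procs = (job_dres > 1) ? job_dres : 1;   /* MAX(1, job_dres) */

    sum = node_dres + tmp_procs;
    return (sum < node_cres) ? sum : node_cres;      /* MIN(sum, node_cres) */
}
```

With the jobs attribute from the thread ("0/48.server, 1/48.server, 2/49.server, 3/49.server"), count_tokens() returns 4 and reports 2 distinct job IDs. Calling add_job_procs() once per token with job_dres == 0 adds 1 each time, so the running value tracks the token count, which is the one case Michel suggests the quoted code might work for.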