Hello everyone, having dug through maui's sources, I think I've solved the problem.
In MResGetNRange(), maui first creates the time ranges during which a given node is available. These are called ARanges. It then performs various operations on these ranges (e.g. removing those that are too short, joining adjacent ones, etc.). If the node in question is down, drained, or busy but expected to become idle, the ARanges are adjusted so that none of them starts less than NODEDOWNSTATEDELAYTIME in the future.

The log I am currently investigating indeed shows that the node in question is busy/expected idle:

  05/05 09:51:41 INFO: node node023 not considered for backfill (State: Busy/EState: Idle)

According to the docs[1], NODEDOWNSTATEDELAYTIME has a default value of zero, so ARanges of such nodes should not be modified at all. However, a quick grep through the sources seems to indicate a default value of 3600 (seconds, I presume). This seems to fit the delay I've noticed:

  05/05 09:51:40 INFO: node node023 supports 8 tasks of job 99468:0 for 23:04:08:22 at 1:00:00

After setting NODEDOWNSTATEDELAYTIME to 30 seconds via the changeparam command, the problematic high-priority job quickly started. This suggests that the problem is resolved, but since the delays have always been somewhat stochastic (maybe some kind of race condition?), I cannot be 100% sure. Besides, the showconfig command does not list NODEDOWNSTATEDELAYTIME, and changeparam tends to silently ignore illegal parameters, so I cannot be entirely sure that I have changed a parameter maui recognizes at all.

A.

[1] http://www.clusterresources.com/products/maui/docs/a.fparameters.shtml#nodedownstatedelaytime

-- 
Ansgar Esztermann
DV-Systemadministration
Max-Planck-Institut für biophysikalische Chemie, Abteilung 105
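
P.S. To make the ARange adjustment described above concrete, here is a minimal sketch in Python. It is not maui's actual code (which is C), only my reading of the behaviour; the names ARange, DELAYED_STATES and adjust_aranges, as well as the state labels, are made up for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ARange:
    """Hypothetical stand-in for an availability range (times in seconds)."""
    start: int
    end: int


# Assumed labels for the node states that trigger the delay:
# down, drained, or busy but expected to become idle.
DELAYED_STATES = {"down", "drained", "busy_expected_idle"}


def adjust_aranges(ranges: List[ARange], node_state: str,
                   now: int, delay: int) -> List[ARange]:
    """Push range starts to at least now + delay for affected nodes.

    `delay` plays the role of NODEDOWNSTATEDELAYTIME. Ranges that would
    become empty after the adjustment are dropped.
    """
    if node_state not in DELAYED_STATES:
        return list(ranges)
    earliest = now + delay
    adjusted = []
    for r in ranges:
        start = max(r.start, earliest)
        if start < r.end:
            adjusted.append(ARange(start, r.end))
    return adjusted
```

With delay=3600, a range that would otherwise start immediately starts one hour later, which would match the "at 1:00:00" in the log line above; with delay=0 (the documented default) the ranges are left unchanged.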
