>> Have you tried to recompile maui with larger limits?
>>
>> sed -i -e "/MAX_MRES/ s/1024/8192/g" include/moab.h
>> sed -i -e "/MMAX_JOB/ s/4096/8192/g" ./include/msched.h
>>
>> There might be others that need to be increased too.
>
> Ok, I specifically mentioned that the level varies daily and therefore is
> probably not related to those limits. It sometimes blocks at 3200 sometimes
> 4200, numbers chosen randomly. The maui is compiled by EMI and used in
> clusters with far more cores/running jobs so it's not a limit issue, it's a
> runtime state issue.
After increasing the log level, the message that seems to cause this is:
01/10 17:19:31 ALERT: corruption found on iteration 0 in location MJobGetSNRange-Start on node wn-v-3800.local
It then re-iterates, chooses another node, and reports the same error there.
Looking at the ALERT output:
01/10 17:37:02 ALERT: corruption found on iteration 1 in location MResAdjustDRes-Start on node wn-v-4520.local
01/10 17:37:02 ALERT: R[023] 2097627 started but not ended
01/10 17:37:02 ALERT: R[023] 2097627 has no associated events
01/10 17:37:02 ALERT: corruption found on iteration 1 in location MResAdjustDRes-End on node wn-v-4520.local
01/10 17:37:02 ALERT: R[023] 2097627 started but not ended
01/10 17:37:02 ALERT: R[023] 2097627 has no associated events
01/10 17:37:02 ALERT: corruption found on iteration 1 in location MResAdjustDRes-Start on node wn-v-4664.local
01/10 17:37:02 ALERT: R[023] 2098811 started but not ended
01/10 17:37:02 ALERT: R[023] 2098811 has no associated events
01/10 17:37:02 ALERT: corruption found on iteration 1 in location MResAdjustDRes-End on node wn-v-4664.local
01/10 17:37:02 ALERT: R[023] 2098811 started but not ended
01/10 17:37:02 ALERT: R[023] 2098811 has no associated events
01/10 17:37:02 ALERT: corruption found on iteration 1 in location MResAdjustDRes-Start on node wn-v-4772.local
01/10 17:37:02 ALERT: R[023] 2097181 started but not ended
01/10 17:37:02 ALERT: R[023] 2097181 has no associated events
The list of nodes varies a lot, so it is difficult to pinpoint a single culprit.
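
Since the affected nodes change from run to run, tallying the ALERT lines
might help show whether a few nodes or IDs dominate. Below is a rough sketch
in Python, not anything that ships with maui: the function name
summarize_alerts and both regular expressions are my own assumptions, based
only on the message format pasted above. I am also only guessing that the
number after R[...] identifies the job or reservation involved.

```python
import re
from collections import Counter

# "corruption found" alert. \s+ is used between fields so the pattern also
# matches when the message is broken across two lines (e.g. by a mail client).
CORRUPTION_RE = re.compile(
    r"ALERT:\s+corruption found on iteration\s+(\d+)"
    r"\s+in location\s+(\S+)\s+on node\s+(\S+)"
)

# Follow-up alerts such as "ALERT: R[023] 2097627 started but not ended".
# Assumption: the number after R[...] is an ID worth grouping on.
RESERVATION_RE = re.compile(r"ALERT:\s+R\[(\d+)\]\s+(\d+)\s+")


def summarize_alerts(log_text):
    """Count corruption alerts per node and per location, and R[...] alerts per ID."""
    nodes = Counter()
    locations = Counter()
    ids = Counter()

    for match in CORRUPTION_RE.finditer(log_text):
        _iteration, location, node = match.groups()
        nodes[node] += 1
        locations[location] += 1

    for match in RESERVATION_RE.finditer(log_text):
        ids[match.group(2)] += 1

    return nodes, locations, ids
```

Feeding it the contents of the log file and printing nodes.most_common(10)
and ids.most_common(10) would show whether the same few entries keep
reappearing across iterations.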
Mario Kadastik, PhD
Researcher
---
"Physics is like sex, sure it may have practical reasons, but that's not why
we do it"
-- Richard P. Feynman