BTW, I just tried upgrading to maui-3.3.1 and I still have the same issue. Maui segfaults when I try to start the maui process with this one job in the queue.

--
Steven DuChene

From: DuChene, StevenX A
Sent: Monday, November 28, 2011 4:05 PM
To: [email protected]
Subject: maui segfaults trying to schedule a job

This morning I discovered that the maui scheduler process was not running on one of our clusters like it should. When I try to start the maui process as the maui user I get a segmentation fault. In checking the log files the last few entries look like this:

11/28 15:45:24 MQueueSelectAllJobs(Q,HARD,ALL,JIList,DP,Msg)
11/28 15:45:24 INFO: job '231' Priority: 605
11/28 15:45:24 INFO: Cred: 0(00.0) FS: 0(00.0) Attr: 0(00.0) Serv: 605(00.0) Targ: 0(00.0) Res: 0(00.0) Us: 0(00.0)
11/28 15:45:24 MStatClearUsage([NONE],Active)
11/28 15:45:24 INFO: total jobs selected (ALL): 1/1
11/28 15:45:24 MQueueSelectAllJobs(Q,SOFT,ALL,JIList,DP,Msg)
11/28 15:45:24 INFO: job '231' Priority: 605
11/28 15:45:24 INFO: Cred: 0(00.0) FS: 0(00.0) Attr: 0(00.0) Serv: 605(00.0) Targ: 0(00.0) Res: 0(00.0) Us: 0(00.0)
11/28 15:45:24 MStatClearUsage([NONE],Idle)
11/28 15:45:24 INFO: total jobs selected (ALL): 1/1
11/28 15:45:24 MQueueSelectJobs(SrcQ,DstQ,HARD,5120,4096,2140000000,EVERY,FReason,FALSE)
11/28 15:45:24 INFO: total jobs selected in partition ALL: 1/1
11/28 15:45:24 MQueueScheduleRJobs(Q)
11/28 15:45:24 MQueueSelectJobs(SrcQ,DstQ,SOFT,5120,4096,2140000000,EVERY,FReason,TRUE)
11/28 15:45:24 INFO: total jobs selected in partition ALL: 1/1
11/28 15:45:24 MQueueSelectJobs(SrcQ,DstQ,SOFT,5120,4096,2140000000,DEFAULT,FReason,TRUE)
11/28 15:45:24 INFO: total jobs selected in partition DEFAULT: 1/1
11/28 15:45:24 MQueueScheduleIJobs(Q,DEFAULT)
11/28 15:45:24 INFO: 156 feasible tasks found for job 231:0 in partition DEFAULT (39 Needed)
11/28 15:45:24 INFO: 156 feasible tasks found for job 231:1 in partition DEFAULT (39 Needed)
11/28 15:45:24 INFO: 156 feasible tasks found for job 231:2 in partition DEFAULT (39 Needed)
11/28 15:45:24 INFO: 156 feasible tasks found for job 231:3 in partition DEFAULT (39 Needed)
11/28 15:45:24 INFO: 156 feasible tasks found for job 231:4 in partition DEFAULT (16 Needed)

Prior to the above entries there are a WHOLE BUNCH of entries similar to these shown below:

11/28 15:45:24 MUGetIndex(TJC,ValList,0)
11/28 15:45:24 MUGetIndex(TNJA,ValList,0)
11/28 15:45:24 MUGetIndex(TNJC,ValList,0)
11/28 15:45:24 MUGetIndex(TNXF,ValList,0)
11/28 15:45:24 MUGetIndex(TPSD,ValList,0)
11/28 15:45:24 MUGetIndex(TPSE,ValList,0)
11/28 15:45:24 MUGetIndex(TPSR,ValList,0)
11/28 15:45:24 MUGetIndex(TPSU,ValList,0)
11/28 15:45:24 MUGetIndex(TQM,ValList,0)
11/28 15:45:24 MUGetIndex(TQT,ValList,0)
11/28 15:45:24 MUGetIndex(TRT,ValList,0)
11/28 15:45:24 MUGetIndex(TXF,ValList,0)

There is only this one job in the queue on a 256 node cluster running torque 2.5.7 and maui 3.2.6p21.

I have tried starting the maui process within strace but I do not see any smoking gun in that strace output. I can probably get maui to start if I qdel the job but I was sort of hoping to see what was causing the problem in case any additional debugging output was needed.

--
Steven DuChene

_______________________________________________
mauiusers mailing list
[email protected]
http://www.supercluster.org/mailman/listinfo/mauiusers
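
For anyone triaging a log like the one above, the following is a rough sketch of a script that reports the last function-trace entry written before the log stops, along with the messages logged after it. It rests on two assumptions taken only from the excerpt: every entry begins with an "MM/DD HH:MM:SS" stamp, and trace entries have the form Name(args). The names last_call and SAMPLE are illustrative and not part of maui.

```python
import re

# Assumed entry layout, based only on the excerpt: "MM/DD HH:MM:SS message"
ENTRY = re.compile(r"^(\d{2}/\d{2}) (\d{2}:\d{2}:\d{2}) (.*)$")
# Assumed shape of a function-trace entry: "Name(arg,arg,...)"
CALL = re.compile(r"^([A-Za-z_]\w*)\((.*)\)$")


def last_call(lines):
    """Return (function_name, messages_logged_after_it) for the last trace entry."""
    name, trailing = None, []
    for line in lines:
        entry = ENTRY.match(line.strip())
        if not entry:
            continue  # skip anything that is not a stamped log entry
        message = entry.group(3)
        call = CALL.match(message)
        if call:
            # A new trace entry supersedes the previous one
            name, trailing = call.group(1), []
        else:
            trailing.append(message)
    return name, trailing


# Tail of the log quoted in the message above
SAMPLE = """\
11/28 15:45:24 MQueueSelectJobs(SrcQ,DstQ,SOFT,5120,4096,2140000000,DEFAULT,FReason,TRUE)
11/28 15:45:24 INFO: total jobs selected in partition DEFAULT: 1/1
11/28 15:45:24 MQueueScheduleIJobs(Q,DEFAULT)
11/28 15:45:24 INFO: 156 feasible tasks found for job 231:3 in partition DEFAULT (39 Needed)
11/28 15:45:24 INFO: 156 feasible tasks found for job 231:4 in partition DEFAULT (16 Needed)
"""

if __name__ == "__main__":
    function, messages = last_call(SAMPLE.splitlines())
    print("last traced function:", function)
    for message in messages:
        print("  ", message)
```

On the quoted tail this points at MQueueScheduleIJobs as the last traced function, which only narrows down where logging stopped; it does not by itself identify the faulting code.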
