Hi,

We got a lot of errors with maui version 3.2.6p1 (mainly segfaults). Since the update to version 3.3.4, it works fine.

Best regards,
Jerome Pansanel

On Mon., 2012-03-05 at 14:06 +0530, Jayavant Patil wrote:
> >Hi,
> >
> >We have Torque Server Version 2.5.8 and maui version 3.2.6p1 installed on
> >a rhel 5.2 server. "showstart" for one of the jobs says that the job should
> >start now, i.e.:
> >
> >Earliest start in 00:00:00 on current time.
> >########################
> >checkjob -vv says:
> >
> >checkjob -vv 62235
> >checking job 62235 (RM job '62235.yc9.cn.yuva.param')
> >State: Idle
> >Creds: user:abcd group:pqr account:PQR-PR class:q1 qos:q1-qos
> >WallTime: 00:00:00 of 2:05:00:00
> >SubmitTime: Thu Feb 23 18:56:26
> >(Time Queued Total: 1:21:27:05 Eligible: 1:21:27:05)
> >
> >Total Tasks: 2
> >
> >Req[0] TaskCount: 2 Partition: ALL
> >Network: [NONE] Memory >= 0 Disk >= 0 Swap >= 0
> >Opsys: [NONE] Arch: [NONE] Features: [NONE]
> >Exec: '' ExecSize: 0 ImageSize: 0
> >Dedicated Resources Per Task: PROCS: 1
> >NodeAccess: SHARED
> >NodeCount: 0
> >IWD: [NONE] Executable: [NONE]
> >Bypass: 51 StartCount: 0
> >PartitionMask: [ALL]
> >Reservation '62235' (00:00:00 -> 2:05:00:00 Duration: 2:05:00:00)
> >PE: 2.00 StartPriority: 2727
> >job cannot run in partition DEFAULT (insufficient idle procs available: 0 < 2)
> >job can run in partition P1 (32 procs available. 2 procs required)
> >job can run in partition P2 (48 procs available. 2 procs required)
> >########################
> >showres -n 62235 says:
> >
> >reservations on Sat Feb 25 16:28:10
> >
> >          NodeName  Type  ReservationID  JobState  Task     Start    Duration  StartTime
> >
> >node16.clusternode   Job          62235      Idle     2  00:00:00  2:05:00:00  Sat Feb 25 16:28:10
> >1 nodes reserved
> >########################
> >checknode node16.clusternode says that the node is available to run jobs.
> >
> >But somehow the job does not start, and there are no errors in the maui,
> >pbs_server, or pbs_mom logs either.
> >
> >What can be the issue?
>
> Does maui.log show that Maui is starting the job? If so, there may be
> a communication problem with TORQUE.
> >
> >What can be done to make the job run, and to avoid this in the future?
>
> How many partitions do you have in your cluster?
>
> Can you try submitting the job with the PARTITION specified, as follows:
>
> qsub -q <queue_name> -l nodes=<requirement> -W x=PARTITION:<partition_name>
>
> >thank you
> >
> >-pankakjd
>
> --
>
> Thanks & Regards,
> Jayavant Ningoji Patil
> +91 9923536030.
>
> _______________________________________________
> mauiusers mailing list
> [email protected]
> http://www.supercluster.org/mailman/listinfo/mauiusers

--
Jerome Pansanel
IPHC
23 rue du Loess, BP 28
F-67037 STRASBOURG Cedex 2
T. +33 (0)3 88 10 66 24
P. +33 (0)6 25 19 24 43
F. +33 (0)3 88 10 62 34
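
A minimal sketch, not taken from the thread: a Python helper that reads the "job can run in partition ..." lines of `checkjob -vv` output (as quoted above for job 62235) and builds the `qsub -W x=PARTITION:<partition_name>` command Jayavant suggests. The function names, the regular expression, the `nodes=2` value, and the job script name `job.sh` are hypothetical; the pattern only matches the line format shown in this thread, which may differ in other Maui versions.

```python
import re

# Matches lines such as:
#   job can run in partition P1 (32 procs available. 2 procs required)
# Format assumed from the checkjob -vv output quoted in this thread.
# "job cannot run in partition ..." lines do not match this pattern.
CAN_RUN = re.compile(
    r"job can run in partition (\S+) \((\d+) procs available\. (\d+) procs required\)"
)


def runnable_partitions(checkjob_output: str) -> list[tuple[str, int]]:
    """Return (partition, idle_procs) pairs the job fits in, most idle procs first."""
    found = []
    for line in checkjob_output.splitlines():
        m = CAN_RUN.search(line)
        if m:
            name = m.group(1)
            available, required = int(m.group(2)), int(m.group(3))
            if available >= required:
                found.append((name, available))
    return sorted(found, key=lambda p: p[1], reverse=True)


def qsub_args(queue: str, nodes: str, partition: str, script: str) -> list[str]:
    """Build a qsub argument list that requests an explicit Maui partition."""
    return ["qsub", "-q", queue, "-l", f"nodes={nodes}",
            "-W", f"x=PARTITION:{partition}", script]


if __name__ == "__main__":
    sample = (
        "job cannot run in partition DEFAULT (insufficient idle procs available: 0 < 2)\n"
        "job can run in partition P1 (32 procs available. 2 procs required)\n"
        "job can run in partition P2 (48 procs available. 2 procs required)\n"
    )
    parts = runnable_partitions(sample)
    print(parts)  # [('P2', 48), ('P1', 32)]
    # q1 is the class (queue) shown by checkjob; job.sh is a hypothetical script.
    print(" ".join(qsub_args("q1", "2", parts[0][0], "job.sh")))
```

Picking the partition with the most idle processors is only a convenience heuristic for this sketch; any partition that checkjob reports as able to run the job would do.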
