On Sun, Dec 21, 2008 at 08:32:08PM -0500, John Kitchin alleged:
> Hi everyone,
> 
> I am in the process of replacing PBSPro on our cluster with Torque/Maui. I
> have installed the latest versions of Torque and Maui, and Torque appears to
> run fine on its own and runs jobs. The installations seem to have gone well
> according to the directions and tests. I have not been able to get maui to
> schedule jobs though (after stopping pbs_sched and starting maui as user
> jtest), they just remain in the queue in a deferred state.
> 
> our basic setup is a login/submit node where pbs_server and maui run called
> beowulf (beowulf.cheme.cmu.edu is the full name), with the execute nodes on
> an internal network.
> 
> Typical output of checkjob on a deferred job is:
> 
> job is deferred.  Reason:  RMFailure  (job cannot be started - cannot set
> hostlist)
> Holds:    Defer  (hold reason:  RMFailure)
> PE:  1.00  StartPriority:  2
> cannot select job 52 for partition DEFAULT (job hold active)
> 
> the torque log indicates an error connecting to MOM:
> 12/21/2008 18:04:32;0008;PBS_Server;Job;52.beowulf;Job Modified at request
> of jt...@beowulf
> 12/21/2008 18:04:32;0001;PBS_Server;Req;;Server could not connect to MOM
> 12/21/2008 18:04:32;0080;PBS_Server;Req;req_reject;Reject reply
> code=15070(Server could not connect to MOM), aux=0, type=ModifyJob, from
> jt...@beowulf
> 12/21/2008 18:05:16;0002;PBS_Server;Svr;PBS_Server;Torque Server Version =
> 2.4.0b1, loglevel = 0

This means that something is wrong between pbs_server and pbs_mom.  I don't
think this has anything to do with maui.

Test with 'qrun'.  That is a torque command that asks pbs_server to start the
job directly, bypassing the scheduler.  If that also fails, then you know for
certain it isn't maui.
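
Something like the following is a rough sketch of that test, not a canned
procedure.  The job ID (52.beowulf) and node name (c1n10) are taken from your
post, and check_job_start is just a made-up helper name; qrun, pbsnodes and
momctl are the standard torque tools.

```shell
# Sketch only. The job ID (52.beowulf) and node name (c1n10) used in the
# usage line come from the quoted post; check_job_start is a made-up helper
# name. Run on the server host as a torque manager (momctl usually needs root).
check_job_start() {
    jobid=$1
    node=$2

    # Stop early if the torque client tools are not available.
    for cmd in qrun pbsnodes momctl; do
        command -v "$cmd" >/dev/null 2>&1 || { echo "missing: $cmd" >&2; return 127; }
    done

    # Ask pbs_server to start the job itself; no scheduler is involved.
    if qrun "$jobid"; then
        echo "qrun succeeded: server-to-MOM path works, look at maui next"
    else
        echo "qrun failed: problem is between pbs_server and pbs_mom"
        pbsnodes -a              # node states as pbs_server sees them
        momctl -d 3 -h "$node"   # diagnostics reported by that node's MOM
    fi
}

# Usage: check_job_start 52.beowulf c1n10
```

If qrun fails with the same "could not connect to MOM" error, the pbsnodes
and momctl output is the place to start looking.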

Also, you are running trunk (your log shows 2.4.0b1).  You should really start
with the latest 2.1.x or 2.3.6 (releasing soon).


> on the nodes, the mom config files contain
> matsim (jtest) ~ > ssh c1n10 'cat /var/spool/torque/mom_priv/config'
> $clienthost beowulf
> $restricted *.cheme.cmu.edu

$clienthost is ancient.  You want to use $pbsserver.
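
As a sketch, assuming the short name beowulf resolves from the nodes the same
way it does in your current file, mom_priv/config on each node would contain
just:

```
$pbsserver beowulf
```

pbs_mom has to reread the file (restart it or send it SIGHUP) before the
change takes effect.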

And why use $restricted?  It lets matching hosts query the MOM as non-root, so
a wildcard like that effectively disables security.

-- 
Garrick Staples, GNU/Linux HPCC SysAdmin
University of Southern California

