Hi everyone,

I am in the process of replacing PBSPro on our cluster with Torque/Maui. I have installed the latest versions of Torque and Maui. Both installations appear to have gone well according to the directions and tests, and Torque runs jobs fine on its own. However, I have not been able to get Maui to schedule jobs. After stopping pbs_sched and starting maui as user jtest, jobs just remain in the queue in a deferred state.
Our basic setup is a login/submit node called beowulf (full name beowulf.cheme.cmu.edu), where pbs_server and maui run, with the execute nodes on an internal network.

Typical output of checkjob on a deferred job is:

    job is deferred. Reason: RMFailure (job cannot be started - cannot set hostlist)
    Holds: Defer (hold reason: RMFailure)
    PE: 1.00 StartPriority: 2
    cannot select job 52 for partition DEFAULT (job hold active)

The Torque log indicates an error connecting to MOM:

    12/21/2008 18:04:32;0008;PBS_Server;Job;52.beowulf;Job Modified at request of jt...@beowulf
    12/21/2008 18:04:32;0001;PBS_Server;Req;;Server could not connect to MOM
    12/21/2008 18:04:32;0080;PBS_Server;Req;req_reject;Reject reply code=15070(Server could not connect to MOM), aux=0, type=ModifyJob, from jt...@beowulf
    12/21/2008 18:05:16;0002;PBS_Server;Svr;PBS_Server;Torque Server Version = 2.4.0b1, loglevel = 0

Maui is running as the user jtest. jtest is a manager and operator in Torque and is ADMIN1 in Maui.

Some output from qmgr -c 'p s':

    set server scheduling = True
    set server acl_hosts = beowulf
    set server managers = jt...@beowulf
    set server operators = jt...@beowulf
    set server default_queue = q_feed
    set server log_events = 255
    set server mail_from = ChemE-beowulf-PBS
    set server query_other_jobs = True
    set server scheduler_iteration = 600
    set server node_check_rate = 150
    set server tcp_timeout = 6
    set server comment = ChemE Beowulf Cluster
    set server next_job_number = 53

Top of maui.cfg:

    # maui.cfg 3.2.6p20
    SERVERHOST beowulf
    # primary admin must be first in list
    ADMIN1 jtest
    # Resource Manager Definition
    RMCFG[BEOWULF] TYPE=PBS

On the nodes, the mom config files contain:

    matsim (jtest) ~ > ssh c1n10 'cat /var/spool/torque/mom_priv/config'
    $clienthost beowulf
    $restricted *.cheme.cmu.edu

Does anything stand out as wrong here? I have tried several variations of the settings above with no luck getting Maui to work. Any suggestions?
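For anyone sifting through a longer pbs_server log for the same symptom, here is a minimal Python sketch that splits the semicolon-delimited log lines quoted above and keeps only the "could not connect to MOM" records. The field names (timestamp, event, daemon, obj_class, obj_name, message) are my own labels for the six columns seen in the excerpt, not official Torque terminology, and the helper names parse_line and mom_errors are hypothetical.

```python
from typing import Iterable, List, NamedTuple, Optional


class LogRecord(NamedTuple):
    # Labels are assumptions based on the six columns in the quoted log lines.
    timestamp: str
    event: str
    daemon: str
    obj_class: str
    obj_name: str
    message: str


def parse_line(line: str) -> Optional[LogRecord]:
    """Split one log line into six fields; return None if it does not fit."""
    # maxsplit=5 keeps any ';' that appears inside the free-text message.
    parts = line.rstrip("\n").split(";", 5)
    if len(parts) != 6:
        return None
    return LogRecord(*parts)


def mom_errors(lines: Iterable[str]) -> List[LogRecord]:
    """Return the records whose message mentions a failed MOM connection."""
    records = (parse_line(line) for line in lines)
    return [r for r in records if r and "could not connect to MOM" in r.message]


# The four lines from the log excerpt in this post, copied verbatim.
SAMPLE = [
    "12/21/2008 18:04:32;0008;PBS_Server;Job;52.beowulf;Job Modified at request of jt...@beowulf",
    "12/21/2008 18:04:32;0001;PBS_Server;Req;;Server could not connect to MOM",
    "12/21/2008 18:04:32;0080;PBS_Server;Req;req_reject;Reject reply code=15070(Server could not connect to MOM), aux=0, type=ModifyJob, from jt...@beowulf",
    "12/21/2008 18:05:16;0002;PBS_Server;Svr;PBS_Server;Torque Server Version = 2.4.0b1, loglevel = 0",
]

if __name__ == "__main__":
    for rec in mom_errors(SAMPLE):
        print(rec.timestamp, rec.event, rec.message)
```

To run it against a real log, pass an open file object to mom_errors instead of SAMPLE; the location of the server log depends on the local Torque install, so no path is assumed here.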
Thanks,

j

-----------------------------------
John Kitchin
Assistant Professor
NETL-IAES Resident Institute Fellow
Doherty Hall A207F
Department of Chemical Engineering
Carnegie Mellon University
Pittsburgh, PA 15213
412-268-7803
http://kitchingroup.cheme.cmu.edu
