I have a very simple setup: a Beowulf cluster with 3 nodes (server, client1 and client2). The mpich user's home directory is mounted on client1 and client2 over NFS, and MPICH2 is installed in mpich's home directory.

Torque-2.4.3 is installed on this cluster with the following configuration.

For the server:

    ./configure --prefix=/opt/pbs --enable-mom --enable-server --enable-client --with-default-server=server

For the clients:

    ./configure --prefix=/opt/pbs --enable-mom --enable-client --with-default-server=server

After building, I installed the packages as follows:

    server, mom and client --> server
    mom and client         --> client1 and client2

Since my server is also a compute node, I installed the mom package on the server as well.

My default queue is configured like this:

    qmgr
    Qmgr: create queue batch
    Qmgr: set server operators = r...@server
    Qmgr: set queue batch queue_type = Execution
    Qmgr: set queue batch started = True
    Qmgr: set queue batch enabled = True
    Qmgr: set server default_queue = batch
    Qmgr: set server scheduling = True

Now the problem is that jobs with resource requirements can't run. If I submit this script:

    #!/bin/sh
    #PBS -N testJob
    #PBS -l nodes=2:ppn=2
    #PBS -l walltime=00:02:00
    sleep 100
    /home/mpich/mpich2-install/bin/mpirun -n 10 mpich2-1.0.8/examples/cpi
    hostname

it does not run, but if I omit the line

    #PBS -l nodes=2:ppn=2

it does. Why is it that I can't submit resource requirements? The following runs perfectly:

    #!/bin/sh
    #PBS -N testJob
    #PBS -l walltime=00:02:00
    sleep 100
    /home/mpich/mpich2-install/bin/mpirun -n 10 mpich2-1.0.8/examples/cpi
    hostname

I submitted the script with the resource requirement:

    [mp...@server ~]$ qsub jobScript.sh
    14.server

but there was no output in the home directory. The following logs were generated.

pbs_mom

    01/13/2010 15:07:49;0008; pbs_mom;Job;14.server;JOIN JOB as node 1
    01/13/2010 15:07:59;0001; pbs_mom;Svr;pbs_mom;LOG_DEBUG::delete_blcr_checkpoint_files, No checkpoint directory specified for 14.server

pbs_server

    01/13/2010 15:07:49;0100;PBS_Server;Job;14.server;enqueuing into batch, state 1 hop 1
    01/13/2010 15:07:49;0008;PBS_Server;Job;14.server;Job Queued at request of mp...@server, owner = mp...@server, job name = testJob, queue = batch
    01/13/2010 15:07:49;0040;PBS_Server;Svr;server;Scheduler was sent the command new
    01/13/2010 15:07:49;0008;PBS_Server;Job;14.server;Job Modified at request of schedu...@server
    01/13/2010 15:07:49;0008;PBS_Server;Job;14.server;Job Run at request of schedu...@server
    01/13/2010 15:07:49;0040;PBS_Server;Svr;server;Scheduler was sent the command recyc
    01/13/2010 15:08:00;0010;PBS_Server;Job;14.server;Exit_status=0 resources_used.cput=00:00:00 resources_used.mem=380kb resources_used.vmem=2428kb resources_used.walltime=00:00:12
    01/13/2010 15:08:09;000d;PBS_Server;Job;14.server;Post job file processing error; job 14.server on host client1/1+client1/0+server/1+server/0
    01/13/2010 15:08:09;0100;PBS_Server;Job;14.server;dequeuing from batch, state COMPLETE
    01/13/2010 15:08:09;0040;PBS_Server;Svr;server;Scheduler was sent the command term

pbs_sched

    01/13/2010 15:07:49;0040; pbs_sched;Job;14.server;Job Run PLE
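
To check the allocation against the nodes=2:ppn=2 request, the host list from the "Post job file processing error" line in the pbs_server log can be tallied per node. This is only a small sketch; count_slots is a hypothetical helper name of my own, not part of Torque, and it does nothing more than count the node/slot entries in the string.

```shell
#!/bin/sh
# Hypothetical helper (not a Torque command): count how many slots
# each node contributes to a "node/slot+node/slot+..." host string.
count_slots() {
    printf '%s\n' "$1" |
        tr '+' '\n' |      # one node/slot entry per line
        cut -d/ -f1 |      # keep only the node name
        sort |
        uniq -c |          # count entries per node
        awk '{ print $2 "=" $1 }'
}

# Host string copied from the pbs_server log for job 14.server.
count_slots "client1/1+client1/0+server/1+server/0"
```

This prints client1=2 and server=2, so the job appears to have been given two slots on each of two nodes, which matches what the script asked for.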
