Hi, I wrote a document about how to integrate Torque/MAUI with Globus. It works for me, and it also gives some useful hints for overcoming problems that other people and I have run into.
I'm sure it can be improved, so your feedback is important to me. Regards.
P.S.: The grammar needs revision. Technical suggestions are welcome.
http://ece.uprm.edu/~s047267
http://del.icio.us/josanabr
http://blog-grid.blogspot.com

Preparing Torque/Maui for Work with GT4
John A. Sanabria - [EMAIL PROTECTED]
Last Updated: Wed Sep 5 23:35:49 2007
This document is based on my previous tutorial, Installing and Configuring Torque/MAUI. That tutorial works fine for a plain cluster. However, integrating Torque/MAUI with GT4 requires additional configuration steps, so readers of the previous article will find similarities between the two. New readers, on the other hand, do not need any knowledge of the previous article: this one also covers installing a Torque/MAUI cluster that works without Globus Toolkit (GT) integration. Finally, this tutorial is a work in progress, so any feedback is welcome.

Requirements
A minimal cluster needs only one machine. However, two computational nodes are suggested, so that the problems that appear between independent machines show up. My testbed consists of two Linux machines with FC7 installed. Furthermore, the machines have the following network services installed:
You must also get the Torque and Maui source code.

Users
On every machine that belongs to the cluster, you need to create the same user with the same user ID. (Someone could provide a short NIS+ tutorial.) For this tutorial, the user created is josanabr.

Setting Up the Services
Here I provide some short steps and hacks for configuring RSH and NFS properly.

RSH
Next, I describe the steps to configure the RSH service. (For simplicity, I recommend executing these configuration steps on EVERY cluster machine, e.g., with cssh.)
Now, you can try to log in as josanabr from pdclab-00 to pdclab-04:
[EMAIL PROTECTED] ~]$ rsh pdclab-04
connect to address 136.145.116.81 port 543: Connection refused
Trying krb4 rlogin...
connect to address 136.145.116.81 port 543: Connection refused
trying normal rlogin (/usr/bin/rlogin)
Password:
Last login: Tue Sep 4 15:46:11 from pdclab-00
[EMAIL PROTECTED] ~]$
Is that fine? Not for Torque's purposes. Torque does not like the Kerberos rsh, which is the first one found in the PATH:
[EMAIL PROTECTED] ~]$ which -a rsh
/usr/kerberos/bin/rsh
/usr/bin/rsh
The first one is the Kerberos version, so I replace it with links to the plain rsh and rlogin (you can try another, more elegant solution). To do that, execute the following commands as root:
[EMAIL PROTECTED] etc]# cd /usr/kerberos/bin/
[EMAIL PROTECTED] bin]# mv rsh rsh.krb
[EMAIL PROTECTED] bin]# mv rlogin rlogin.krb
[EMAIL PROTECTED] bin]# ln -sf /usr/bin/rsh .
[EMAIL PROTECTED] bin]# ln -sf /usr/bin/rlogin .
Now, try again:
[EMAIL PROTECTED] ~]$ rsh pdclab-04
Password:
Last login: Tue Sep 4 15:46:29 from pdclab-00
[EMAIL PROTECTED] ~]$
Hmmm! That looks better :-), but I still need to provide my password. To avoid typing the password (e.g., when logging in from pdclab-00 to pdclab-04), create a .rhosts file in your home directory that lists the trusted hosts:
[EMAIL PROTECTED] ~]$ vi .rhosts
[EMAIL PROTECTED] ~]$ cat .rhosts
pdclab-00.ece.uprm.edu
pdclab-00
[EMAIL PROTECTED] ~]$ chmod og-r .rhosts
[EMAIL PROTECTED] ~]$ ls -l .rhosts
-rw------- 1 josanabr josanabr 33 Sep 4 16:16 .rhosts
[EMAIL PROTECTED] ~]$
Now, try again:
[EMAIL PROTECTED] ~]$ rsh pdclab-04
Last login: Tue Sep 4 16:18:17 from pdclab-00
[EMAIL PROTECTED] ~]$
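You can also script the .rhosts step above, which helps when it has to be repeated for several users or nodes. This is a minimal sketch. The function name write_rhosts is hypothetical. It writes one trusted host per line and then sets mode 600 as in the transcript, because rshd is strict about who may write this file.

```shell
#!/usr/bin/env bash
# Sketch: write a .rhosts file trusting the given hosts, then restrict it
# to mode 600 (owner read/write only), matching the transcript above.
#
# Usage: write_rhosts HOME_DIR HOST [HOST ...]   (hypothetical helper)
write_rhosts() {
  local home_dir=$1
  shift
  local rhosts="$home_dir/.rhosts"
  # One trusted host per line; a bare host name trusts the same user name.
  printf '%s\n' "$@" > "$rhosts" || return 1
  chmod 600 "$rhosts"
}
```

For example, `write_rhosts "$HOME" pdclab-00.ece.uprm.edu pdclab-00` reproduces the file shown above.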
Hmmm! Well done: no password this time. Now, allow the connection from

NFS
For proper integration of Torque and GT4, we need to share a filesystem from the master node with the compute nodes.
Remember, our master node is pdclab-00. On it, edit /etc/exports:
[EMAIL PROTECTED] etc]# vi /etc/exports
[EMAIL PROTECTED] etc]# cat /etc/exports
/home pdclab-04.ece.uprm.edu(rw,sync)
[EMAIL PROTECTED] etc]#
Then, restart the NFS-related services:
/etc/init.d/portmap restart
/etc/init.d/nfs restart
/etc/init.d/nfslock restart
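Small helper functions can generate the two NFS configuration lines involved here. This is a sketch with hypothetical names (exports_line, fstab_line). The /etc/fstab entry is an assumption that the tutorial does not use: it would let a compute node mount /home at boot instead of by hand.

```shell
#!/usr/bin/env bash
# Sketch: generate NFS configuration lines (hypothetical helpers).

# exports_line DIR NODE [NODE ...]
# Prints one /etc/exports entry sharing DIR read-write (sync) with each node.
exports_line() {
  local dir=$1
  shift
  local line=$dir node
  for node in "$@"; do
    line+=" ${node}(rw,sync)"
  done
  printf '%s\n' "$line"
}

# fstab_line SERVER DIR
# Prints an /etc/fstab entry that mounts SERVER:DIR on the same local DIR.
fstab_line() {
  printf '%s:%s %s nfs defaults 0 0\n' "$1" "$2" "$2"
}
```

On most Linux NFS servers, running `exportfs -ra` after editing /etc/exports re-exports everything without restarting the services.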
Now, you can mount the shared directory on the compute node (pdclab-04):
[EMAIL PROTECTED] ~]# mount -t nfs pdclab-00:/home /home
[EMAIL PROTECTED] ~]# ls -l /home/
total 8
drwx------ 5 globus globus 4096 Jul 23 17:56 globus
drwx------ 4 josanabr josanabr 4096 Sep 5 14:19 josanabr

Building the Cluster
To achieve the "PBS" and GT4 integration, the first thing to describe is how to set up the Torque and MAUI components. We select Torque as the resource manager for distributed environments; it can be considered an open-source PBS clone. MAUI, on the other hand, is a robust scheduler that supports advanced mechanisms and policies for scheduling large sets of distributed computational resources. Enough words; hands on.

Setting up Torque
Torque is an OpenPBS descendant. It is a distributed resource manager that provides control over batch jobs and distributed compute nodes. Although it supports scheduling policies, they are not a major concern here. Let's get to work.

Download the software
The software can be downloaded from here. Note: this tutorial uses version 2.1.9.

Unpacking, Configuring, Compiling and Installing the Server
Go to the directory where you wish to unpack the file:
[EMAIL PROTECTED] ~]# cd /usr/local/src/
[EMAIL PROTECTED] src]# tar xfz ~/torque-2.1.9.tar.gz
[EMAIL PROTECTED] src]# cd torque-2.1.9/
When configuring Torque, you must explicitly request building the server, monitor, and client components:
./configure --enable-server --enable-monitor --enable-clients
make
make install
If there are no errors, execute the following commands in the Torque source directory:
./torque.setup globus
make packages
The first of these commands (./torque.setup globus) finishes the server configuration and indicates that

Setting Up a Compute Node
With the tasks done so far, you can copy the package scripts from the server:
[EMAIL PROTECTED] ~]# scp pdclab-00:/usr/local/src/torque-2.1.9/torque-package-clients-linux-i686.sh .
[EMAIL PROTECTED]'s password:
torque-package-clients-linux-i686.sh 100% 400KB 400.0KB/s 00:00
[EMAIL PROTECTED] ~]# scp pdclab-00:/usr/local/src/torque-2.1.9/torque-package-mom-linux-i686.sh .
[EMAIL PROTECTED]'s password:
torque-package-mom-linux-i686.sh 100% 448KB 447.5KB/s 00:00
[EMAIL PROTECTED] ~]#
Now, install the packages:
[EMAIL PROTECTED] ~]# ./torque-package-clients-linux-i686.sh --install
Installing TORQUE archive...
Done.
[EMAIL PROTECTED] ~]# ./torque-package-mom-linux-i686.sh --install
Installing TORQUE archive...
Done.
[EMAIL PROTECTED] ~]#
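With more than one compute node, you can push the copy-and-install steps above from the master instead of repeating them on every node. This sketch makes three assumptions. First, root on the master can reach each node with scp/ssh (the tutorial itself pulls the packages with scp from each node). Second, the package names match the linux-i686 ones built above. Third, install_on_nodes and DRY_RUN are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch: copy the Torque self-extracting packages to each compute node and
# install them there. Assumes scp/ssh access as root from the master.
# Set DRY_RUN=1 to print the commands instead of executing them.

SRC_DIR=${SRC_DIR:-/usr/local/src/torque-2.1.9}
PACKAGES="torque-package-clients-linux-i686.sh torque-package-mom-linux-i686.sh"

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# install_on_nodes NODE [NODE ...]
# Copies each package to /tmp on the node and runs it with --install there.
install_on_nodes() {
  local node pkg
  for node in "$@"; do
    for pkg in $PACKAGES; do
      run scp "$SRC_DIR/$pkg" "$node:/tmp/$pkg" || return 1
      run ssh "$node" "chmod +x /tmp/$pkg && /tmp/$pkg --install" || return 1
    done
  done
}
```

`DRY_RUN=1 install_on_nodes pdclab-04` prints the four commands for pdclab-04 without running them.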
Verify that the /var/spool/torque/server_name file names the master node:
[EMAIL PROTECTED] ~]# cat /var/spool/torque/server_name
pdclab-00.ece.uprm.edu
[EMAIL PROTECTED] ~]#
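A scripted check along the same lines helps when several nodes must be verified. check_server_name is a hypothetical helper. Its default path is the /var/spool/torque/server_name file shown in the transcript.

```shell
#!/usr/bin/env bash
# Sketch: confirm that a node's Torque server_name file names the expected
# master. Returns non-zero if the file is missing, empty, or different.

check_server_name() {
  local expected=$1
  local file=${2:-/var/spool/torque/server_name}
  local actual
  # read takes the first line; a missing final newline is still accepted.
  read -r actual < "$file" || [ -n "$actual" ] || return 1
  if [ "$actual" = "$expected" ]; then
    echo "ok: $file -> $actual"
  else
    echo "mismatch: $file says '$actual', expected '$expected'" >&2
    return 1
  fi
}
```

`check_server_name pdclab-00.ece.uprm.edu` prints an "ok" line on a correctly configured node and returns non-zero otherwise.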
The server name is correct. To finalize the client configuration, edit the pbs_mom configuration file so that it contains:
arch x86
opsys fc6
$logevent 255
$usecp *:/home /mnt/home
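The same four configuration lines can be written non-interactively, for example when preparing several compute nodes. This is a sketch with two caveats. write_mom_config is a hypothetical name. The default path /var/spool/torque/mom_priv/config is an assumption: it is the usual location of the pbs_mom configuration file under the Torque spool directory used here, but the exact file is not named above.

```shell
#!/usr/bin/env bash
# Sketch: write the pbs_mom configuration used in this tutorial.
# The default path is an assumption; pass another path to write elsewhere.

write_mom_config() {
  local file=${1:-/var/spool/torque/mom_priv/config}
  # The quoted 'EOF' keeps $logevent and $usecp literal instead of expanding them.
  cat > "$file" <<'EOF'
arch x86
opsys fc6
$logevent 255
$usecp *:/home /mnt/home
EOF
}
```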
The $usecp line (the last one) maps the directory /home on any host (*:/home) to the local /mnt/home. Now you can start the daemon that receives jobs from the master node:
[EMAIL PROTECTED] ~]# pbs_mom

Setting Up Maui
Maui is an advanced policy engine used to improve the manageability and efficiency of machines ranging from clusters of a few processors to multi-teraflop supercomputers.
The next steps must be executed on the master node (pdclab-00).

Download the software
To get the software, go here.

Previous hacks
Due to integration issues with Torque, Maui expects to find the PBS library names (libpbs) instead of Torque's (libtorque), so create these symbolic links:
[EMAIL PROTECTED] ~]# cd /usr/local/lib
[EMAIL PROTECTED] lib]# ln -sf libtorque.so libpbs.so
[EMAIL PROTECTED] lib]# ln -sf libtorque.a libpbs.a

Unpacking, Configuring, Compiling and Installing
Execute the following commands:
[EMAIL PROTECTED] lib]# cd /usr/local/src/
[EMAIL PROTECTED] src]# tar xfz ~/maui-3.2.6p13.tar.gz
[EMAIL PROTECTED] src]# cd maui-3.2.6p13/
[EMAIL PROTECTED] maui-3.2.6p13]# export MAUIDIR=/var/spool/maui
[EMAIL PROTECTED] maui-3.2.6p13]# ./configure --with-spooldir=${MAUIDIR}
[EMAIL PROTECTED] maui-3.2.6p13]# make
[EMAIL PROTECTED] maui-3.2.6p13]# make install
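Because the Maui build looks for libpbs, quickly checking the symbolic links from the "Previous hacks" step can save you a failed build. This sketch uses hypothetical function names. It deviates slightly from the unconditional ln -sf commands above: link_libpbs only creates a link when the corresponding libtorque file exists, so it never leaves a dangling link.

```shell
#!/usr/bin/env bash
# Sketch: create and verify the libpbs -> libtorque symlinks Maui expects.
# LIB_DIR defaults to /usr/local/lib, as in the tutorial.

link_libpbs() {
  local lib_dir=${1:-/usr/local/lib}
  local ext
  for ext in so a; do
    # Only link libraries that Torque actually installed.
    if [ -e "$lib_dir/libtorque.$ext" ]; then
      ln -sf "libtorque.$ext" "$lib_dir/libpbs.$ext" || return 1
    fi
  done
}

check_libpbs() {
  local lib_dir=${1:-/usr/local/lib}
  local ext status=0
  for ext in so a; do
    # -e follows the symlink, so a dangling link counts as missing.
    if [ -e "$lib_dir/libpbs.$ext" ]; then
      echo "ok: libpbs.$ext -> $(readlink "$lib_dir/libpbs.$ext")"
    else
      echo "missing or dangling: $lib_dir/libpbs.$ext" >&2
      status=1
    fi
  done
  return $status
}
```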
Note: if you get an error message related to

Final Configuration Steps
OK, we are almost done, so execute the following:
[EMAIL PROTECTED] ~]# qmgr
Qmgr: set server resources_default.nodect = 1
Qmgr: set server resources_default.walltime = 00:05:00
Qmgr: quit
Finally, restart pbs_server, start Maui, and check the state of the nodes:
[EMAIL PROTECTED] ~]# qterm -t quick ; pbs_server
[EMAIL PROTECTED] ~]# /usr/local/maui/sbin/maui
[EMAIL PROTECTED] ~]# pbsnodes -a
pdclab-04.ece.uprm.edu
     state = free
     np = 1
     ntype = cluster
     status = arch=x86,opsys=fc6,uname=Linux pdclab-04.ece.uprm.edu 2.6.18-1.2798.fc6xen #1 SMP Mon Oct 16 15:11:19 EDT 2006 i686,sessions=? 0,nsessions=? 0,nusers=0,idletime=185,totmem=816556kb,availmem=729324kb,physmem=262324kb,ncpus=1,loadave=0.02,netload=26127667,state=free,jobs=? 0,rectime=1189026813
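When the cluster has more nodes, the full pbsnodes -a listing gets long. This sketch reduces it to one "node state" line per node; node_states and not_free are hypothetical names. It assumes the layout shown above: a node name on a line without "=", followed by "key = value" attribute lines.

```shell
#!/usr/bin/env bash
# Sketch: summarize `pbsnodes -a` output (read from stdin) as "node state".

node_states() {
  awk '
    NF && $0 !~ /=/ { node = $1; next }            # node name line
    $1 == "state" && $2 == "=" { print node, $3 }  # e.g. "state = free"
  '
}

# not_free: print only the nodes whose state is anything other than "free".
not_free() {
  node_states | awk '$2 != "free"'
}
```

For example, `pbsnodes -a | not_free` prints nothing when every node is free.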
Good: the compute node pdclab-04 reports state = free.

Testing Torque/MAUI Installation
To test our cluster deployment, log in as josanabr and create the following job script:
#!/bin/bash
/bin/hostname
Save it as mysub and submit it:
[EMAIL PROTECTED] ~]$ qsub mysub
0.pdclab-00.ece.uprm.edu
[EMAIL PROTECTED] ~]$ ls -rtl
total 8
-rw-r--r-- 1 josanabr josanabr 29 Sep 5 17:17 mysub
-rw------- 1 josanabr josanabr 23 Sep 5 17:17 mysub.o0
-rw------- 1 josanabr josanabr 0 Sep 5 17:17 mysub.e0
[EMAIL PROTECTED] ~]$ cat mysub.o0
pdclab-04.ece.uprm.edu
[EMAIL PROTECTED] ~]$
You now have a cluster, congrats. :-D Now our journey begins. ;-)

A Short Journey over the GT Quick Start Guide
Here, instead of a deep description of the Globus configuration, compilation, and installation process, I just provide a checklist for achieving the integration between Torque/MAUI and GT4. Prior knowledge of GT installation is therefore recommended.

Configuring, Compiling and Installing GT4
OK, remember
[EMAIL PROTECTED] ~]$ cd gt4.0.5-all-source-installer/
[EMAIL PROTECTED] gt4.0.5-all-source-installer]$ ./configure --prefix=/opt/gt --enable-wsgram-pbs
[EMAIL PROTECTED] gt4.0.5-all-source-installer]$ make
...
...
echo "Your build completed successfully. Please run make install."
Your build completed successfully. Please run make install.
[EMAIL PROTECTED] gt4.0.5-all-source-installer]$ make install

Setting up Security in your Cluster
For this step, I have a SimpleCA set up on the init machine (init.ece.uprm.edu).
I also store a credential in the MyProxy server running on init and then retrieve it:
[EMAIL PROTECTED] ~]$ myproxy-init -s init
Your identity: /O=Grid/OU=GlobusTest/OU=simpleCA-init.ece.uprm.edu/OU=ece.uprm.edu/CN=John Sanabria
Enter GRID pass phrase for this identity:
Creating proxy ................................... Done
Proxy Verify OK
Your proxy is valid until: Wed Sep 12 21:06:10 2007
Enter MyProxy pass phrase:
Verifying - Enter MyProxy pass phrase:
A proxy valid for 168 hours (7.0 days) for user josanabr now exists on init.
[EMAIL PROTECTED] ~]$ myproxy-logon -s init
Enter MyProxy pass phrase:
A credential has been received for user josanabr in /tmp/x509up_u501.
[EMAIL PROTECTED] ~]$
For more information, read section 4.3 of the GT Quick Start Guide.

Preparing GridFTP Service
You can follow the steps described in section 5.4.

Preparing the Globus Container
Before following the instructions given in section 5.5, you need to provide some information to configure the RFT service at
GRAM, the Moment of Truth
Read section 5.7. Test it as follows:
[EMAIL PROTECTED] ~]$ globusrun-ws -submit -s -c /bin/date
Delegating user credentials...Done.
Submitting job...Done.
Job ID: uuid:99bef29a-5c26-11dc-b839-00163e3dc54e
Termination time: 09/07/2007 03:09 GMT
Current job state: Active
Current job state: CleanUp-Hold
Wed Sep 5 23:09:38 AST 2007
Current job state: CleanUp
Current job state: Done
Destroying job...Done.
Cleaning up any delegated credentials...Done.
Good, but you are still not using your cluster. Try this:
[EMAIL PROTECTED] ~]$ globusrun-ws -Ft PBS -submit -S -f a.rsl
Delegating user credentials...Done.
Submitting job...Done.
Job ID: uuid:b504593c-5c26-11dc-8737-00163e3dc54e
Termination time: 09/07/2007 03:10 GMT
Current job state: StageIn
Current job state: Pending
Current job state: Active
Current job state: CleanUp
Current job state: Done
Destroying job...Done.
Cleaning up any delegated credentials...Done.
Does it work? Hmmm! I guess not:
[EMAIL PROTECTED] ~]$ cat stderr
pdclab-04.ece.uprm.edu: Connection refused
/var/spool/torque/mom_priv/jobs/22.pdclab-0.SC: line 55: [: too many arguments
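This tutorial does not show the contents of a.rsl. Below is a hypothetical stand-in, consistent with the Hello World! output and with the stdout/stderr files read from the home directory: a minimal GT4 WS-GRAM job description. The element names and the ${GLOBUS_USER_HOME} variable follow the GT 4.0 job description format as I understand it, so verify them against the WS GRAM documentation for your version. write_hello_rsl is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch: write a minimal GT4 WS-GRAM job description (hypothetical a.rsl).
# ${GLOBUS_USER_HOME} is meant to be expanded by GRAM, not by this shell,
# so the heredoc delimiter is quoted to keep it literal.

write_hello_rsl() {
  local file=${1:-a.rsl}
  cat > "$file" <<'EOF'
<job>
  <executable>/bin/echo</executable>
  <argument>Hello World!</argument>
  <stdout>${GLOBUS_USER_HOME}/stdout</stdout>
  <stderr>${GLOBUS_USER_HOME}/stderr</stderr>
</job>
EOF
}
```

Then submit it exactly as above: `globusrun-ws -Ft PBS -submit -S -f a.rsl`.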
But everything looked correct, didn't it? Even more surprising is how the problem is solved. Edit the file
[EMAIL PROTECTED] ~]$ globusrun-ws -Ft PBS -submit -S -f a.rsl
Delegating user credentials...Done.
Submitting job...Done.
Job ID: uuid:2483539e-5c27-11dc-9bc2-00163e3dc54e
Termination time: 09/07/2007 03:13 GMT
Current job state: StageIn
Current job state: Pending
Current job state: Active
Current job state: CleanUp
Current job state: Done
Destroying job...Done.
Cleaning up any delegated credentials...Done.
[EMAIL PROTECTED] ~]$ cat stderr
[EMAIL PROTECTED] ~]$ cat stdout
Hello World!
That's it!

Final Comments
At this time, integrating a cluster with GT can be a hard task. Many factors can affect the integration process, and the mailing-list support is sometimes either unavailable or not very effective. This is not your fault, guys; still, it is disappointing. This document fulfills my needs, but a newbie reader may need more details. My main motivation for writing it is to provide a better roadmap for integrating Torque/MAUI with Globus. I am sure this document can be improved, so I need your feedback. I'll appreciate any correction (grammar, technical, whatever!). Regards.

Resources
Of course, I did not write all this from scratch; I used the following web resources:
For more information, use Google :-D, or write to me at
