Hi,

I have TORQUE installed on my cluster (40 AMD Opteron nodes) and it is running fine. My advice is to rename the nodes as you suggest, that is, with names beginning with a letter, which is good general practice, as Reuti says. In my experience with TORQUE, you can't submit a job whose name (the qsub -N parameter) begins with a number, so I would expect similar behaviour with node names.
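A minimal Python sketch of the point Reuti makes in the quoted message: a program that looks only at the first character would take 2of12 for an address, while actually parsing the token shows it is not one. The helper names and the `n` prefix (borrowed from Lance's n2of12 test) are hypothetical illustrations, not part of TORQUE, and the `name np=N` line format for the nodes file is assumed from typical TORQUE setups rather than checked against 2.3.0.

```python
import ipaddress
from typing import Iterable, List


def starts_with_letter(name: str) -> bool:
    """Rule of thumb (RFC 1178, pbs_server error): first character is a letter."""
    first = name[:1]
    return first.isascii() and first.isalpha()


def naive_looks_like_ip(token: str) -> bool:
    """The shortcut some programs take: judge by the first character only."""
    return token[:1].isdigit()


def is_ip_address(token: str) -> bool:
    """A stricter test: try to parse the whole token as an IPv4/IPv6 address."""
    try:
        ipaddress.ip_address(token)
    except ValueError:
        return False
    return True


def prefixed_name(name: str, prefix: str = "n") -> str:
    """Hypothetical helper: put a letter in front of digit-first names."""
    return name if starts_with_letter(name) else prefix + name


def nodes_file_lines(names: Iterable[str], np: int = 1) -> List[str]:
    """Hypothetical helper: build 'name np=N' lines (format assumed)."""
    return [f"{prefixed_name(n)} np={np}" for n in names]


if __name__ == "__main__":
    cluster = [f"{i}of12" for i in range(1, 13)]  # 1of12 ... 12of12
    for name in cluster:
        # A digit-first name fools the naive check but is not a valid address
        print(name, naive_looks_like_ip(name), is_ip_address(name))
    print("\n".join(nodes_file_lines(cluster)))
```

Renaming in the nodes file alone is not enough: the new names (n1of12 and so on) would also need matching entries in /etc/hosts so that pbs_server can resolve them.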
Hope it helps,
Marc

------------------------------------------------------
Marc Noguera i Julian, PhD
System Manager / Researcher
Despatx C7-149. Edifici Cn. Campus UAB.
Bellaterra 08193. Barcelona
email: marc_at_klingon.uab.es
web: http://klingon.uab.es/marc
Tlf/Phone: 00 34 935812173
-------------------------------------------------------

>
> Message: 2
> Date: Sun, 13 Apr 2008 19:53:53 +0200
> From: Reuti <[EMAIL PROTECTED]>
> Subject: Re: [Beowulf] TORQUE issues
> To: "Lance S. Jacobsen" <[EMAIL PROTECTED]>
> Cc: [email protected]
> Message-ID:
>     <[EMAIL PROTECTED]>
> Content-Type: text/plain; charset=US-ASCII; delsp=yes; format=flowed
>
> Hi,
>
> On 13.04.2008 at 04:52, Lance S. Jacobsen wrote:
> > I recently put together a small cluster of Xeons using CentOS 5.1
> > x86_64. This cluster is my first really big experience with Linux
> > and administration. It took some learning to install NIS,
> > NFS, etc., but now the machines seem to be working well, so I
> > am working on the next step: installing a queue scheduler. I decided
> > on TORQUE 2.3.0 since it's free and I don't know any better. I have
> > installed it and am having trouble getting it to detect my nodes.
> >
> > I think the problem is that I named them starting with numbers in
> > my /etc/hosts file: 1of12, 2of12, ... 12of12, instead of something
> > like node01, node02, ...
> >
> > After the installation, TORQUE did not create a file called 'nodes',
> > which it told me I needed, so after searching the web I
> > found the command to create it:
> >
> > # qmgr -c "create node 2of12"
> >
> > When I do this it gives me the following reply:
> >
> > qmgr: syntax error - checklist failed
> > create node 2of12
> > /\
> >
> > If I do this naming my node with a letter in front (n2of12), it
> > seems to work and generates the nodes file.
> >
> > Now if I then run the "pbsnodes -a" command, it tells me:
> >
> > n2of12
> >      state = down
> >      np = 1
> >      ntype = cluster
> >
> > That seems fine... it should be down since there is no n2of12 in my
> > hosts file.
> >
> > Now if I go and rename the node in the nodes file back to 2of12
> > and type the following to kill and restart the server:
> >
> > # qterm
> > # pbs_server
> >
> > I get the following reply:
> >
> > PBS_Server: pbsd_init(setup_nodes), token "2of12" doesn't start
> > with alpha on line 1.
> >
> > PBS_Server: PBS_Server, pbsd_init failed
> >
> > Now I am reluctant to go and change all of my node names (IP
> > aliases), since everything else about my cluster is finally working
> > well, so I have been trying to find out why pbsd_init will not
> > accept host names that start with numbers. Also, I would hate to go
> > and change this if it is not the problem.
> >
> > Does anyone know if I might be able to edit the setup files
> > associated with pbsd_init to get this to work (or any other way to
> > do this)?
>
> In general, I wouldn't use a digit as the first character, as
> outlined here:
>
> http://rfc.net/rfc1178.html page 4.
>
> Some programs might simply check the first character to decide
> whether it's a hostname or a TCP/IP address. Thinking long term,
> and of additional software in your cluster (maybe even parallel apps),
> I would suggest changing the names of the machines.
>
> -- Reuti
>
> BTW: Torque has its own list at: http://www.clusterresources.com
>
> > Thanks,
> >
> > Lance
> >
> > --
> > Lance S. Jacobsen, Ph.D.
> > President
> > GoHypersonic Incorporated
> > 714 E. Monument Ave., Suite 201
> > Dayton, OH 45402-1382
> > Tel: 937-531-6678
> > Fax: 937-531-6679
> > _______________________________________________
> > Beowulf mailing list, [email protected]
> > To change your subscription (digest mode or unsubscribe) visit
> > http://www.beowulf.org/mailman/listinfo/beowulf
>
> ------------------------------
>
> End of Beowulf Digest, Vol 50, Issue 24
> ***************************************
