Dear Torque/Maui Users,
I have been running Torque for only a few weeks, so not everything is
working well yet.
All jobs seem to think they are run by [EMAIL PROTECTED].
"installer" is the login ID that submits the job, but the job is
submitted to a queue on the master node, which is called "silvio", and I
do not use localdomain anywhere on the Beowulf cluster.
The question is: where do the Torque queues get localhost.localdomain
from, and how do I define the correct name instead?
hostname returns silvio, and dnsdomainname returns nothing, as it should.
Jobs do run, but only on the FIRST node in the nodes list. Admittedly I
have only submitted one or two jobs at a time, and that node can run at
least 4.
If I set up Maui, it fails immediately because localhost.localdomain is
not an authorized node. I would like to move up to Maui, but I have to
fix the localdomain issue first.
pbsnodes -a
node07
     state = free
     np = 4
     properties = d1950
     ntype = cluster
     jobs = 0/52.localhost.localdomain   <------- ????
          (localhost.localdomain: should this be "jobs = 0/52.silvio"?)
     status = opsys=linux,uname=Linux node07 2.6.9-42.ELsmp #1...
...
silvio
     state = free
     np = 2
     ntype = time-shared
     status = opsys=linux,uname=Linux silvio 2.6.9-42.0.3.ELsmp #1..
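The check done by eye above (does the server name inside each job ID match
the expected master name?) could be scripted. The following is only a
minimal sketch in Python, not part of Torque: the function names
job_servers and mismatched are hypothetical, the sample text is a
shortened copy of the listing above, and it assumes the usual layout where
node names start in column one and attributes are indented.

```python
import re

# Shortened copy of the listing above; the truncated status lines are omitted.
SAMPLE = """\
node07
     state = free
     np = 4
     properties = d1950
     ntype = cluster
     jobs = 0/52.localhost.localdomain
silvio
     state = free
     np = 2
     ntype = time-shared
"""


def job_servers(text):
    """Map each node name to the set of server names seen in its jobs line."""
    result = {}
    node = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if not line[0].isspace():
            # An unindented line starts a new node record.
            node = line.strip()
            result[node] = set()
        elif node is not None:
            key, sep, value = line.strip().partition(" = ")
            if sep and key == "jobs":
                for entry in value.split(","):
                    # Assumed entry shape: slot/jobnumber.servername
                    match = re.match(r"\s*\d+/\d+\.(\S+)", entry)
                    if match:
                        result[node].add(match.group(1))
    return result


def mismatched(text, expected):
    """Return only the nodes whose jobs name a server other than expected."""
    found = {}
    for node, servers in job_servers(text).items():
        wrong = servers - {expected}
        if wrong:
            found[node] = wrong
    return found


print(mismatched(SAMPLE, "silvio"))
```

In real use the text would come from the output of pbsnodes -a rather than
from an embedded string; the sketch only reports the mismatch and says
nothing about its cause.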
The master node seems to know that its name is "silvio", and there is no
DNS domain defined in the Beowulf cluster. Names come from a good ol'
fashioned /etc/hosts file:
#
127.0.0.1 localhost.localdomain localhost
192.168.5.11 kickstart
192.168.5.99 silvio silvio.sh.rohmhaas.com
192.168.5.107 node07 node7
192.168.5.108 node08 node8
...
NB: the name silvio.sh.rohmhaas.com is for the other Ethernet card, which
allows remote access to the cluster master.
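In a hosts file the first name after the address is the canonical name and
the rest are aliases, so it may help to see what each name maps to. The
sketch below is only an illustration in Python under stated assumptions:
parse_hosts and canonical_for are hypothetical helper names, the text is
limited to the entries shown above (the elided lines are left out), and it
does not reproduce how Torque itself resolves names.

```python
# Only the entries shown above; the elided lines are not guessed at.
HOSTS_TEXT = """\
127.0.0.1 localhost.localdomain localhost
192.168.5.11 kickstart
192.168.5.99 silvio silvio.sh.rohmhaas.com
192.168.5.107 node07 node7
192.168.5.108 node08 node8
"""


def parse_hosts(text):
    """Return {address: (canonical_name, [aliases])} from hosts-file text."""
    table = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        address, *names = line.split()
        if names:
            # Keep the first entry seen for an address.
            table.setdefault(address, (names[0], names[1:]))
    return table


def canonical_for(name, table):
    """Return (address, canonical_name) for the first entry listing name."""
    for address, (canonical, aliases) in table.items():
        if name == canonical or name in aliases:
            return address, canonical
    return None


table = parse_hosts(HOSTS_TEXT)
print(canonical_for("silvio", table))
print(canonical_for("localhost", table))
```

With the entries above, "silvio" maps to 192.168.5.99 with canonical name
silvio, while only the loopback line carries localhost.localdomain.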
------
Sincerely,
Tom Pierce