hi all,

i've built maui on a centos 5.1 x86_64 machine with torque 2.3.1 (also x86_64).

job scheduling with pbs_sched works (so i'm going to guess that torque is not to blame here).

symptom:
submitted jobs stay queued, showq/checkjob commands fail.

(using LOGLEVEL 9):
in /var/log/maui.log:

07/10 16:35:05 INFO:     no PBS sched socket connections ready
07/10 16:35:05 MSUAcceptClient(5,ClientSD,HostName,TCP)
07/10 16:35:05 INFO: accept call failed, errno: 11 (Resource temporarily unavailable)
07/10 16:35:05 INFO:     all clients connected.  servicing requests

strace -p <pid_of_maui>:

accept(5, 0x7fff08213660, [34359738384]) = -1 EAGAIN (Resource temporarily unavailable)
write(3, "07/10 16:36:02 INFO:     accept "..., 90) = 90
write(3, "07/10 16:36:02 INFO:     all cli"..., 68) = 68
select(0, NULL, NULL, NULL, {0, 100000}) = 0 (Timeout)
write(3, "07/10 16:36:02 MRMCheckEvents()\n", 32) = 32
select(1024, [8], NULL, NULL, {0, 10000}) = 0 (Timeout)
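
for reference, a minimal python sketch (not maui code, just an illustration) of what the accept() line in the strace means. it assumes the listening socket is non-blocking, as fd 5 appears to be: with no client waiting in the backlog, accept() fails with EAGAIN, and it only succeeds once a client has actually connected. the helper names (make_listener, poll_accept, wait_readable) are made up for this example.

```python
import errno
import select
import socket


def make_listener():
    """Create a non-blocking TCP listener on an ephemeral loopback port."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.bind(("127.0.0.1", 0))
    s.listen(5)
    s.setblocking(False)
    return s


def poll_accept(listener):
    """Try a single accept(); return (conn, None) or (None, errno)."""
    try:
        conn, _addr = listener.accept()
        return conn, None
    except BlockingIOError as exc:
        # EAGAIN / EWOULDBLOCK: no pending connection in the backlog
        return None, exc.errno


def wait_readable(listener, timeout):
    """Return True once the listener has a pending connection."""
    readable, _, _ = select.select([listener], [], [], timeout)
    return bool(readable)
```

so an EAGAIN from accept() on its own only says that no client connection had reached the listening socket at that moment.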


i found this thread with similar issues:
http://www.clusterresources.com/pipermail/mauiusers/2008-January/003085.html

showq
ERROR:    lost connection to server
ERROR:    cannot request service (status)

except that strace on e.g. showq does not give 'EACCES (Permission denied)', but 'connect(3, {sa_family=AF_INET, sin_port=htons(40559), sin_addr=inet_addr("192.168.10.1")}, 16) = -1 EINPROGRESS (Operation now in progress)'
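
as far as i understand it, EINPROGRESS is what a non-blocking connect() normally returns while the tcp handshake is still running, so by itself it does not say whether the connection succeeded. a rough python sketch of how the final result can be checked (probe is a hypothetical helper, not part of maui or torque): wait until the socket becomes writable, then read SO_ERROR.

```python
import errno
import select
import socket


def probe(host, port, timeout=2.0):
    """Non-blocking connect; return 0 on success, otherwise an errno value."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setblocking(False)
    try:
        rc = s.connect_ex((host, port))
        if rc == 0:
            return 0  # connected immediately
        if rc != errno.EINPROGRESS:
            return rc  # immediate failure, e.g. ECONNREFUSED
        # EINPROGRESS: handshake pending, wait for the socket to become writable
        _, writable, _ = select.select([], [s], [], timeout)
        if not writable:
            return errno.ETIMEDOUT
        # SO_ERROR holds the final outcome of the connect attempt
        return s.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
    finally:
        s.close()
```

with the address and port from the strace above this would be called as probe("192.168.10.1", 40559); a result of 0 means the tcp connection itself was established, and errno.errorcode can turn any other value into a name.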

i also ran strace on maui -d; nothing suspicious until:
select(1024, [8], NULL, NULL, {0, 10000}) = 0 (Timeout)
accept(5, 0x7fffde67ca70, [16]) = -1 EAGAIN (Resource temporarily unavailable)


i have disabled ipv6 (and iptables) and selinux.

the maui rpms i've built are a rebuild of a working setup for sl4 x86_64 (but with some older snapshot of torque 230).
exact release is:
3.2.6p20 snap.1212617145

configure options from config.log
$ ./configure --build=x86_64-redhat-linux-gnu --host=x86_64-redhat-linux-gnu --target=x86_64-redhat-linux-gnu --program-prefix= --prefix=/usr --exec-prefix=/usr --bindir=/usr/bin --sbindir=/usr/sbin --sysconfdir=/etc --datadir=/usr/share --includedir=/usr/include --libdir=/usr/lib64 --libexecdir=/usr/libexec --localstatedir=/var --sharedstatedir=/usr/com --mandir=/usr/share/man --infodir=/usr/share/info --with-key=123456 --with-spooldir=/var/spool/maui


all hints welcome.

many thanks,

stijn