hi all,

i had some time to look a bit further into it.

the good news is that the scheduling works (and that i know that i can ignore the 'Resource temporarily unavailable' messages).

the bad news is that showq (or any other maui command) still fails.

strace of showq gives
...
connect(3, {sa_family=AF_INET, sin_port=htons(40559), sin_addr=inet_addr("192.168.10.1")}, 16) = -1 EINPROGRESS (Operation now in progress)
...
sendto(3, "00000057\nCK=4fa43eb400e5e9d7 DT=CMD=showq AUTH=root ARG=0 ALL 0 \n", 66, 0, NULL, 0) = 66
select(4, [3], NULL, NULL, {30, 0})     = 1 (in [3], left {29, 893000})
recvfrom(3, "", 9, 0, NULL, NULL)       = 0
write(2, "ERROR: lost connection to server\n", 36) = 36
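
for what it's worth, recvfrom() returning 0 on a tcp socket means the other side closed the connection without sending anything, which is presumably what showq reports as 'lost connection to server'. a minimal python sketch of that behaviour (the socketpair is only a stand-in for the real showq/maui connection, and the request text is made up):

```python
import socket

def peer_closed_without_reply():
    """Return what recv() yields once the other end has closed."""
    client, server = socket.socketpair()  # stand-in for the showq <-> maui connection
    try:
        client.sendall(b"request\n")   # hypothetical request, not the real protocol
        server.recv(64)                # the "server" reads the request ...
        server.close()                 # ... and closes without answering
        return client.recv(9)          # empty bytes == recvfrom() = 0 in strace
    finally:
        client.close()

print(peer_closed_without_reply())
```

an empty result here is the same situation as the `recvfrom(3, "", 9, ...) = 0` line in the showq trace.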

strace of maui during that try gives:
...
select(10, [9], NULL, NULL, {5, 0})     = 1 (in [9], left {5, 0})
recvfrom(9, "00000057\n", 9, 0, NULL, NULL) = 9
select(10, [9], NULL, NULL, {5, 0})     = 1 (in [9], left {5, 0})
recvfrom(9, "CK=4fa43eb400e5e9d7 DT=CMD=showq AUTH=root ARG=0 ALL 0 \n", 57, 0, NULL, NULL) = 57
close(9)                                = 0
...
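
judging only from the two traces, a request looks like an 8-digit decimal length plus a newline (9 bytes), followed by that many bytes of payload: maui reads 9 bytes, then 57, which matches the 66 bytes showq sent. a python sketch of that framing, as an assumption drawn from the strace output and not from the maui source (frame and split_frame are made-up names):

```python
def frame(payload: bytes) -> bytes:
    """Prefix payload with an 8-digit zero-padded decimal length and a newline."""
    return b"%08d\n" % len(payload) + payload

def split_frame(data: bytes):
    """Split one framed message into (declared_length, payload)."""
    header, rest = data[:9], data[9:]
    length = int(header[:8])          # e.g. b"00000057" -> 57
    if header[8:9] != b"\n" or len(rest) < length:
        raise ValueError("truncated or malformed frame")
    return length, rest[:length]

msg = frame(b"DT=CMD=showq AUTH=root\n")   # shortened, hypothetical payload
print(msg[:9], split_frame(msg))
```

so the request itself seems to arrive completely and well-formed; maui closes the socket after reading it, before sending any reply.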


thanks,


stijn

symptom:
submitted jobs stay queued, showq/checkjob commands fail.
the symptoms are not correlated.

the fact that scheduling didn't work seems to be due to the following line in my maui.cfg (which i copied from a working setup that was using a previous snapshot):

SYSCFG[base] PLIST=

setting LOGLEVEL to 9 and carefully reading the important messages gave some hints that all the connections to torque were working fine, but that the jobs were held by something else.



(using LOGLEVEL 9):
in /var/log/maui.log:

07/10 16:35:05 INFO:     no PBS sched socket connections ready
07/10 16:35:05 MSUAcceptClient(5,ClientSD,HostName,TCP)
07/10 16:35:05 INFO:     accept call failed, errno: 11 (Resource temporarily unavailable)
07/10 16:35:05 INFO:     all clients connected.  servicing requests

reading the log files more carefully: fd 5 is the listening socket on port 40559, and this message just means that nothing is connecting to it at that moment. (e.g. telnet localhost 40559 shows something)
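
this matches how a non-blocking accept() behaves in general: with no client waiting it fails with EAGAIN, which is errno 11 on linux and prints as 'Resource temporarily unavailable'. a small python sketch, using an arbitrary free port on localhost instead of 40559 (accept_with_no_client is a made-up name, and this is not how maui itself is written):

```python
import errno
import socket

def accept_with_no_client():
    """Call accept() on a non-blocking listener that nobody is connecting to."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        listener.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
        listener.listen(5)
        listener.setblocking(False)
        try:
            listener.accept()
        except BlockingIOError as exc:
            return exc.errno              # EAGAIN / EWOULDBLOCK
        return None                       # only if something actually connected
    finally:
        listener.close()

err = accept_with_no_client()
print(err, errno.errorcode.get(err))
```

in other words the message is the normal "nobody is knocking right now" result of polling the listen socket, not an error.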



_______________________________________________
mauiusers mailing list
[email protected]
http://www.supercluster.org/mailman/listinfo/mauiusers
