Adrian Sevcenco wrote:
Greenseid, Joseph M. wrote:
It says OK when it is starting up. Does it not actually start? Is
there a maui process running after you do this?
Yes, there is a process, but when I try to run any maui-related command I
get:
[r...@grid01 log]# checkjob 2
ERROR: lost connection to server
ERROR: cannot request service (status)
I attached the log(9) of starting maui.
Can somebody see the problem there?
Thank you,
Adrian
Adrian, are you running nscd by any chance? We have noticed on many of our
clients and servers that the nscd process tends to go haywire from time
to time and cause all sorts of problems, including the one you mention.
The tell-tale would be nscd using 100% CPU on your grid01 machine.
Perhaps not your case, but worth checking.
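If it helps, here is a rough sketch of how one could check for this from
a shell. The helper name busy_nscd and the 90% threshold are made up for
illustration and are not part of nscd or maui. Note that the %CPU column
printed by ps is averaged over the lifetime of the process, so a daemon
that only recently started spinning may still show a low value; top is
the better tool for the instantaneous figure.

```shell
# Hypothetical helper (name and threshold are assumptions, not a real tool):
# print the PIDs of nscd processes at or above a given CPU percentage.
# It reads "pid pcpu comm" lines on stdin, in the format produced by
# `ps -eo pid,pcpu,comm`; the header line is skipped because its third
# field is not "nscd".
busy_nscd() {
    awk -v limit="${1:-90}" '$3 == "nscd" && $2 + 0 >= limit { print $1 }'
}

# Usage on the suspect machine (prints nothing if nscd looks healthy):
#   ps -eo pid,pcpu,comm | busy_nscd 90
```

If a PID does show up, restarting nscd and retrying checkjob would tell
you whether this was the culprit.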
cheers,
Gianfranco
--Joe
------------------------------------------------------------------------
*From:* [email protected] on behalf of Adrian Sevcenco
*Sent:* Mon 12/15/2008 12:56 PM
*To:* [email protected]
*Subject:* [Mauiusers] MAUI not responding - "lost connection to server"
Hi,
I have a strange situation:
when I try to restart the maui server I get:
[r...@grid01 /]# service maui restart
Shutting down MAUI Scheduler: ERROR: lost connection to server
ERROR: cannot request service (status)
[FAILED]
Starting MAUI Scheduler: [ OK ]
The same happens with the firewall down.
My configuration is:
[r...@grid01 maui]# cat maui.cfg
# MAUI configuration example
SERVERHOST grid01.spacescience.ro
ADMIN1 root
ADMIN3 edginfo rgma edguser
ADMINHOSTS grid01.spacescience.ro
RMCFG[base] TYPE=PBS
SERVERPORT 40559
SERVERMODE NORMAL
# Set PBS server polling interval. If you have short
# queues or/and jobs it is worth to set a short interval. (10 seconds)
RMPOLLINTERVAL 00:00:10
# a max. 10 MByte log file in a logical location
LOGFILE /var/log/maui.log
LOGFILEMAXSIZE 10000000
LOGLEVEL 1
# Set the delay to 1 minute before Maui tries to run a job again,
# in case it failed to run the first time.
# The default value is 1 hour.
DEFERTIME 00:01:00
# Necessary for MPI grid jobs
ENABLEMULTIREQJOBS TRUE
Any ideas why it is not working? How can I debug this further?
Is anything required to be in /etc/hosts?
Thank you,
Adrian
------------------------------------------------------------------------
_______________________________________________
mauiusers mailing list
[email protected]
http://www.supercluster.org/mailman/listinfo/mauiusers
--
Dr. Gianfranco Sciacca Tel: +44 (0)20 7679 3044
Dept of Physics and Astronomy Internal: 33044
University College London D15 - Physics Building
London WC1E 6BT