Here is the list of additional packages we were expecting to
install (from the output of 'ngedit -p compute-centos'):
OpenIPMI OpenIPMI-libs
blas ganglia
ganglia-gmond libtorque
torque torque-mom
torque-pam
On the single node we used for testing this,
we saw the following in our logs
(please ignore the dates/timestamps):
Mar 04 12:10:57 Erased: OpenIPMI
Mar 04 12:11:01 Erased: OpenIPMI-libs
Mar 04 12:55:00 Erased: blas
Mar 04 12:55:04 Erased: blas
Mar 04 12:55:18 Erased: ganglia-gmond
Mar 04 12:55:38 Erased: torque-mom
Mar 04 12:55:41 Erased: torque-pam
Those entries were interspersed with pieces of information such as:
http://192.168.80.50/repos/1001/repodata/repomd.xml: [Errno 4] IOError: <urlopen
error (113, 'No route to host')>
I see no good explanation for this error.
Without that route *nothing* would get installed.
When I look at the logs on that machine, I see the file:
http://192.168.80.50/repos/1001/repodata/repomd.xml
and this file is perfectly accessible.
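One way to re-check accessibility from the node itself is a small probe like the one below. This is only a sketch: the helper names repomd_url and probe are made up for illustration, and the 30 second default timeout is simply borrowed from the timeout=30 in the generated yum.conf.

```python
import urllib.error
import urllib.request


def repomd_url(baseurl):
    """Append the standard yum metadata path to a repository base URL."""
    return baseurl.rstrip("/") + "/repodata/repomd.xml"


def probe(url, timeout=30):
    """Try to fetch url; return (True, HTTP status) or (False, reason)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return True, resp.status
    except (urllib.error.URLError, OSError, ValueError) as exc:
        # A routing failure such as 'No route to host' ends up here.
        return False, str(exc)


# Only build and show the URL here; call probe(url) on the node to test it.
url = repomd_url("http://192.168.80.50/repos/1001")
print(url)
```

Running probe(url) on the affected node at the time of a failure would show whether the 'No route to host' condition is reproducible or transient.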
We have no idea what process decided to erase those packages, or why.
However, the result of this change is very problematic.
In our case, on this cluster, if the 'torque' and 'torque-mom' packages are not
in place, then the moab scheduler cannot work.
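To see exactly which of the expected packages were erased, the log excerpt can be compared with the 'ngedit' list. A minimal sketch, with both inputs copied from this message (the function name erased_packages is just for illustration):

```python
import re

# Expected packages, copied from the 'ngedit -p compute-centos' output above.
EXPECTED = {
    "OpenIPMI", "OpenIPMI-libs", "blas", "ganglia", "ganglia-gmond",
    "libtorque", "torque", "torque-mom", "torque-pam",
}

# Log lines copied from the excerpt above.
LOG = """\
Mar 04 12:10:57 Erased: OpenIPMI
Mar 04 12:11:01 Erased: OpenIPMI-libs
Mar 04 12:55:00 Erased: blas
Mar 04 12:55:04 Erased: blas
Mar 04 12:55:18 Erased: ganglia-gmond
Mar 04 12:55:38 Erased: torque-mom
Mar 04 12:55:41 Erased: torque-pam
"""

ERASED_RE = re.compile(r"\bErased:\s+(\S+)")


def erased_packages(log_text):
    """Return the set of package names that appear in 'Erased:' lines."""
    return {m.group(1) for m in ERASED_RE.finditer(log_text)}


erased = erased_packages(LOG)
print("expected but erased: ", sorted(erased & EXPECTED))
print("expected, not erased:", sorted(EXPECTED - erased))
```

On this excerpt, six of the nine expected packages show up as erased; only ganglia, libtorque and torque do not appear in an 'Erased:' line.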
On the test node, /opt/kusu/sbin/cfmclient appears to consult
/opt/kusu/etc/package.lst. That list looks like this:
# Generated automatically. Do not Edit!
OpenIPMI
OpenIPMI-libs
blas
centos-5-x86_64
component-base-node
component-gnome-desktop
component-nagios-compute-v2_10
ganglia
ganglia-gmond
libtorque
torque
torque-mom
That seems to be ok.
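As a cross-check, the package.lst contents can be diffed against the 'ngedit' list. The sketch below uses only the two listings as pasted in this message; the parser name parse_package_list is hypothetical, and the pasted listing may of course be incomplete.

```python
def parse_package_list(text):
    """Return entries from package.lst-style text, skipping comments and blanks."""
    names = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            names.append(line)
    return names


# Contents of /opt/kusu/etc/package.lst as pasted above.
PACKAGE_LST = """\
# Generated automatically. Do not Edit!
OpenIPMI
OpenIPMI-libs
blas
centos-5-x86_64
component-base-node
component-gnome-desktop
component-nagios-compute-v2_10
ganglia
ganglia-gmond
libtorque
torque
torque-mom
"""

# Expected packages from the 'ngedit -p compute-centos' output.
EXPECTED = {
    "OpenIPMI", "OpenIPMI-libs", "blas", "ganglia", "ganglia-gmond",
    "libtorque", "torque", "torque-mom", "torque-pam",
}

listed = set(parse_package_list(PACKAGE_LST))
print("expected but not listed:", sorted(EXPECTED - listed))
print("listed but not expected:", sorted(listed - EXPECTED))
```

With the listings as pasted, the only expected package absent from package.lst is torque-pam; the extra entries are the centos-5-x86_64 and component-* lines.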
We enabled debugging for cfmclient.
We then tried 'cfmsync -p -n compute-centos', and in /tmp we see:
# cat yum.conf
[main]
cachedir=/var/cache/yum
debuglevel=2
logfile=/var/log/yum.log
reposdir=/dev/null
retries=20
timeout=30
assumeyes=1
tolerant=1
[kusu-installer]
name=centos-5-x86_64 - Booger
baseurl=http://192.168.80.50/repos/1001
That looks ok too, although I would also hope to see
gpgcheck=0 set here for
our extra packages.
But when we try to update:
# cat cfm.log
Updating Packages
++ Testing for: /opt/kusu/cfm/6.package.lst
++ CFMBaseDir: /opt/kusu/cfm
++ NGID = 6
myIPs = [['192.168.80.110', '255.255.255.0']], installers = ['192.168.80.50']
BestIPlist = ['192.168.80.50']
Nothing to remove
Nothing to add
Running plugin: /opt/kusu/lib/plugins/cfmclient/S02KusuAutomount.sh
Running plugin: /opt/kusu/lib/plugins/cfmclient/nrpe.sh
There is nothing to add because /opt/kusu/cfm/6.package.lst does
not exist.
This is certainly a surprise.
I would hope that there would be an _attempt_ to add
back those erased packages.
OTOH after 'cfmsync -u -p compute-centos' I get the following in logs:
# cat cfm.log
Updating Packages
++ Testing for: /opt/kusu/cfm/6.package.lst
++ CFMBaseDir: /opt/kusu/cfm
++ NGID = 6
myIPs = [['192.168.80.110', '255.255.255.0']], installers = ['192.168.80.50']
BestIPlist = ['192.168.80.50']
Nothing to remove
Nothing to add
Running plugin: /opt/kusu/lib/plugins/cfmclient/S02KusuAutomount.sh
Running plugin: /opt/kusu/lib/plugins/cfmclient/nrpe.sh
Updating To New Repo Packages
Running: /usr/bin/yum -y -c /tmp/yum.conf update
That might even work, but then I see what is missing,
as this time /tmp/yum.conf shows me this:
[main]
cachedir=/var/cache/yum
debuglevel=2
logfile=/var/log/yum.log
reposdir=/dev/null
retries=20
timeout=30
assumeyes=1
tolerant=1
[kusu-installer]
name=centos-5-x86_64 - Booger
baseurl=http:///repos/1001
With 'baseurl' stated like this (no host in the URL), it fails.
So, at a minimum I have identified that there is a problem in the generation of the
"baseurl" path.
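A check like the following would catch the empty-host case before yum is run. It is only a sketch: the function name repos_without_host is invented here, it is not part of kusu, and the two config texts are trimmed copies of the /tmp/yum.conf dumps shown earlier.

```python
import configparser
from urllib.parse import urlparse


def repos_without_host(conf_text):
    """Return {section: baseurl} for repo sections whose baseurl has no host."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    bad = {}
    for section in parser.sections():
        # [main] holds global yum options, not a repository definition.
        if section == "main" or not parser.has_option(section, "baseurl"):
            continue
        baseurl = parser.get(section, "baseurl")
        if not urlparse(baseurl).hostname:
            bad[section] = baseurl
    return bad


GOOD = """\
[main]
cachedir=/var/cache/yum
reposdir=/dev/null
[kusu-installer]
name=centos-5-x86_64 - Booger
baseurl=http://192.168.80.50/repos/1001
"""

# Same file as generated by the 'cfmsync -u' run: the host part is empty.
BAD = GOOD.replace("http://192.168.80.50/repos/1001", "http:///repos/1001")

print("first yum.conf: ", repos_without_host(GOOD))
print("second yum.conf:", repos_without_host(BAD))
```

The first config passes and the second is flagged for the kusu-installer section, which matches the difference between the two runs.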
--
With our best regards,
Maurice W. Hilarius        Telephone: 01-780-456-9771
Hard Data Ltd.             FAX:       01-780-456-9772
11060 - 166 Avenue         email: [EMAIL PROTECTED]
Edmonton, AB, Canada       http://www.harddata.com/
T5X 1Y3