Hi Maurice
To recap your email:
1. The package.lst has the correct entries
2. You saw "No route to host" messages
3. You saw "Erased: Packagename" messages in some log file
4. When running cfmsync -u the log shows the repo IP missing
For item 1:
1. Check the contents of /depot/repos/1001. It must contain all the RPMs
listed in the package.lst. If it does not, then run "repoman -u -r ...."
On the nodes, truncate the /opt/kusu/etc/package.lst file, e.g.
# cat /dev/null > /opt/kusu/etc/package.lst
Then run cfmsync -p. That should trigger a retry of the package install.
2. For the "No route to host" messages, I'm not sure why you are seeing them.
Is there anything unusual in the web server's access_log or error_log?
3. If this is the yum.log, this can happen if you have the components selected
in ngedit, but the repo does not have them. Ngedit will mark them for removal
because the repository did not contain them, and the next time cfmsync -p is
run the packages will be removed. It's important to run "repoman -u -r ..."
before ngedit.
4. We logged this one a while ago, but have not had time to look at it.
It's slated to be addressed before the final release.
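For the check in item 1, something like the following could save comparing
the two lists by hand. This is only a minimal sketch in Python and not part of
Kusu: the function names are made up for illustration, it assumes the rpm files
follow the usual name-version-release.arch.rpm naming, and it assumes every
entry in package.lst corresponds to an rpm of the same name, which may not hold
for every entry.

```python
import os


def rpm_name(filename):
    """Return the package name from 'name-version-release.arch.rpm'."""
    stem = filename[:-len(".rpm")]
    stem = stem.rsplit(".", 1)[0]   # drop the architecture
    return stem.rsplit("-", 2)[0]   # drop version and release


def read_package_list(path):
    """Read package names, skipping blank lines and '#' comment lines."""
    with open(path) as handle:
        lines = (line.strip() for line in handle)
        return [line for line in lines if line and not line.startswith("#")]


def missing_from_repo(package_list_path, repo_dir):
    """List entries of package.lst that have no rpm anywhere under repo_dir."""
    available = set()
    # followlinks=True is an assumption: the repo may use symlinked directories.
    for _root, _dirs, files in os.walk(repo_dir, followlinks=True):
        available.update(rpm_name(f) for f in files if f.endswith(".rpm"))
    return [p for p in read_package_list(package_list_path) if p not in available]
```

Calling missing_from_repo("/opt/kusu/etc/package.lst", "/depot/repos/1001") from
a place that can read both paths returns the names with no matching rpm. An
empty list means the repo covers everything in the list.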
Mark
-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED]] On Behalf Of Maurice Hilarius
Sent: Friday, March 14, 2008 5:25 PM
To: [email protected]
Subject: [Kusu-users] cfmsync -p Problem - more details
Here is a list of additional packages we were expecting to
install (from an output of 'ngedit -p compute-centos'):
OpenIPMI OpenIPMI-libs
blas ganglia
ganglia-gmond libtorque
torque torque-mom
torque-pam
On the single node which we ran for testing this,
we saw in our logs:
(ignore dates/timestamps please)
Mar 04 12:10:57 Erased: OpenIPMI
Mar 04 12:11:01 Erased: OpenIPMI-libs
Mar 04 12:55:00 Erased: blas
Mar 04 12:55:04 Erased: blas
Mar 04 12:55:18 Erased: ganglia-gmond
Mar 04 12:55:38 Erased: torque-mom
Mar 04 12:55:41 Erased: torque-pam
Those entries were interspersed with pieces of information such as:
http://192.168.80.50/repos/1001/repodata/repomd.xml: [Errno 4] IOError:
<urlopen error (113, 'No route to host')>
I see no good explanation for this error.
Without that route *nothing* would get installed.
When I look at logs on that machine then I see the file:
http://192.168.80.50/repos/1001/repodata/repomd.xml
This file is perfectly accessible.
We have no idea what process decided to erase those packages, or why.
However the result of this change is very problematic.
In our case, on this cluster, if the 'torque' and 'torque-mom' packages are not
in place, then the moab scheduler cannot work.
On the test node /opt/kusu/sbin/cfmclient appears to consult
/opt/kusu/etc/package.lst. That list looks like this:
# Generated automatically. Do not Edit!
OpenIPMI
OpenIPMI-libs
blas
centos-5-x86_64
component-base-node
component-gnome-desktop
component-nagios-compute-v2_10
ganglia
ganglia-gmond
libtorque
torque
torque-mom
That seems to be ok.
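To cross-check this list against the "Erased:" lines quoted earlier, a short
script along these lines could be used. It is a rough sketch in Python, the
function names are hypothetical, and it assumes the log lines have the
"Erased: <name>" form shown in this message.

```python
def erased_packages(log_lines):
    """Return names from 'Erased: <name>' lines, in order, without repeats."""
    seen = []
    for line in log_lines:
        _before, marker, name = line.partition("Erased: ")
        name = name.strip()
        if marker and name and name not in seen:
            seen.append(name)
    return seen


def expected_but_erased(package_list, log_lines):
    """Return erased packages that the package list still asks for."""
    wanted = set(package_list)
    return [name for name in erased_packages(log_lines) if name in wanted]


log = ["Mar 04 12:55:38 Erased: torque-mom", "Mar 04 12:55:41 Erased: torque-pam"]
print(expected_but_erased(["torque", "torque-mom"], log))  # prints ['torque-mom']
```

Fed the log lines and the package.lst quoted in this message, it would report
OpenIPMI, OpenIPMI-libs, blas, ganglia-gmond and torque-mom. torque-pam was
erased as well, but it is not in this package.lst.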
We enabled debugging for cfmclient.
We then tried 'cfmsync -p -n compute-centos', and in /tmp we see:
# cat yum.conf
[main]
cachedir=/var/cache/yum
debuglevel=2
logfile=/var/log/yum.log
reposdir=/dev/null
retries=20
timeout=30
assumeyes=1
tolerant=1
[kusu-installer]
name=centos-5-x86_64 - Booger
baseurl=http://192.168.80.50/repos/1001
That looks ok too, although I would also hope to see
gpgcheck=0 set here for our extra packages.
But when we try to update:
# cat cfm.log
Updating Packages
++ Testing for: /opt/kusu/cfm/6.package.lst
++ CFMBaseDir: /opt/kusu/cfm
++ NGID = 6
myIPs = [['192.168.80.110', '255.255.255.0']], installers = ['192.168.80.50']
BestIPlist = ['192.168.80.50']
Nothing to remove
Nothing to add
Running plugin: /opt/kusu/lib/plugins/cfmclient/S02KusuAutomount.sh
Running plugin: /opt/kusu/lib/plugins/cfmclient/nrpe.sh
There is nothing to add because /opt/kusu/cfm/6.package.lst does
not exist.
This is certainly a surprise.
I would hope that there would be an _attempt_ to add
back those erased packages.
OTOH after 'cfmsync -u -p compute-centos' I get the following in logs:
# cat cfm.log
Updating Packages
++ Testing for: /opt/kusu/cfm/6.package.lst
++ CFMBaseDir: /opt/kusu/cfm
++ NGID = 6
myIPs = [['192.168.80.110', '255.255.255.0']], installers = ['192.168.80.50']
BestIPlist = ['192.168.80.50']
Nothing to remove
Nothing to add
Running plugin: /opt/kusu/lib/plugins/cfmclient/S02KusuAutomount.sh
Running plugin: /opt/kusu/lib/plugins/cfmclient/nrpe.sh
Updating To New Repo Packages
Running: /usr/bin/yum -y -c /tmp/yum.conf update
That might even work, but then I see what is missing,
as this time /tmp/yum.conf shows me this:
[main]
cachedir=/var/cache/yum
debuglevel=2
logfile=/var/log/yum.log
reposdir=/dev/null
retries=20
timeout=30
assumeyes=1
tolerant=1
[kusu-installer]
name=centos-5-x86_64 - Booger
baseurl=http:///repos/1001
With 'baseurl' stated like this it fails.
So, at a minimum I have identified that there is a problem in the generation of
the "baseurl" path.
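A check for this symptom could look like the following. It is a minimal sketch
in Python using only the standard library, not anything from Kusu. The function
name is hypothetical, and it assumes the generated file parses as a plain INI
file, as the /tmp/yum.conf shown above does.

```python
import configparser
from urllib.parse import urlsplit


def repos_without_host(yum_conf_text):
    """Return {section: [urls]} for repo sections whose baseurl lacks a host."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(yum_conf_text)
    broken = {}
    for section in parser.sections():
        if section == "main" or not parser.has_option(section, "baseurl"):
            continue
        # A baseurl value may hold several whitespace-separated URLs.
        urls = parser.get(section, "baseurl").split()
        bad = [url for url in urls if not urlsplit(url).hostname]
        if bad:
            broken[section] = bad
    return broken
```

Given the second yum.conf above, this would flag the kusu-installer section
with http:///repos/1001, while the first one, with 192.168.80.50 in the URL,
would pass.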
--
With our best regards,
//Maurice W. Hilarius Telephone: 01-780-456-9771/
/Hard Data Ltd. FAX: 01-780-456-9772/
/11060 - 166 Avenue email:[EMAIL PROTECTED]/
/Edmonton, AB, Canada http://www.harddata.com//
/ T5X 1Y3/
/
_______________________________________________
Kusu-users mailing list
[email protected]
http://mail.osgdc.org/mailman/listinfo/kusu-users