Kent,
I can't say this with any kind of authority, but there are two
things I'd really look at first: memory and the network.
Is there enough RAM on the new boxes? Yes, I know, sort of a dumb
question. But then I'd also take a look at the network. Although PXE
booting copies over SOME files, it's not actually that much data over
the network. On the other hand, the rsync will (if your CPUs are good
enough) push your network gear pretty hard, and you might be suffering
from network issues (packet loss, corruption, a bad NIC, switch port
failures, etc.) during the rsync.
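If it helps, here is a rough way to check the network theory on the
server side: snapshot the per-interface error and drop counters from
/proc/net/dev before an imaging run and again after rsync dies, then
see what grew. This is only a sketch. It assumes a Python 3
interpreter is available on the box and the usual 16-column
/proc/net/dev layout; the function names are made up for
illustration.

```python
def parse_net_dev(text):
    """Parse /proc/net/dev-style text into {iface: error/drop counters}."""
    stats = {}
    for line in text.splitlines()[2:]:      # first two lines are headers
        if ":" not in line:
            continue
        name, data = line.split(":", 1)     # copes with "eth0:123" too
        fields = [int(x) for x in data.split()]
        if len(fields) < 16:                # assumed layout: 8 rx + 8 tx columns
            continue
        stats[name.strip()] = {
            "rx_errs": fields[2],
            "rx_drop": fields[3],
            "tx_errs": fields[10],
            "tx_drop": fields[11],
        }
    return stats


def new_problems(before, after):
    """Return only the counters that increased between two snapshots."""
    grown = {}
    for iface, now in after.items():
        prev = before.get(iface, {})
        delta = {}
        for key, value in now.items():
            diff = value - prev.get(key, 0)
            if diff > 0:
                delta[key] = diff
        if delta:
            grown[iface] = delta
    return grown


def snapshot(path="/proc/net/dev"):
    """Read and parse the live counters (Linux only)."""
    with open(path) as handle:
        return parse_net_dev(handle.read())
```

Call snapshot() once before kicking off the client install and once
after the failure, and pass both results to new_problems(). An empty
result doesn't clear the network (the switch or the client NIC keep
their own counters), but a growing rx_errs or tx_drop on the server's
interface would be a strong hint.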
Erik
On Aug 21, 2008, at 2:29 PM, Brodie, Kent wrote:
Hi folks. I’ve been using SystemImager along the way, dating back
to 2.x on an old Itanium cluster.
Anyway… I am in the process of setting up a cluster of new Sun
x2100 servers. And, yet another opportunity for me to set up the
systemimager suite.
Make client image stuff: works great. Set up the golden image…
yeah! GET the image over to the server… excellent, no
problems. Boot up the client via PXE: yep, client gets the BOEL
stuff, boots up, gets configured, lays out the disks, and then, the
final step: using rsync to pull across the image. Even that STARTS
ok……
This is where I have now spent THREE days trying everything to get
past rsync dying.
When it eventually blows:
On the server side, I see:
2008/08/21 15:58:39 [988] rsync: writefd_unbuffered failed to write
4092 bytes [sender]: Connection reset by peer (104)
2008/08/21 15:58:39 [988] rsync error: error in rsync protocol data
stream (code 12) at io.c(1543) [sender=3.0.2]
After which the rsync connection is toast, and the client rsync dies.
“When” this occurs varies: sometimes it’s relatively early, other
times it has run rsync for quite some time uninterrupted.
I have been using systemimager for years, and never had this much
trouble. I am at wit’s end, and could really use some help
figuring out how to get past this.
So far, here is what I have tried (none of the following has made a
lick of difference), and some important notes:
- Started with systemimager 4.0.2.1 (stable).
- Tried going back to older version(s), such as 3.7.3 that I had on my
  head node a while back.
- Tried latest unstable, 4.1.6.1.
- Upgraded switch (HP ProCurve) firmware to very latest.
- Tried “UYOK”, but that fails: client CRASHES failing on hotplug
  support somewhere with a null pointer dereference error. The
  “standard” kernel seems to boot ok.
- Upgraded rsync on the server to very latest (3.0.3). Noted that
  with SI 4.1.6.1, rsync is v3.0.2, which should be fine.
- Here’s one for you: rsync between the nodes (each direction) works
  **PERFECTLY** between the server and the golden client. I am able
  to rsync the entire image tree, no problems at all. Both
  directions, no issues. It’s ONLY when I have booted a new client
  over the network and used systemimager support that the rsync
  fails, and then, only after a while…
- Tried booting from the built-in Broadcom NIC(s) as well as the
  built-in Nvidia NIC(s) on the new client(s). No difference. Same
  behavior.
- Played with --bwlimit on the rsync server side.
HELP? I am so frustrated I could scream. I’ve upgraded,
downgraded, tweaked bios options, fiddled with nics, etc etc etc.
I’m stuck.
Servers and clients are RedHat RHEL 4 with kernel 2.6.9-78.
-------------------------------------------------------------------------
This SF.Net email is sponsored by the Moblin Your Move Developer's
challenge
Build the coolest Linux based applications with Moblin SDK & win
great prizes
Grand prize is a trip for two to an Open Source event anywhere in
the world
http://moblin-contest.org/redirect.php?banner_id=100&url=/
_______________________________________________
sisuite-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/sisuite-users