Kent,
I can't say this with any authority, but there are two things I'd look at first. One is memory on the new boxes: is there enough RAM? Yes, I know, sort of a dumb question. The other is the network. Although PXE booting copies over SOME files, it's not actually that much data over the network. The rsync, on the other hand, will (if your CPUs are good enough) push your network gear pretty hard, and you might be suffering from network issues (packet loss, corruption, a bad NIC, switch port failures, etc.) during the rsync.
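
One rough way to look for that kind of trouble is to read the per-interface error counters the Linux kernel exposes in /proc/net/dev before and after a failed rsync. The following is only a sketch: the function names are made up for illustration, it assumes Python is available on the machine where you run it (probably the image server, since the install environment on the client may not have it), and nonzero counters are a hint rather than proof of a bad NIC or switch port.

```python
import os

# Column order of /proc/net/dev after the "iface:" prefix.
RX_NAMES = ["bytes", "packets", "errs", "drop", "fifo", "frame", "compressed", "multicast"]
TX_NAMES = ["bytes", "packets", "errs", "drop", "fifo", "colls", "carrier", "compressed"]

# Counters that should normally stay at (or very near) zero on a healthy link.
WATCH = ("rx_errs", "rx_drop", "rx_frame", "tx_errs", "tx_drop", "tx_colls", "tx_carrier")


def parse_net_dev(text):
    """Return {interface: {counter_name: value}} from /proc/net/dev-style text."""
    stats = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # the two header lines contain no colon
        iface, data = line.split(":", 1)  # older kernels may omit the space after ':'
        fields = data.split()
        if len(fields) < 16:
            continue  # not a counter line we understand
        values = [int(f) for f in fields[:16]]
        counters = {"rx_" + n: v for n, v in zip(RX_NAMES, values[:8])}
        counters.update({"tx_" + n: v for n, v in zip(TX_NAMES, values[8:])})
        stats[iface.strip()] = counters
    return stats


def suspicious_counters(stats):
    """Return only the interfaces and watched counters that are nonzero."""
    result = {}
    for iface, counters in stats.items():
        bad = {name: counters[name] for name in WATCH if counters[name]}
        if bad:
            result[iface] = bad
    return result


if __name__ == "__main__":
    path = "/proc/net/dev"
    if os.path.exists(path):
        with open(path) as handle:
            found = suspicious_counters(parse_net_dev(handle.read()))
        for iface, bad in sorted(found.items()):
            print(iface, bad)
```

Running it once before the rsync starts and once after it dies, then comparing the numbers, says more than a single reading, because the counters accumulate from boot.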

Erik


On Aug 21, 2008, at 2:29 PM, Brodie, Kent wrote:

Hi folks. I’ve been using SystemImager along the way, dating back to 2.x on an old Itanium cluster.

Anyway… I am in the process of setting up a cluster of new Sun x2100 servers. And, yet another opportunity for me to set up the systemimager suite.

Make client image stuff: works great. Set up the golden image… yeah! GET the image over to the server… excellent, no problems. Boot up the client via PXE: yep, the client gets the boel stuff, boots up, gets configured, lays out the disks, and then, as the final step, uses rsync to pull across the image. Even that STARTS ok……

This is where I have now spent THREE days trying everything to get past rsync dying.

When it eventually blows:

On the server side, I see:
2008/08/21 15:58:39 [988] rsync: writefd_unbuffered failed to write 4092 bytes [sender]: Connection reset by peer (104)
2008/08/21 15:58:39 [988] rsync error: error in rsync protocol data stream (code 12) at io.c(1543) [sender=3.0.2]

After which the rsync connection is toast, and the client rsync dies.

“when” this occurs varies-- sometimes it’s relatively early, other times it has run rsync for quite some time uninterrupted.
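
Since the failure time varies, it may help to pull every "rsync error" line out of the server-side rsync log and line the timestamps up against switch or kernel logs. The sketch below assumes the log lines have the same shape as the two quoted above. The function name is hypothetical, and the log file location is not assumed: it is passed on the command line.

```python
import re
import sys
from datetime import datetime

# Matches lines such as:
# 2008/08/21 15:58:39 [988] rsync error: error in rsync protocol data stream (code 12) at io.c(1543) [sender=3.0.2]
ERROR_LINE = re.compile(
    r"^(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) \[(\d+)\] rsync error: (.*?) \(code (\d+)\)"
)


def find_rsync_errors(lines):
    """Return one dict per 'rsync error' line: time, pid, message, exit code."""
    errors = []
    for line in lines:
        match = ERROR_LINE.match(line.strip())
        if not match:
            continue  # ordinary transfer lines and 'rsync:' warnings are skipped
        stamp, pid, message, code = match.groups()
        errors.append({
            "time": datetime.strptime(stamp, "%Y/%m/%d %H:%M:%S"),
            "pid": int(pid),
            "message": message,
            "code": int(code),
        })
    return errors


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python rsync_errors.py <path-to-rsync-log>
    with open(sys.argv[1]) as handle:
        for err in find_rsync_errors(handle):
            print(err["time"].isoformat(), err["pid"], err["code"], err["message"])
```

If the error timestamps cluster around link flaps or interface resets reported elsewhere, that points at the network rather than at SystemImager itself.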

I have been using systemimager for years, and never had this much trouble. I am at wit’s end, and could really use some help figuring out how to get past this.

So far, here is what I have tried (none of the following has made a lick of difference), and some important notes:

Started with systemimager 4.0.2.1 (stable).
Tried going back to older version(s), such as 3.7.3 that I had on my head node a while back
Tried latest unstable – 4.1.6.1
Upgraded switch (HP procurve) firmware to very latest
Tried “UYOK”, but that fails: the client CRASHES on hotplug support somewhere with a null pointer dereference error. “standard” kernel seems to boot ok.
Upgraded rsync on the server to very latest (3.0.3) -- noted that with SI 4.1.6.1, rsync is v3.0.2 which should be fine.
Here’s one for you: rsync between the nodes (each direction) works **PERFECTLY** between the server and the golden client. I am able to rsync the entire image tree, no problems at all. Both directions, no issues. It’s ONLY when I have booted a new client over the network and used systemimager support that the rsync fails, and then only after a while…..
Tried booting from the built in Broadcom nic(s) as well as the built in Nvidia nic(s) on the new client(s). No difference. Same behavior.
Played with --bwlimit on rsync server side

HELP? I am so frustrated I could scream. I’ve upgraded, downgraded, tweaked bios options, fiddled with nics, etc etc etc. I’m stuck.

Servers and clients are RedHat RHEL 4 with kernel 2.6.9-78.


-------------------------------------------------------------------------
This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
Build the coolest Linux based applications with Moblin SDK & win great prizes
Grand prize is a trip for two to an Open Source event anywhere in the world
http://moblin-contest.org/redirect.php?banner_id=100&url=/
_______________________________________________
sisuite-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/sisuite-users
