Hey fellows,

When booting hundreds of similar systems at the same time, we need to add a random sleep before the DHCP & PXE steps start. That avoids a massive incast problem. To illustrate the point, I have up to 720 similar machines that boot at exactly the same time, so adding a random sleep before DHCP starts is a good thing for me.

I've been working on a prototype and hit one big issue. The current random() implementation uses currentick() as its seed. But as you can guess, the time needed to reach the DHCP step is nearly identical across my systems, so I get mostly the same results everywhere. So I read the time from the RTC and XOR it with the last byte of the MAC address to increase the entropy.

I know the patch isn't perfect, and the CMOS code might be moved into random() itself, but I preferred to submit a first prototype to raise issues & comments about this strategy. I can only say that this trick worked great on my hosts.

Please find below the git commit from my personal gpxe repo.

Cheers,
Erwan

------------------
From: Erwan Velu <erwan.v...@zodiacaerospace.com>
Date: Fri, 20 Aug 2010 15:14:44 +0000 (+0200)
Subject: MAX_RANDOM_SLEEP_TIME to avoid incast troubles
X-Git-Url: http://gitweb.ife-sit.info/?p=gpxe.git;a=commitdiff_plain;h=3b9111e487e45201226b9c3426965ffd843d0687;hp=02a0646fec8011c73f31a83a967873e5fe896575

MAX_RANDOM_SLEEP_TIME to avoid incast troubles

When booting hundreds of similar systems at the same time, we need to
add a random sleep before the DHCP & PXE steps start.
By default, gpxe enabled systems will wait up to 30 seconds before booting.
---

diff --git a/src/usr/dhcpmgmt.c b/src/usr/dhcpmgmt.c
index f82a3bb..97be87f 100644
--- a/src/usr/dhcpmgmt.c
+++ b/src/usr/dhcpmgmt.c
@@ -20,7 +20,10 @@ FILE_LICENCE ( GPL2_OR_LATER );
 #include <string.h>
 #include <stdio.h>
+#include <stdlib.h>
 #include <errno.h>
+#include <unistd.h>
+#include <gpxe/io.h>
 #include <gpxe/netdevice.h>
 #include <gpxe/dhcp.h>
 #include <gpxe/monojob.h>
@@ -29,6 +32,7 @@ FILE_LICENCE ( GPL2_OR_LATER );
 #include <usr/dhcpmgmt.h>
 #define LINK_WAIT_MS 15000
+#define MAX_RANDOM_SLEEP_TIME 30
 /** @file
  *
@@ -56,6 +60,35 @@ int dhcp ( struct net_device *netdev ) {
 	while ( hlen-- )
 		printf ( "%02x%c", *(chaddr++), ( hlen ? ':' : ')' ) );
+	/* In some particular setups like large clusters, many systems can boot up at the same time.
+	 * This could generate a huge load on the main servers; this is known as the incast effect.
+	 * We can avoid this phenomenon by introducing a variable sleep time comprised
+	 * between 0 and MAX_RANDOM_SLEEP_TIME.
+	 * To generate random numbers, we grab the time from the CMOS, mixed with the last byte of
+	 * the network card's MAC address. That's clearly not secure but that's enough for getting
+	 * entropy at boot time.
+	 */
+	if (MAX_RANDOM_SLEEP_TIME > 0 ) {
+		uint8_t random_sleep_time;
+
+		/* Grabbing time from the CMOS */
+		uint8_t clock_ctl_addr = 0x70;
+		uint8_t clock_data_addr = 0x71;
+		uint8_t cmos_time;
+		outb (0x80, clock_ctl_addr);
+		cmos_time=inb (clock_data_addr);
+
+		/* Let's mix the CMOS time with the last byte of the MAC address */
+		cmos_time ^= *(--chaddr);
+
+		/* Initialize the random number generator to compute the sleeping time */
+		srandom(cmos_time);
+		random_sleep_time=random() % MAX_RANDOM_SLEEP_TIME;
+
+		printf ( " \nWaiting %i seconds to avoid incast problems",random_sleep_time);
+		sleep(random_sleep_time);
+	}
+
 	if ( ( rc = start_dhcp ( &monojob, netdev ) ) == 0 ) {
 		rc = monojob_wait ( "" );
 	} else if ( rc > 0 ) {

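
A note on the seed, offered as a possible refinement rather than as part of the patch: the value passed to srandom() above is a single byte (CMOS seconds XORed with one MAC byte), so at most 256 distinct seeds exist. The sketch below is standalone hosted C, not gPXE code. It folds every byte of the MAC address and the RTC seconds into a 32-bit value with the FNV-1a hash. The names boot_seed() and boot_delay() are hypothetical helpers invented for this illustration, and the choice of FNV-1a is an assumption; any simple mixing function would serve.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* FNV-1a 32-bit constants (hash by Fowler, Noll and Vo). */
#define FNV_OFFSET_BASIS 2166136261u
#define FNV_PRIME        16777619u

/*
 * Hypothetical helper, not part of gPXE: fold every byte of the MAC
 * address, then the RTC seconds value, into one 32-bit seed.
 */
uint32_t boot_seed(const uint8_t *mac, size_t mac_len, uint8_t rtc_seconds)
{
    uint32_t hash = FNV_OFFSET_BASIS;
    size_t i;

    assert(mac != NULL);

    /* FNV-1a step: XOR the byte in, then multiply by the prime. */
    for (i = 0; i < mac_len; i++) {
        hash ^= mac[i];
        hash *= FNV_PRIME;
    }

    /* Mix in the time so two boots of the same host can differ. */
    hash ^= rtc_seconds;
    hash *= FNV_PRIME;

    return hash;
}

/*
 * Hypothetical helper: map a seed to a delay in [0, max_delay) seconds.
 * A max_delay of 0 disables the delay, mirroring the
 * "MAX_RANDOM_SLEEP_TIME > 0" guard in the patch.
 */
unsigned int boot_delay(uint32_t seed, unsigned int max_delay)
{
    if (max_delay == 0)
        return 0;
    return seed % max_delay;
}
```

With the patch's default of 30, boot_delay(boot_seed(mac, 6, rtc_seconds), 30) yields a value between 0 and 29. Two MAC addresses of the same length that differ in exactly one byte always produce different seeds for the same rtc_seconds, because each FNV-1a step is invertible; the delays themselves can still coincide, since only 30 values are available.
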
_______________________________________________
gPXE-devel mailing list
gPXE-devel@etherboot.org
http://etherboot.org/mailman/listinfo/gpxe-devel