On 7/18/07, Patrick Ohly <[EMAIL PROTECTED]> wrote:
> * Have you heard of PTP and considered to use it in clusters?
> * How would applications or clusters benefit from a better cluster-wide clock?
> * What obstacles did or could prevent using PTP(d) for that purpose?
I wasn't aware of PTPd, and neither was my team leader Josip Loncaric. Now that I am, I'll try it out and compare it to a solution Josip came up with a while ago. He was satisfied with NTP at one time, but started writing BTime (http://btime.sf.net) in 2004. He listed some of his reasons in a talk at SC2005 (the slides are in the BTime tarball):

Benefits of precision global timekeeping:
* Rapid/frequent performance measurements of parallel applications
  - Local gettimeofday() is better than communicating with a global clock
* Synchronous system activity possible without communication
  - Local timer-triggered events could be globally synchronous
  - Potential for reducing the impact of system noise

BTime synchronizes client clocks to server broadcasts (not multicast), and uses a kernel module to provide more precise time-relevant data. The current version of BTime applies to Linux kernels 2.6.13 up to and including 2.6.17; Josip hasn't had time to get it working with the new clocksource infrastructure of newer kernels.

More details from the README:

TUNING: BTime assumes that a certain fraction of timestamps will arrive with minimal delays, and that those minimal delays are exponentially distributed. Over a high-performance local network using the UDP protocol, this characteristic noise is empirically about 3 microseconds (it would be about 10 microseconds for TCP), but if the network path has several hops, timing noise could be higher (e.g., 25 us for UDP). BTW, BTime adaptively estimates the probability of receiving timestamps without extra delays, but it currently requires a fixed timing noise estimate.

TO DO: broadcast delay compensation... For now, BTime synchronizes all clients to server time minus an uncompensated broadcast delay B. This delay (about 35-50 microseconds) can be measured more precisely to improve compensation. Otherwise, the server will remain B microseconds ahead of all clients, which will be synchronized with each other.
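To illustrate the delay model in the TUNING note, here is a small simulation (a sketch of the idea only, not BTime's code; the 500 us client offset is made up, while the ~3 us noise and the 35 us broadcast delay B are the README's figures). If each observed offset is the true offset plus a fixed minimum delay plus exponentially distributed queueing noise, then a minimum filter over a batch of timestamps recovers offset+B far more tightly than a naive mean, because extra delay only ever adds to the measurement:

```python
import random

random.seed(42)

TRUE_OFFSET_US = 500.0   # hypothetical client clock offset (microseconds)
MIN_DELAY_US = 35.0      # fixed one-way broadcast delay B (uncompensated)
NOISE_US = 3.0           # characteristic exponential noise, ~3 us over local UDP

def observed_offset():
    """One broadcast timestamp: measured offset = true offset + delay."""
    delay = MIN_DELAY_US + random.expovariate(1.0 / NOISE_US)
    return TRUE_OFFSET_US + delay

samples = [observed_offset() for _ in range(1000)]

# Minimum filter: converges on true offset + B, since queueing noise
# is strictly additive. The mean stays biased high by ~NOISE_US.
est = min(samples)
mean = sum(samples) / len(samples)

print(f"min-filter estimate : {est:.1f} us "
      f"(true offset+B = {TRUE_OFFSET_US + MIN_DELAY_US:.1f} us)")
print(f"naive mean estimate : {mean:.1f} us")
```

This is also why the TO DO matters: the minimum filter can only see offset+B as a single quantity, so B itself must be measured separately to remove the constant bias.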
BTime currently applies the same compensation constant at each client. The quality of this synchronization depends on the consistency of the minimum broadcast time, but client clocks usually remain within 10 microseconds of each other. Even over a noisy 4-hop network, the 10 us tolerance was reached with 99.75% confidence in my tests at 1 timestamp per second.

Finally, btime-server is set to send a small UDP packet once per second. This imposes very low overhead (<0.001% of CPU and network), but the interval could be increased up to 30 seconds or so, at the expense of widening the confidence interval. In my tests (using TCP with about 10 us timing noise over private GigE), client clocks track the median offset within about 10 microseconds * sqrt(seconds between good timestamps) as the 99.9% confidence interval.

This assumes that the master clock is synchronized to wall clock time once per day. If NTP is running instead, it applies >100 times larger adjustments roughly every 17 minutes; BTime quickly compensates, but these transients reduce the confidence of staying within the same tracking interval.

--
Andrew Shewmaker

_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
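For a concrete feel for the empirical 10 us * sqrt(seconds) tracking rule quoted above (my numbers, TCP over private GigE; the function name is just for illustration), the trade-off between timestamp interval and confidence width works out as:

```python
import math

def tracking_bound_us(seconds_between_good_timestamps, base_us=10.0):
    """99.9% confidence tracking interval per the empirical
    10 us * sqrt(seconds between good timestamps) rule."""
    return base_us * math.sqrt(seconds_between_good_timestamps)

for interval in (1, 4, 30):
    print(f"{interval:2d} s between good timestamps -> "
          f"+/- {tracking_bound_us(interval):.1f} us")
```

So stretching the broadcast interval from 1 s to 30 s would widen the expected tracking band from about 10 us to roughly 55 us, which is the trade-off mentioned above.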