This is a bit off-topic for this list, but I was wondering if anyone
has any experience working with ptpd (precision time protocol daemon;
following the IEEE 1588 spec). The point of ptpd is to give better
time precision than NTP: NTP gives accuracy on the order of
milliseconds, whereas ptpd/IEEE 1588 gives accuracy on the order of
microseconds.
http://ptpd.sourceforge.net/
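For anyone less familiar with IEEE 1588, here is my understanding of where the extra precision comes from. In the basic delay request-response exchange, the master sends a Sync message at t1 (received by the slave at t2), and the slave sends a Delay_Req at t3 (received by the master at t4). Assuming the network path delay is the same in both directions, the slave's clock offset and the one-way delay fall out of those four timestamps. Below is a minimal Python sketch of that arithmetic. The function name and example numbers are my own, not taken from ptpd's code.

```python
def ptp_offset_and_delay(t1, t2, t3, t4):
    """Return (offset, delay) for one IEEE 1588 delay request-response exchange.

    t1: master sends Sync        (master clock)
    t2: slave receives Sync      (slave clock)
    t3: slave sends Delay_Req    (slave clock)
    t4: master receives Delay_Req (master clock)

    Assumes the path delay is symmetric (same in both directions).
    """
    # Master-to-slave timestamp difference = delay + offset
    ms = t2 - t1
    # Slave-to-master timestamp difference = delay - offset
    sm = t4 - t3
    offset = (ms - sm) / 2  # slave clock minus master clock
    delay = (ms + sm) / 2   # estimated one-way path delay
    return offset, delay


if __name__ == "__main__":
    # Example in microseconds: the slave clock is 50 us ahead of the
    # master, and the one-way path delay is 10 us.
    offset, delay = ptp_offset_and_delay(t1=0, t2=60, t3=100, t4=60)
    print(offset, delay)  # 50.0 10.0
```

The symmetry assumption matters: any difference between the two directions' delays shows up as an error of half that difference in the computed offset.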
Having clocks coordinated to within microseconds could be quite
helpful to MPI in multiple ways: more accurate MPI tracing
output, the possibility of scheduling communication (particularly
for MPI collectives) in oversubscribed networks, etc.
I'd like to give ptpd a whirl, but there's very little documentation,
and I can't find any mailing lists or other points of contact where
I could ask a few questions.
In particular, I would like to run ptpd in a way that I'm guessing
would be fairly common in HPC environments: use NTP to get the time
to my cluster's head node and then use ptpd to synchronize my cluster
to the NTP'ed head node. However, it's not clear to me how ptpd
works. How do I designate the head node as the "master"? What,
exactly, do all of ptpd's command-line options mean? (There's only
a limited "--help"-style message to explain them.) And so on.
I have a busy/active cluster, so I don't want to muck up the clock
(and therefore potentially muck up NFS file timestamps) -- some level
of experimentation is ok, but I don't want to unintentionally cause a
large/bad effect (particularly in terms of NFS) if possible. I'm
also curious how much network overhead ptpd incurs, both at
startup and in steady-state operation.
If anyone has any insight or experience with ptpd, I'd love to hear
it. Thanks!
--
Jeff Squyres
Cisco Systems