Hi everyone,

Tom Barbette:
>
> On 22/05/2020 at 20:43, PATRICK KEROULAS wrote:
>>>>>> mlx5 part of libibverbs includes a ts-to-ns converter which takes the
>>>>>> instantaneous clock info. It's unused in dpdk so far. I've tested
>>>>>> it in the device/port init routine and the result looks reliable.
>>>>>> Since this approach looks very simple, compared to the time sync
>>>>>> mechanism, I'm trying to integrate.
>>>>>>
>>>>>> The conversion should occur in the primary process (testpmd) I
>>>>>> suppose.
>>>>>> 1) The needed clock info derives from ethernet device. Is it
>>>>>>    possible to access that struct from a rx callback?
>>>>>> 2) how to attach the nanosecond to mbuf so that `pdump` catches it?
>>>>>>    (workaround: copy `mbuf->udata64` in forwarded packets.)
>>>>>> 3) any other idea?
>>>>> The timestamp is carried in mbuf.
>>>>> Then the conversion must be done by the ethdev caller (application or
>>>>> any other upper layer).
>>>> What if the converter function needs a clock_info?
>>>> https://github.com/linux-rdma/rdma-core/blob/7af01c79e00555207dee6132d72e7bfc1bb5485e/providers/mlx5/mlx5dv.h#L1201
>>>>
>>>> I'm afraid this info may change by the time the converter is called
>>>> by upper layer.
>>> Indeed, the clock in the device is not an atomic one :)
>>> We need to adjust the time conversion continuously.
>>> I am not an expert of time synchronization, so I add more people Cc
>>> who could help for having a precise timestamp.
>> Thanks Thomas.
>> Not sure this is a synchronization issue. We have dedicated processes
>> (linuxptp) to keep both NIC and sys clocks in sync with an external
>> clock.
>> It is "just" a matter of unit conversion.
>>
>> If it has to be performed in dpdk-pdump, I would need some help to
>> retrieve mlx5_clock_info from inside a secondary process. Looking at
>> mlx5_read_clock(), this info is extracted from ibv_context which looks
>> reachable in a primary process only (segfault, if I try in pdump).

The normal phc2sys can not only synchronise NIC -> system but also
sys -> NIC and (I believe it does, but have not tried) NIC1 -> NIC2.
If I understand your proposal correctly, you want to use a free-running
NIC counter and calibrate out the drift afterwards. It may be easier to
adapt phc2sys to use a NIC through DPDK and sync the NIC's timewheel/VCO
in a proven/reliable manner (e.g. low-pass filtering excursions). Then
you could directly use the NIC counter value.

> I don't know about the integrated ts-to-ns, but we implemented a
> translation mechanism that mimics what NTP does in Linux to translate a
> given clock (TSC at first) to a wall time. You'll find more info at
> https://orbi.uliege.be/bitstream/2268/226257/1/thesis.pdf chapter
> 3.4.1. This is an often forgotten matter, as we saw in real switches
> that the time spent in time-related VDSO is enormous.

Do you have measurements of vDSO clock_gettime and how much is
"enormous" to you?

To my knowledge, clock_gettime via vDSO on Linux only takes a few
nanoseconds in the average case. However, it can go up to ~10 or even
~50 microseconds every few (~10) seconds, depending on the number of
CPUs (for example single vs. dual socket, though my hardware for this
test is quite old: Dell R210-II, R610). Presumably this is when the
kernel locks the struct in VVAR to update the TSC drift compensation
parameters.
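
For anyone who wants to reproduce this kind of observation, here is a minimal sketch of a measurement loop. It is not the code I used; the names (`gettime_stats`, `now_ns`, `measure_gettime`) and the spike threshold parameter are made up for illustration. It calls clock_gettime back to back and records the smallest, largest and average gap between consecutive readings. Note that a large gap can also come from an interrupt or preemption, so the thread should be pinned to an isolated core before reading much into the maximum.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Summary of the gaps between back-to-back clock_gettime() readings (ns). */
struct gettime_stats {
    uint64_t min_ns;
    uint64_t max_ns;
    uint64_t avg_ns;
    uint64_t spikes;   /* number of gaps >= the caller's threshold */
};

/* Read the given clock and flatten the timespec to nanoseconds. */
static uint64_t now_ns(clockid_t clk)
{
    struct timespec ts;
    clock_gettime(clk, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/*
 * Take `samples` consecutive readings and summarise the gaps.
 * Each gap is the cost of one clock_gettime() call plus loop overhead
 * plus whatever else interrupted the thread in between.
 */
static struct gettime_stats measure_gettime(clockid_t clk, uint64_t samples,
                                            uint64_t spike_threshold_ns)
{
    struct gettime_stats st = { UINT64_MAX, 0, 0, 0 };
    uint64_t total = 0;
    uint64_t prev = now_ns(clk);

    for (uint64_t i = 0; i < samples; i++) {
        uint64_t cur = now_ns(clk);
        uint64_t gap = cur - prev;

        if (gap < st.min_ns)
            st.min_ns = gap;
        if (gap > st.max_ns)
            st.max_ns = gap;
        if (gap >= spike_threshold_ns)
            st.spikes++;
        total += gap;
        prev = cur;
    }

    if (samples == 0)
        st.min_ns = 0;             /* nothing measured */
    else
        st.avg_ns = total / samples;
    return st;
}
```

Calling `measure_gettime(CLOCK_MONOTONIC, n, 10000)` with n large enough to cover a few tens of seconds would count gaps of 10 us or more. Swapping in CLOCK_TAI would show the cost of a clock that does not take the vDSO path on a given kernel.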

The Linux clock_gettime implementation is here (different versions):
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/arch/x86/vdso/vclock_gettime.c?h=linux-3.10.y#n193
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/arch/x86/entry/vdso/vclock_gettime.c?h=linux-4.19.y#n241
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/lib/vdso/gettimeofday.c#n98

I use busy waiting on clock_gettime in a packet generator application
(so far 10 GbE only) to pace jumbo frames according to a spec
(simulating the traffic pattern of a to-be-developed hardware with
FPGA), and COTS sniffer hardware with absolute timestamping to verify
my generator's performance. I can observe the above 10-50 us artefacts,
and the average execution time of clock_gettime is sufficiently low for
my needs. The only sad thing is that the TAI clock does not go through
vDSO and therefore I cannot use it.

> We wanted to do a very precise capture too, so we made that clock able
> to synchronize itself with the ConnectX 5 internal clock as a base
> instead of TSC. FYI the clock in CX5 is running at 800MHz, so pure
> nanosecond is impossible, but close enough. It is for that purpose that
> I proposed the rte_eth_read_clock() patch in DPDK. We need to be able to
> read the current clock (like rdtsc() instruction for TSC) to compute the
> frequency.

Doesn't this mean that you need to wait for the PCIe op from the NIC?
Is this really faster than a rdtsc, memory/cache read, integer
multiplication and shift?

Cheers,
nicolas
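
PS: to make the "integer multiplication and shift" concrete, here is a minimal sketch of that style of conversion, in the spirit of what the kernel does for clocksources. It is not taken from DPDK, mlx5 or the kernel; the names `cyc2ns`, `cyc2ns_init` and `cyc2ns_convert` are made up, and the shift of 20 in the usage note is an arbitrary choice. The 800 MHz figure is the CX5 clock rate quoted above.

```c
#include <stdint.h>

/* Fixed-point scale factor: ns ~= (cycles * mult) >> shift. */
struct cyc2ns {
    uint32_t mult;
    uint32_t shift;
};

/*
 * Precompute mult for a counter running at freq_hz.
 * mult = round(1e9 * 2^shift / freq_hz); the caller must pick a shift
 * small enough that the result fits in 32 bits.
 */
static struct cyc2ns cyc2ns_init(uint64_t freq_hz, uint32_t shift)
{
    struct cyc2ns c;
    c.shift = shift;
    c.mult = (uint32_t)(((1000000000ULL << shift) + freq_hz / 2) / freq_hz);
    return c;
}

/*
 * Convert a cycle count to nanoseconds with one multiply and one shift.
 * The 64-bit product overflows for large counts, so this should be fed
 * a delta from a recent reference reading, not the raw free-running
 * counter value.
 */
static uint64_t cyc2ns_convert(const struct cyc2ns *c, uint64_t cycles)
{
    return (cycles * c->mult) >> c->shift;
}
```

With `cyc2ns_init(800000000ULL, 20)` the factor is exactly 1.25 * 2^20, so 4 ticks map to 5 ns and 800000000 ticks map to one second. With these parameters the product overflows after roughly 1.4e13 ticks (a few hours at 800 MHz), which is one reason to convert deltas against a periodically refreshed reference.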