Thx very much for taking a look at this. Another interesting test is to put a workload on the box (flent's rrul test is what I use, but a couple of netperf or iperf runs in both directions is sufficient) and see what happens with fifo and fq_codel. Now that you got my (crappy) first try working...
The measurements you get are about what I got using different methods ages ago - that we cannot trust a kernel to userspace transition, on bare x86 metal - to much below 250us. Containers/VMs are worse, you can do mildly better with an R/T kernel, and I expect MIPS to be abysmal. But I can go try that to see what happens. ARM (particularly multicore ARM), I have no idea; the context switch overhead on pre-speculation ARM chips was demonstrably lower than x86, and context switch overhead got much worse on everything after the Spectre CVEs.

There's another kernel setsockopt nowadays that might be useful to set a pacing rate; so far as I recall that got made to work with UDP around 4.12 in support of QUIC and BBR.

Regardless, I don't think nanosec resolution is needed, but I still think the usec resolution could be useful on short-RTT metrics, partially as a measurement of congestive or CPU overload. A full size packet is 13ms at 1mbit, 13us at a gbit to transit the link. Wifi is 700us to grab the media, and we typically have two txops of up to 5.3ms in size stacked up. So some differentiation as to quality here is possible...

On Thu, Jul 25, 2019 at 9:52 AM Baptiste Jonglez <[email protected]> wrote:
>
> Hello,
>
> A recent discussion with Dave convinced me to start looking at whether
> very short RTTs make any sense in Babel, and whether they could be used to
> infer link speed. If only to settle these questions for good. Since the
> related subject of nanosecond-resolution timestamps was brought up by Toke
> at the IETF session yesterday, I made a quick test with kernel timestamps
> today.
>
> I'll talk about the implementation and shortcomings of using kernel
> timestamps below, but here are some rough timing results. I just used a
> veth pair on my laptop (4.17 kernel) with a babeld on each side of the
> pair, and didn't do any serious statistics.
>
> - regular babeld: average measured RTT ~320 µs (quite variable)
>
> - babeld with kernel RX timestamps: average measured RTT ~120 µs (quite
>   variable)
>
> - ping through the same veth pair (link-local IPv6, 1000 packets): average
>   105 µs, minimum 16 µs
>
> The observant reader will notice that the current resolution (1 µs) is
> more than enough in that case. Also, using kernel timestamps improves
> accuracy by about 100 µs on each host, a somewhat significant improvement.
>
>
> Now, regarding the implementation of kernel timestamps in babeld, my test
> code is here:
>
>   https://github.com/jonglezb/babeld/commit/56756a8cbe9a0b8a168c78873dd77e48e5770278
>
> Thank you Dave for your first draft of this code, I borrowed a bit from it ;)
>
> As explained in the commit message, it suffers from a number of issues
> that would need some serious work before it's really usable:
>
> - kernel timestamps use the realtime clock, and it's not configurable.
>   This is really annoying because babeld uses the monotonic clock (for
>   good reasons). Using kernel timestamps forces us to fall back to the
>   realtime clock elsewhere in babeld, and it would require some work to do
>   it cleanly.
>
> - kernel timestamps are only used for received packets. The sending side
>   (timestamp in Hello) still uses userspace timestamps, with the ensuing
>   accuracy issue. It does not seem possible to tell the kernel to embed a
>   timestamp at a specified location in a packet just before sending it,
>   unless maybe playing with eBPF.
>
>
> Takeaway: at least on Linux, I don't see a use-case for nanosecond
> resolution timestamps. If somebody ever writes an implementation of Babel
> for specialized hardware and runs a datacenter with ultra-low-latency
> network equipment, it could possibly still make sense.
>
>
> --
> Baptiste Jonglez
> PhD student
> Univ. Grenoble Alpes <https://www.univ-grenoble-alpes.fr/>
> LIG lab <https://www.liglab.fr/>
> Drakkar team <http://drakkar.imag.fr/> | Polaris team at INRIA
> <https://team.inria.fr/polaris/>
> _______________________________________________
> babel mailing list
> [email protected]
> https://www.ietf.org/mailman/listinfo/babel

--
Dave Täht
CTO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-831-205-9740

_______________________________________________
Babel-users mailing list
[email protected]
https://alioth-lists.debian.net/cgi-bin/mailman/listinfo/babel-users
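
A rough check of the per-packet transit times quoted in the reply (13ms at 1mbit, 13us at a gbit) can be sketched as below. The numbers are an assumption, not something stated in the thread: a 1500-byte IP packet plus 38 bytes of Ethernet framing (preamble, header, FCS, inter-frame gap). The function name serialization_delay_us is made up for illustration.

```python
# Serialization delay: the time needed to clock one packet onto a link.
# Assumption (not from the thread): 1500-byte IP packet + 38 bytes of
# Ethernet framing overhead = 1538 bytes on the wire.
WIRE_BYTES = 1500 + 38


def serialization_delay_us(link_bps, wire_bytes=WIRE_BYTES):
    """Return the time in microseconds to transmit wire_bytes at link_bps."""
    return wire_bytes * 8 * 1e6 / link_bps


# Compare the two link speeds mentioned in the reply.
for name, bps in (("1 Mbit/s", 1_000_000), ("1 Gbit/s", 1_000_000_000)):
    print(f"{name}: {serialization_delay_us(bps):.1f} us")
```

Under these assumptions the result is a bit over 12 ms at 1 Mbit/s and a bit over 12 us at 1 Gbit/s, the same ballpark as the rounded figures in the reply. It also suggests why microsecond resolution looks sufficient here: even on a gigabit link a single full-size packet takes more than ten of them.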
