Hi Sebastian,

You make a good point. What I did was issue a warning if the tool found it was being CPU limited rather than i/o limited. This indicates the test likely is inaccurate from an i/o perspective, and the results are suspect. It does this crudely, by comparing the CPU thread doing stats against the traffic threads doing i/o to see which is waiting on the others. There is no attempt to assess the CPU load itself; the check is designed with the singular purpose of making sure the i/o threads only block on the read and write syscalls.
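To sketch what such a check can look like (a minimal illustration only, not iperf 2's actual internals; every name in it is hypothetical): time how much of a traffic thread's wall clock is spent inside read(), and warn when that fraction gets small, since the bottleneck is then the CPU rather than the network.

#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper, not iperf 2's actual code: accumulate the time a
 * traffic thread spends blocked inside read(). If most of the interval's
 * wall clock is spent *outside* the syscall, the thread is CPU limited,
 * not i/o limited, and the i/o results are suspect. */

static double now_sec(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

struct io_meter {
    double start;    /* interval start time */
    double blocked;  /* seconds spent inside read() */
};

static void meter_begin(struct io_meter *m)
{
    m->start = now_sec();
    m->blocked = 0.0;
}

static ssize_t metered_read(struct io_meter *m, int fd, void *buf, size_t n)
{
    double t0 = now_sec();
    ssize_t r = read(fd, buf, n);
    m->blocked += now_sec() - t0;
    return r;
}

/* Warn when the thread blocked on i/o for less than 10% of the interval. */
static void meter_check(const struct io_meter *m)
{
    double elapsed = now_sec() - m->start;
    if (elapsed > 0 && m->blocked / elapsed < 0.10)
        fprintf(stderr, "warning: CPU limited (blocked %.1f%% of interval); "
                "i/o results suspect\n", 100.0 * m->blocked / elapsed);
}

int main(void)
{
    struct io_meter m;
    char buf[65536];
    meter_begin(&m);
    while (metered_read(&m, STDIN_FILENO, buf, sizeof buf) > 0)
        ;
    meter_check(&m);
    return 0;
}

A fuller version would wrap write() the same way and compare against the stats thread, per the design described above.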
I probably should revisit this, both in design and implementation. Thanks for bringing it up; all input is truly appreciated.

Bob

On Jan 12, 2023, at 12:14 AM, Sebastian Moeller <moell...@gmx.de> wrote:
>Hi Bob,
>
>> On Jan 11, 2023, at 21:09, rjmcmahon <rjmcma...@rjmcmahon.com> wrote:
>>
>> Iperf 2 is designed to measure network i/o. Note: It doesn't have to move large amounts of data. It can support data profiles that don't drive TCP's CCA, as an example.
>>
>> Two things I've been asked for and avoided:
>>
>> 1) Integrate clock sync into iperf's test traffic
>
> [SM] This I understand; measurement conditions can be unsuited for tight time synchronization...
>
>> 2) Measure and output CPU usage
>
> [SM] This one puzzles me, as far as I understand the only way to properly diagnose network issues is to rule out other things, like CPU overload, that can have symptoms similar to network issues. As an example, the cake qdisc, if CPU cycles become tight, will first increase its internal queueing and jitter (not consciously; it is just an observation that once cake does not get access to the CPU as timely as it wants, queueing latency and variability increase) and then later also show reduced throughput, i.e. symptoms similar to things that can happen along an e2e network path for completely different reasons, e.g. lower-level retransmissions or a variable-rate link. So I would think that checking the CPU load at least coarsely would be within the scope of network testing tools, no?
>
>Regards
>	Sebastian
>
>> I think both of these are outside the scope of a tool designed to test network i/o over sockets; rather, these should be developed & validated independently of a network i/o tool.
>>
>> Clock error really isn't about the amount/frequency of traffic but rather about getting a periodic high-quality reference. I tend to lock the local system oscillator to a GPS pulse per second. As David says, most every modern handheld computer already has the GPS chips to do this. So to me it seems more of a policy choice between data center operators and device mfgs, and less of a technical issue.
>>
>> Bob
>>
>>> Hello,
>>>
>>> Y'all can call me crazy if you want... but... see below [RWG]
>>>
>>>> Hi Bob,
>>>>
>>>> > On Jan 9, 2023, at 20:13, rjmcmahon via Starlink <starl...@lists.bufferbloat.net> wrote:
>>>> >
>>>> > My biggest barrier is the lack of clock sync by the devices, i.e. very limited support for PTP in data centers and in end devices. This limits the ability to measure one-way delays (OWD), and most assume that OWD is 1/2 the RTT, which typically is a mistake. We know this intuitively with airplane flight times or even car commute times, where the one-way time is not 1/2 the round-trip time. Google Maps & directions provide a time estimate for the one-way link; they don't compute a round trip and divide by two.
>>>> >
>>>> > For those that can get clock sync working, the iperf 2 --trip-times option is useful.
>>>>
>>>> [SM] +1; and yet even with unsynchronized clocks one can try to measure how latency changes under load, and that can be done per direction. Sure, this is far inferior to real, reliably measured OWDs, but if life/the internet deals you lemons....
>>>
>>> [RWG] iperf2/iperf3, etc. are already moving large amounts of data back and forth (for that matter, so is any rate test), so why not abuse some of that data and add the fundamental NTP clock sync data, bidirectionally passing each other's concept of "current time"? IIRC (it's been 25 years since I worked on NTP at this level) you *should* be able to get a fairly accurate clock delta between each end, and then use that info and timestamps in the data stream to compute OWDs. You need to put 4 timestamps in the packet, and with that you can compute "offset".
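For reference, the four-timestamp computation [RWG] is recalling is the standard NTP one from RFC 5905; the sketch below is illustrative, not iperf 2 or ntpd code.

#include <stdio.h>

/* Standard NTP four-timestamp exchange (RFC 5905):
 *   t1 = client transmit time  (client clock)
 *   t2 = server receive time   (server clock)
 *   t3 = server transmit time  (server clock)
 *   t4 = client receive time   (client clock)
 * offset = estimated server-minus-client clock difference
 * delay  = round-trip time excluding server processing   */
static void ntp_offset_delay(double t1, double t2, double t3, double t4,
                             double *offset, double *delay)
{
    *offset = ((t2 - t1) + (t3 - t4)) / 2.0;
    *delay  = (t4 - t1) - (t3 - t2);
}

int main(void)
{
    double offset, delay;
    /* Example: server clock ~5 ms ahead, ~20 ms round trip. */
    ntp_offset_delay(0.000, 0.015, 0.016, 0.021, &offset, &delay);
    printf("offset=%.3f s delay=%.3f s\n", offset, delay);
    /* One-way delays then follow as
     *   owd_client_to_server = (t2 - t1) - offset
     *   owd_server_to_client = (t4 - t3) + offset
     * valid to the extent the path is symmetric. */
    return 0;
}

The usual caveat applies: the offset estimate assumes a symmetric path, so any forward/return asymmetry shows up directly as error in the computed OWDs.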
>>>> >
>>>> > --trip-times
>>>> > enable the measurement of end-to-end write-to-read latencies (client and server clocks must be synchronized)
>>>
>>> [RWG] --clock-skew
>>> enable the measurement of the wall clock difference between sender and receiver
>>>>
>>>> [SM] Sweet!
>>>>
>>>> Regards
>>>> 	Sebastian
>>>>
>>>> >
>>>> > Bob
>>>> >
>>>> >> I have many kvetches about the new latency-under-load tests being designed and distributed over the past year. I am delighted! that they are happening, but most really need third-party evaluation and calibration, and a solid explanation of what network pathologies they do and don't cover. Also a RED-team attitude towards them, as well as thinking hard about what you are not measuring (operations research).
>>>> >>
>>>> >> I actually rather love the new cloudflare speedtest, because it tests a single TCP connection, rather than dozens, and at the same time folk are complaining that it doesn't find the actual "speed!". Yet... the test itself more closely emulates a user experience than speedtest.net does. I am personally pretty convinced that the fewer flows a web page opens, the better the likelihood of a good user experience, but lack data on it.
>>>> >>
>>>> >> To try to tackle the evaluation and calibration part, I've reached out to all the new test designers in the hope that we could get together and produce a report of what each new test is actually doing. I've tweeted, linked in, emailed, and spammed every measurement list I know of, and only to some response; please reach out to other test designer folks and have them join the rpm email list?
>>>> >>
>>>> >> My principal kvetches in the new tests so far are:
>>>> >>
>>>> >> 0) None of the tests last long enough.
>>>> >> Ideally there should be a mode where they at least run to "time of first loss", or periodically, just run longer than the industry-stupid^H^H^H^H^H^Hstandard 20 seconds. There be dragons there! It's really bad science to optimize the internet for 20 seconds. It's like optimizing a car, to handle well, for just 20 seconds.
>>>> >>
>>>> >> 1) Not testing up + down + ping at the same time
>>>> >> None of the new tests actually test the same thing that the infamous rrul test does - all the others still test up, then down, and ping. It was/remains my hope that the simpler parts of the flent test suite - such as the tcp_up_squarewave tests, the rrul test, and the rtt_fair tests - would provide calibration to the test designers. We've got zillions of flent results in the archive published here: https://blog.cerowrt.org/post/found_in_flent/
>>>> >>
>>>> >> ps. Misinformation about iperf 2 impacts my ability to do this.
>>>> >>
>>>> >> The new tests have all added up + ping and down + ping, but not up + down + ping. Why?? The behaviors of what happens in that case are really non-intuitive, I know, but... it's just one more phase to add to any one of those new tests. I'd be deliriously happy if someone(s) new to the field started doing that, even optionally, and boggled at how it defeated their assumptions.
>>>> >>
>>>> >> Among other things that would show... it's the home router industry's dirty secret that darn few "gigabit" home routers can actually forward in both directions at a gigabit. I'd like to smash that perception thoroughly, but given that our starting point for a gigabit router was a "gigabit switch" - historically something that couldn't even forward at 200Mbit - we have a long way to go there. Only in the past year have non-x86 home routers appeared that could actually do a gbit in both directions.
>>>> >>
>>>> >> 2) Few are actually testing within-stream latency
>>>> >> Apple's rpm project is making a stab in that direction. It looks highly likely that, with a little more work, crusader and go-responsiveness can finally start sampling the tcp RTT, loss and markings more directly. As for the rest... sampling TCP_INFO on windows, and Linux, at least, always appeared simple to me, but I'm discovering how hard it is by delving deep into the rust behind crusader.
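On the TCP_INFO point: on Linux, at least, the sampling call itself is indeed small. A minimal, Linux-only sketch follows; the self-connect in main() exists only to have a live TCP connection to sample, and error checks are omitted for brevity.

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Print a few of the kernel's per-connection TCP statistics (Linux).
 * tcpi_rtt / tcpi_rttvar are the smoothed RTT and its variance, in
 * microseconds; a real test would call this periodically on the
 * traffic socket. */
static int print_tcp_info(int fd)
{
    struct tcp_info ti;
    socklen_t len = sizeof(ti);
    if (getsockopt(fd, IPPROTO_TCP, TCP_INFO, &ti, &len) < 0) {
        perror("getsockopt(TCP_INFO)");
        return -1;
    }
    printf("srtt=%u us rttvar=%u us cwnd=%u segs total_retrans=%u\n",
           ti.tcpi_rtt, ti.tcpi_rttvar, ti.tcpi_snd_cwnd,
           ti.tcpi_total_retrans);
    return 0;
}

int main(void)
{
    /* Self-connect over loopback just to create a sampleable connection. */
    int ls = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in a;
    socklen_t alen = sizeof(a);
    memset(&a, 0, sizeof(a));
    a.sin_family = AF_INET;
    a.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    a.sin_port = 0;                       /* any free port */
    bind(ls, (struct sockaddr *)&a, sizeof(a));
    listen(ls, 1);
    getsockname(ls, (struct sockaddr *)&a, &alen);

    int c = socket(AF_INET, SOCK_STREAM, 0);
    connect(c, (struct sockaddr *)&a, sizeof(a));
    int s = accept(ls, NULL, NULL);

    print_tcp_info(c);
    close(c);
    close(s);
    close(ls);
    return 0;
}

The getsockopt() call itself is cheap; presumably the hard part being hit here is plumbing periodic samples through a test's reporting path, rather than the sampling itself.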
>>>> >>
>>>> >> the goresponsiveness thing is also IMHO running WAY too many streams at the same time, I guess motivated by an attempt to have the test complete quickly?
>>>> >>
>>>> >> B) To try and tackle the validation problem:
>>>> >>
>>>> >> In the libreqos.io project we've established a testbed where tests can be plunked through various ISP plan network emulations. It's here: https://payne.taht.net (run bandwidth test for what's currently hooked up)
>>>> >>
>>>> >> We could rather use an AS number and at least an ipv4/24 and ipv6/48 to leverage with that, so I don't have to nat the various emulations. (And funding, anyone got funding?) Or, as the code is GPLv2 licensed, to see more test designers set up a testbed like this to calibrate their own stuff.
>>>> >>
>>>> >> Presently we're able to test:
>>>> >> flent
>>>> >> netperf
>>>> >> iperf2
>>>> >> iperf3
>>>> >> speedtest-cli
>>>> >> crusader
>>>> >> the broadband forum udp based test: https://github.com/BroadbandForum/obudpst
>>>> >> trexx
>>>> >>
>>>> >> There's also a virtual machine setup that we can remotely drive a web browser from (but I didn't want to nat the results to the world) to test other web services.
_______________________________________________
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat