Thanks for the explanation.

Right now, our criteria for phone/audio is "latency < 300 ms" and
"jitter < 40 ms".

Would a single criterion along the lines of "95th percentile latency <
300 ms" be advisable in place of the two existing criteria?


Sina.

On Wed, Feb 24, 2021 at 11:15 PM Simon Barber <[email protected]> wrote:
>
>
>
> On February 24, 2021 9:57:13 PM Sina Khanifar <[email protected]> wrote:
>
>> Thanks for the feedback, Dave!
>>
>>> 0) "average" jitter is a meaningless number. In the case of a 
>>> videoconferencing application, what matters most is max jitter, where the 
>>> app will choose to ride the top edge of that, rather than follow it. I'd 
>>> prefer using a 98% number, rather than a 75% number, to weight where the
>>> typical delay in a videoconference might end up.
>>
>>
>> Both DSLReports and Ookla's desktop app report jitter as an average
>> rather than as a max number, so I'm a little hesitant to go against
>> the norm - users might find it a bit surprising to see much larger
>> jitter numbers reported. We're also not taking many latency
>> samples in each phase, so a 98th percentile would often just be the
>> max.
>>
>> With regards to the videoconferencing, we actually ran some real-world
>> tests of Zoom with various levels of bufferbloat/jitter/latency, and
>> calibrated our "real-world results" table on that basis. We used
>> average jitter in those tests ... I think if we used 98% or even 95%
>> the allowable number would be quite high.
>
>
> Video and audio cannot be played out until the packets have arrived, so late 
> packets are effectively dropped, or the playback buffer must expand to 
> accommodate the most late packets. If the playback buffer expands to 
> accommodate the most late packets then the result is that the whole 
> conversation is delayed by that amount. More than a fraction of a percent of
> dropped packets results in a very poor video or audio experience; this is why
> average jitter is irrelevant and peak or maximum latency is the correct
> measure to use.
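Simon's playout-buffer argument can be sketched numerically. The delay values below are made up for illustration; the point is that a deadline sized to the average delay drops the stragglers, while a deadline sized to the max delays the whole conversation:

```python
def late_fraction(delays_ms: list[float], playout_ms: float) -> float:
    """Fraction of packets that miss a fixed playout deadline (effectively dropped)."""
    late = sum(1 for d in delays_ms if d > playout_ms)
    return late / len(delays_ms)

# Mostly 40 ms, with a few late stragglers (illustrative values).
delays = [40.0] * 97 + [120.0, 150.0, 200.0]

avg = sum(delays) / len(delays)
print(avg)                                 # 43.5 ms: the average looks fine
print(late_fraction(delays, avg))          # 0.03: 3% dropped -> very poor call quality
print(late_fraction(delays, max(delays)))  # 0.0 dropped, but every packet waits 200 ms
```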
>
> Yes, humans can tolerate quite a bit of delay. The conversation is 
> significantly less fluid though.
>
> Simon
>
>>
>>> 1) The worst case scenario of bloat affecting a user's experience is during
>>> a simultaneous up and download, and I'd rather you did that rather than 
>>> test them separately. Also you get a more realistic figure for the actual 
>>> achievable bandwidth under contention and can expose problems like strict 
>>> priority queuing in one direction or another locking out further flows.
>>
>>
>> We did consider this based on another user's feedback, but didn't
>> implement it. Perhaps we can do this next time we revisit, though!
>>
>>> This points to any number of problems (features!) It's certainly my hope
>>> that all the cdn makers at this point have installed bufferbloat 
>>> mitigations. Testing a cdn's tcp IS a great idea, but as a bufferbloated 
>>> test, maybe not so much.
>>
>>
>> We chose to use a CDN because it seemed like the only feasible way to
>> saturate gigabit links at least somewhat consistently for a meaningful
>> part of the globe, without setting up a whole lot of servers at quite
>> high cost.
>>
>> But we weren't aware that bufferbloat could be abated from the CDN's
>> end. This is a bit surprising to me, as our test results indicate that
>> bufferbloat is regularly an issue even though we're using a CDN for
>> the speed and latency tests. For example, these are the results on my
>> own connection here (Cox, in Southern California), showing meaningful
>> bufferbloat:
>>
>> https://www.waveform.com/tools/bufferbloat?test-id=ece467bd-e07a-45ea-9db6-e64d8da2c1d2
>>
>> I get even larger bufferbloat effects when running the test on a 4G LTE 
>> network:
>>
>> https://www.waveform.com/tools/bufferbloat?test-id=e99ae561-88e0-4e1e-bafd-90fe1de298ac
>>
>> If the CDN was abating bufferbloat, surely I wouldn't see results like these?
>>
>>> 3) Are you tracking any ecn statistics at this point (ecnseen)?
>>
>>
>> We are not, no. I'd definitely be curious to see if we can add this in
>> the future, though!
>>
>> Best,
>>
>> On Wed, Feb 24, 2021 at 2:10 PM Dave Taht <[email protected]> wrote:
>>>
>>>
>>> So I've taken a tiny amount of time to run a few tests. For starters,
>>> thank you very much for your dedication and the time you put into
>>> creating such a usable website and faq.
>>>
>>> I have several issues though I really haven't had time to delve deep
>>> into the packet captures. (others, please try taking em, and put them
>>> somewhere?)
>>>
>>> 0) "average" jitter is a meaningless number. In the case of a
>>> videoconferencing application,
>>> what matters most is max jitter, where the app will choose to ride the
>>> top edge of that, rather than follow it. I'd prefer using a 98%
>>> number, rather than a 75% number, to weight where the typical delay in a
>>> videoconference might end up.
>>>
>>> 1) The worst case scenario of bloat affecting a user's experience is
>>> during a simultaneous up and download, and I'd rather you did that
>>> rather than test them separately. Also you get
>>> a more realistic figure for the actual achievable bandwidth under
>>> contention and can expose problems like strict priority queuing in one
>>> direction or another locking out further flows.
>>>
>>> 2) I get absurdly great results from it with or without sqm on on a
>>> reasonably modern cablemodem (buffercontrol and pie and a cmts doing
>>> the right things)
>>>
>>> This points to any number of problems (features!) It's certainly my
>>> hope that all the cdn makers at this point have installed bufferbloat
>>> mitigations. Testing a cdn's tcp IS a great idea, but as a
>>> bufferbloated test, maybe not so much.
>>>
>>> The packet capture of the tcp flows DOES show about 60ms jitter... but
>>> no loss. Your test shows:
>>>
>>> https://www.waveform.com/tools/bufferbloat?test-id=6fc7dd95-8bfa-4b76-b141-ed423b6580a9
>>>
>>> Its estimates are also very jittery in the beginning of the test. I
>>> really should be overjoyed at knowing a cdn is doing more of the right
>>> things, but in terms of a test... and linux also has got a ton of
>>> mitigations on the client side.
>>>
>>> 3) As a side note, ecn actually is negotiated on the upload, if it's
>>> enabled on your system.
>>> Are you tracking any ecn statistics at this point (ecnseen)? It is not
>>> negotiated on the download (which is fine by me).
>>>
>>> I am regrettably unable at this precise moment to test a native
>>> cablemodem at the same speed as a sqm box; I hope to get further on
>>> this tomorrow.
>>>
>>> Again, GREAT work so far, and I do think a test tool for all these
>>> cdns - heck, one that tested all of them at the same time, is very,
>>> very useful.
>>>
>>> On Wed, Feb 24, 2021 at 10:22 AM Sina Khanifar <[email protected]> wrote:
>>>>
>>>>
>>>> Hi all,
>>>>
>>>> A couple of months ago my co-founder Sam posted an early beta of the
>>>> Bufferbloat test that we’ve been working on, and Dave also linked to
>>>> it a couple of weeks ago.
>>>>
>>>> Thank you all so much for your feedback - we almost entirely
>>>> redesigned the tool and the UI based on the comments we received.
>>>> We’re almost ready to launch the tool officially today at this URL,
>>>> but wanted to show it to the list in case anyone finds any last bugs
>>>> that we might have overlooked:
>>>>
>>>> https://www.waveform.com/tools/bufferbloat
>>>>
>>>> If you find a bug, please share the "Share Your Results" link with us
>>>> along with what happened. We capture some debugging information on the
>>>> backend, and having a share link allows us to diagnose any issues.
>>>>
>>>> This is really more of a passion project than anything else for us –
>>>> we don’t anticipate we’ll try to commercialize it or anything like
>>>> that. We're very thankful for all the work the folks on this list have
>>>> done to identify and fix bufferbloat, and hope this is a useful
>>>> contribution. I’ve personally been very frustrated by bufferbloat on a
>>>> range of devices, and decided it might be helpful to build another
>>>> bufferbloat test when the DSLReports test was down at some point last
>>>> year.
>>>>
>>>> Our goals with this project were:
>>>> * To build a second solid bufferbloat test in case DSLReports goes down
>>>> again.
>>>> * To build a test where bufferbloat is front and center as the primary
>>>> purpose of the test, rather than just a feature.
>>>> * To explain bufferbloat and its effect on a user's connection
>>>> as clearly as possible for a lay audience.
>>>>
>>>> A few notes:
>>>> * On the backend, we’re using Cloudflare’s CDN to perform the actual
>>>> download and upload speed test. I know John Graham-Cumming has posted
>>>> to this list in the past; if he or anyone from Cloudflare sees this,
>>>> we’d love some help. Our Cloudflare Workers are being
>>>> bandwidth-throttled due to having a non-enterprise grade account.
>>>> We’ve worked around this in a kludgy way, but we’d love to get it
>>>> resolved.
>>>> * We have lots of ideas for improvements, e.g. simultaneous
>>>> upload/downloads, trying different file size chunks, time-series
>>>> latency graphs, using WebRTC to test UDP traffic etc, but in the
>>>> interest of getting things launched we're sticking with the current
>>>> featureset.
>>>> * There are a lot of browser-specific workarounds that we had to
>>>> implement, and latency itself is measured in different ways on
>>>> Safari/Webkit vs Chromium/Firefox due to limitations of the
>>>> PerformanceTiming APIs. You may notice that latency differs across
>>>> browsers; however, the actual bufferbloat (relative increase
>>>> in latency) should be pretty consistent.
>>>>
>>>> In terms of some of the changes we made based on the feedback we
>>>> received on this list:
>>>>
>>>> Based on Toke’s feedback:
>>>> https://lists.bufferbloat.net/pipermail/bloat/2020-November/015960.html
>>>> https://lists.bufferbloat.net/pipermail/bloat/2020-November/015976.html
>>>> * We changed the way the speed tests run to show an instantaneous
>>>> speed as the test is being run.
>>>> * We moved the bufferbloat grade into the main results box.
>>>> * We tried really hard to get as close to saturating gigabit
>>>> connections as possible. We completely redesigned the way we chunk
>>>> files, added a “warming up” period, and spent quite a bit of time
>>>> optimizing our code to minimize CPU usage, as we found that was often
>>>> the limiting factor in our speed test results.
>>>> * We changed the shield grades altogether and went through a few
>>>> different iterations of how to show the effect of bufferbloat on
>>>> connectivity, and ended up with a “table view” to try to show the
>>>> effect that bufferbloat specifically is having on the connection
>>>> (compared to when the connection is unloaded).
>>>> * We now link from the results table view to the FAQ where the
>>>> conditions for each type of connection are explained.
>>>> * We also changed the way we measure latency and now use the faster
>>>> of either Google’s CDN or Cloudflare at any given location. We’re also
>>>> using the WebTiming APIs to get a more accurate latency number, though
>>>> this does not work on some mobile browsers (e.g. iOS Safari) and as a
>>>> result we show a higher latency on mobile devices. Since our test is
>>>> less a test of absolute latency and more a test of relative latency
>>>> with and without load, we felt this was workable.
>>>> * Our jitter is now an average (was previously RMS).
>>>> * The “before you start” text was rewritten and moved above the start 
>>>> button.
>>>> * We now spell out upload and download instead of having arrows.
>>>> * We hugely reduced the number of cross-site scripts. I was a bit
>>>> embarrassed by this if I’m honest - I spent a long time building web
>>>> tools for the EFF, where we almost never allowed any cross-site
>>>> scripts.
>>>> * Our site is hosted on Shopify, and adding any features via
>>>> their app store ends up adding a whole lot of gunk. But we uninstalled
>>>> some apps, rewrote our template, and ended up removing a whole lot of
>>>> the gunk. There’s still plenty of room for improvement, but it should
>>>> be a lot better than before.
>>>>
>>>> Based on Dave Collier-Brown’s feedback:
>>>> https://lists.bufferbloat.net/pipermail/bloat/2020-November/015966.html
>>>> * We replaced the “unloaded” and “loaded” language with “unloaded”
>>>> and then “download active” and “upload active.” In the grade box we
>>>> indicate that, for example, “Your latency increased moderately under
>>>> load.”
>>>> * We tried to generally make it easier for non-techie folks to
>>>> understand by emphasizing the grade and adding the table showing how
>>>> bufferbloat affects some commonly-used services.
>>>> * We didn’t really change the candle charts too much - they’re
>>>> mostly just to give a basic visual - we focused more on the actual
>>>> meat of the results above that.
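The “relative increase in latency” framing described above can be sketched as a simple comparison of loaded vs. unloaded latency. The thresholds and grade strings below are hypothetical, not the ones the tool actually uses:

```python
def bufferbloat_increase(unloaded_ms: list[float], loaded_ms: list[float]) -> float:
    """Mean latency under load minus mean latency when idle."""
    return (sum(loaded_ms) / len(loaded_ms)) - (sum(unloaded_ms) / len(unloaded_ms))

def describe(increase_ms: float) -> str:
    # Hypothetical thresholds for the grade-box wording.
    if increase_ms < 30:
        return "Your latency increased only slightly under load."
    if increase_ms < 200:
        return "Your latency increased moderately under load."
    return "Your latency increased considerably under load."

unloaded = [22.0, 24.0, 23.0, 25.0]
loaded = [95.0, 140.0, 180.0, 125.0]
inc = bufferbloat_increase(unloaded, loaded)
print(inc)            # 111.5
print(describe(inc))  # "Your latency increased moderately under load."
```

Because the grade depends only on the difference, it stays meaningful even where absolute latency readings vary by browser.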
>>>>
>>>> Based on Sebastian Moeller’s feedback:
>>>> https://lists.bufferbloat.net/pipermail/bloat/2020-November/015963.html
>>>> * We considered doing a bidirectional saturating load, but decided
>>>> to skip implementing it for now. It’s definitely something we’d
>>>> like to experiment with more in the future.
>>>> * We added a “warming up” period as well as a “draining” period to
>>>> help fill and empty the buffer. We haven’t added the option for an
>>>> extended test, but have this on our list of backlog changes to make in
>>>> the future.
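The warm-up/drain idea can be sketched as a filter over timestamped throughput samples; the phase lengths and sample values here are illustrative assumptions, not the tool’s actual parameters:

```python
# Hypothetical phase lengths (seconds).
WARMUP_S, DRAIN_S = 2.0, 1.0
TEST_S = 10.0

def steady_state_mean(samples: list[tuple[float, float]]) -> float:
    """samples: (timestamp_s, mbps). Average only the steady-state window,
    excluding readings taken while buffers are still filling or draining."""
    kept = [mbps for t, mbps in samples
            if WARMUP_S <= t <= TEST_S - DRAIN_S]
    return sum(kept) / len(kept)

samples = [(0.5, 120.0), (1.5, 480.0),                # warming up: ramp excluded
           (3.0, 910.0), (5.0, 930.0), (7.0, 920.0),  # steady state: counted
           (9.5, 400.0)]                              # draining: tail excluded
print(steady_state_mean(samples))  # 920.0
```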
>>>>
>>>> Based on Y’s feedback:
>>>> https://lists.bufferbloat.net/pipermail/bloat/2020-November/015962.html
>>>> * We actually ended up removing the grades, but we explained our
>>>> criteria for the new table in the FAQ.
>>>>
>>>> Based on Greg White's feedback (shared privately):
>>>> * We added an FAQ answer explaining jitter and how we measure it.
>>>>
>>>> We’d love for you all to play with the new version of the tool and
>>>> send over any feedback you might have. We’re going to be in a feature
>>>> freeze before launch but we'd love to get any bugs sorted out. We'll
>>>> likely put this project aside after we iron out a last round of bugs
>>>> and launch, and turn back to working on projects that help us pay the
>>>> bills, but we definitely hope to revisit and improve the tool over
>>>> time.
>>>>
>>>> Best,
>>>>
>>>> Sina, Arshan, and Sam.
>>>> _______________________________________________
>>>> Bloat mailing list
>>>> [email protected]
>>>> https://lists.bufferbloat.net/listinfo/bloat
>>>
>>>
>>>
>>>
>>> --
>>> "For a successful technology, reality must take precedence over public
>>> relations, for Mother Nature cannot be fooled" - Richard Feynman
>>>
>>> [email protected] <Dave Täht> CTO, TekLibre, LLC Tel: 1-831-435-0729
>>
>
>
