Thanks for the detailed response, Gilles! I appreciate your willingness to keep in mind reports from users alongside the image load data we are collecting.
As you suggest, I will ask legal if we can collect email addresses of users who are willing to be contacted for follow-up questions, so we can dig a bit deeper into their performance issues. I too would rather rely on actual data than anecdotal reports, but I want to make sure that the data is reliable. My own experience continues to show long load times that take seconds, not just milliseconds, on pages like this one: https://hu.wikipedia.org/wiki/Wikip%C3%A9dia:A_nap_k%C3%A9pe#

For the purposes of calculating total image load from your dashboards, should we still be adding the API and image performance numbers? That would bring our different data points a bit closer to each other. :)

I look forward to learning more together about our average users' actual experience, which may require us to calibrate results from different methods until we have a good handle on this.

Onward!

Fabrice

On Apr 21, 2014, at 2:33 AM, Gilles Dubuc <[email protected]> wrote:

> Are the stats reliable though? There is a huge jump a few days ago, even in
> the file page loading times. Is that when it was switched over to CloudBees?
>
> Any data on that graph before March 18th is junk that came from (often
> partial) runs on my laptop, at times on internet connections of very
> questionable quality.
>
> March 18th onwards is exclusively run on CloudBees. You can see right away
> that those CloudBees figures are a lot more stable.
>
> When we have more data in a few days, I'll update the SQL query to remove the
> misleading figures that came from local development. In fact, we should make
> sure to avoid running this test locally against mediawiki.org or any
> production wiki where EventLogging is turned on from now on; otherwise we'll
> pollute the stats.
>
> On Mon, Apr 21, 2014 at 3:38 AM, Gergo Tisza <[email protected]> wrote:
> On Sun, Apr 20, 2014 at 3:39 AM, Gilles Dubuc <[email protected]> wrote:
> Any practical recommendations for addressing this concern?
> Can the users who've been complaining about speed be contacted? That would
> allow us to verify whether the bad experience is consistent for them; we
> could measure it directly and even compare it to their general internet speed.
>
> I started a separate thread about that; will also reach out to the users on
> hu.wiki. Asking for email addresses in the survey would also be good, but we
> should check if it has legal implications (collecting private data can be,
> especially in the EU, a painful process).
>
> And let's not forget that the status quo (opening the File: page) might be
> just as slow for those people. They might just not realize it, because most
> of the time spent loading that page shows you a blank tab. Now that the
> "versus" test has been running on CloudBees for a couple of days, targeting
> mediawiki.org, we can see that the file page is slower on average:
> http://multimedia-metrics.wmflabs.org/dashboards/mmv#media_viewer_vs_file_page-graphs-tab
> That wasn't the case a couple of weeks back, but we've made a number of
> improvements since.
>
> According to those stats, MediaViewer with a warm JS cache beats the file
> page 2 to 1. That's pretty impressive!
>
> Are the stats reliable though? There is a huge jump a few days ago, even in
> the file page loading times. Is that when it was switched over to CloudBees?
>
> _______________________________________________
> Multimedia mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/multimedia

_______________________________

On Apr 20, 2014, at 3:39 AM, Gilles Dubuc <[email protected]> wrote:

> Many images still take a much longer time to load in practice, as reported by
> beta users around the world
>
> Anecdotal evidence doesn't invalidate data collected directly by people's web
> browsers.
> People's impressions aren't as reliable as the data we're measuring. The
> reason we're collecting data this way is so that we can separate the facts
> from the feelings people might have. Since we're talking about an average,
> there are undeniably slower loads for certain people (soon to be shown as
> histograms), but I don't see any reason to doubt the collected averages
> based on people's comments.
>
> For a dozen people who felt the need to comment that it was slow for them,
> there could have been hundreds or thousands who were satisfied and didn't say
> a thing. In my experience, people who are happy or unaffected by something
> are a lot less likely to engage with a feedback survey.
>
> Can we really assume that the mean image load time in India is 691
> milliseconds?
>
> Yes, that data is very real. For the API map, India's figures are calculated
> over 12,209 measured requests from 5,158 unique IP addresses, none of which
> have bot-like user agent strings.
>
> But could there also be some bots or other traffic which could be distorting
> the results?
>
> Bots are a valid concern, so I did some digging. Some bots masquerade as real
> browsers (unlike serious search engines like Google/Yahoo, which make up most
> of the bot traffic), but since we're not seeing any non-masquerading bots at
> all in the India data, I seriously doubt there is any bot traffic at this
> time that would impact the results for that country.
>
> Looking at all countries, I only see 10 hits from a Googlebot user agent
> string, but with such a low count it's hard to say whether it really is
> Googlebot (and not someone/something pretending to be it...). In fact, given
> the low bandwidth on those particular hits (24kb/s on an image load that was
> a Varnish hit) and the fact that their IPs appeared to come from Poland and
> Bangladesh, I doubt it was really Google.
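[Editor's note: the user-agent check described above can be sketched in a few lines of Python. The row shape and field names below are illustrative, not the actual EventLogging schema, and a substring deny-list like this only catches self-identifying bots, not the masquerading ones Gilles mentions.]

```python
import re

# Hypothetical row shape; field names are illustrative, not the real schema.
ROWS = [
    {"country": "IN", "user_agent": "Mozilla/5.0 (X11; Linux x86_64) Firefox/28.0", "event_total": 640},
    {"country": "IN", "user_agent": "Googlebot/2.1 (+http://www.google.com/bot.html)", "event_total": 90},
    {"country": "IN", "user_agent": "Mozilla/5.0 (Windows NT 6.1) Chrome/34.0", "event_total": 730},
]

# Simple deny-list of bot-like user agent substrings. Real bot detection is
# harder, since some bots masquerade as ordinary browsers; this only filters
# the self-identifying ones before computing per-country averages.
BOT_PATTERN = re.compile(r"bot|crawler|spider|slurp", re.IGNORECASE)

def looks_like_bot(user_agent):
    return bool(BOT_PATTERN.search(user_agent))

human_rows = [r for r in ROWS if not looks_like_bot(r["user_agent"])]
print(len(human_rows))  # → 2, the Googlebot hit is excluded
```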
> While it's undeniable that rural areas one might visit while traveling still
> suffer from slow internet, the majority of the world's population now lives
> in cities:
> http://www.un.org/en/development/desa/population/publications/urbanization/urban-rural.shtml
> and the average broadband speed worldwide is probably much higher nowadays
> than most people think: http://www.netindex.com/ And dial-up is rapidly
> disappearing:
> http://www.pewinternet.org/data-trend/internet-use/connection-type/
> Slow internet is a reality for a lot of people, but not for the majority.
> I'm not surprised by the average results we're seeing. I agree that this
> rapid change in recent years can be counter-intuitive when you're used to
> traveling to rural locations.
>
> Any practical recommendations for addressing this concern?
>
> Can the users who've been complaining about speed be contacted? That would
> allow us to verify whether the bad experience is consistent for them; we
> could measure it directly and even compare it to their general internet speed.
>
> As far as performance and stats improvements are concerned, we've been over
> it several times and I think everything that could be done is already
> implemented, filed or on its way.
>
> And let's not forget that the status quo (opening the File: page) might be
> just as slow for those people. They might just not realize it, because most
> of the time spent loading that page shows you a blank tab. Now that the
> "versus" test has been running on CloudBees for a couple of days, targeting
> mediawiki.org, we can see that the file page is slower on average:
> http://multimedia-metrics.wmflabs.org/dashboards/mmv#media_viewer_vs_file_page-graphs-tab
> That wasn't the case a couple of weeks back, but we've made a number of
> improvements since.
>
> That's why I think it's important to do some real measurements on users who
> bring up this issue.
> If we're not already doing it, we should encourage them to optionally enter
> their email address for the purpose of investigating issues further.
>
> On Sat, Apr 19, 2014 at 9:44 PM, Fabrice Florin <[email protected]> wrote:
> Thanks to everyone for this great teamwork!
>
> The updated geographical performance dashboards which Gilles and Mark just
> posted paint a more optimistic picture than before, which is encouraging:
> http://multimedia-metrics.wmflabs.org/dashboards/mmv#geographical_network_performance-graphs-tab
>
> However, these extremely fast load times do not match what we are hearing
> from our users, or even our own experience on slower connections. Many
> images still take a much longer time to load in practice, as reported by beta
> users around the world, from Brazil to Hungary.
>
> Can we really assume that the mean image load time in India is 691
> milliseconds? That seems way too fast, based on my experience traveling in
> Asia a few weeks ago, where images could take a very long time to load, if
> they loaded at all.
>
> As Gergo pointed out, these early results may be because our first beta
> testers have faster connections than average users. But could there also be
> some bots or other traffic which could be distorting the results?
>
> I know that we are working next on histograms that will give us a better
> sense of how outliers are performing against average users. Can't wait for
> that.
>
> But I am still concerned that this chart may be painting a much rosier
> picture than what's actually going on in the real world.
>
> Any practical recommendations for addressing this concern? We want to know
> what's really happening for average users, so we can determine whether
> regions with slow connections like India should consider making this feature
> opt-in, rather than opt-out.
> Thanks again to you all for helping us gain more clarity on this critical
> issue :)
>
> Fabrice
>
> On Apr 18, 2014, at 11:16 AM, Gilles Dubuc <[email protected]> wrote:
>
>> Mark deployed the change; the mean and standard deviation on the "Overall
>> network performance" and "Geographical network performance" tabs are now
>> geometric:
>>
>> http://multimedia-metrics.wmflabs.org/dashboards/mmv
>>
>> These charts and maps now make a lot more sense! Next I'll be working on
>> distribution histograms, so that we can see the outlier values that are now
>> excluded from those graphs.
>>
>> Thanks again Aaron -- thanks to you, these visualizations have become truly
>> useful and meaningful, in the way they were meant to be.
>>
>> On Thu, Apr 17, 2014 at 6:13 PM, Aaron Halfaker <[email protected]> wrote:
>> Yikes! Good catch.
>>
>> On Thu, Apr 17, 2014 at 11:12 AM, Gilles Dubuc <[email protected]> wrote:
>> A solution to this problem is to generate a geometric mean[2] instead.
>>
>> Thanks a lot for the help -- it instantly solved my problem!
>>
>> There was a small mistake in the order of functions in your example; for the
>> record, it should be:
>>
>> EXP(AVG(LOG(event_total))) AS geometric_mean
>>
>> And conveniently, the geometric standard deviation can be calculated the
>> same way:
>>
>> EXP(STDDEV(LOG(event_total))) AS geometric_stddev
>>
>> I put it to the test on a specific set of data where we had a huge outlier,
>> and for that data it seems equivalent to excluding the bottom and top 10
>> percent of values, which is exactly what I was after.
>>
>> On Wed, Apr 16, 2014 at 4:24 PM, Aaron Halfaker <[email protected]> wrote:
>> Hi Gilles,
>>
>> I think I know just the thing you're looking for.
>>
>> It turns out that much of this performance data is log-normally
>> distributed[1].
>> Log-normal distributions tend to have a hockey-stick
>> shape where most of the values are close to zero, but occasionally very
>> large values appear[3]. The mean of a log-normal distribution tends to be
>> sensitive to outliers like the ones you describe.
>>
>> A solution to this problem is to generate a geometric mean[2] instead. One
>> convenient thing about log-normal data is that if you log() it, it becomes
>> normal[4] -- and not sensitive to outliers in the usual way. Also
>> convenient, geometric means are super easy to generate. All you need to do
>> is this: (1) pass all of the data through log(), (2) pass the logged data
>> through mean() (or avg() -- whatever), (3) pass the result through exp().
>> The best thing about this is that you can do it in MySQL.
>>
>> For example:
>>
>> SELECT
>>   country,
>>   mean(timings) AS regular_mean,
>>   exp(log(mean(timings))) AS geometric_mean
>> FROM log.WhateverSchemaYouveGot
>> GROUP BY country
>>
>> 1. https://en.wikipedia.org/wiki/Log-normal_distribution
>> 2. https://en.wikipedia.org/wiki/Geometric_mean
>> 3. See distribution.log_normal.svg (24K)
>> 4. See distribution.log_normal.logged.svg (33K)
>>
>> -Aaron
>>
>> On Wed, Apr 16, 2014 at 8:42 AM, Dan Andreescu <[email protected]> wrote:
>> So, my latest idea for a solution is to write a Python script that will
>> import the section (last X days) of data from the EventLogging tables that
>> we're interested in into a temporary SQLite database, then proceed with
>> removing the upper and lower percentiles of the data, according to any
>> column grouping that might be necessary. And finally, once the data
>> preprocessing is done in SQLite, run similar queries as before to export the
>> mean, standard deviation, etc. for given metrics to TSVs. I think using
>> SQLite is cleaner than doing the preprocessing on db1047 anyway.
>>
>> It's quite an undertaking -- it basically means rewriting all our current
>> SQL => TSV conversion.
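[Editor's note: the log/mean/exp recipe quoted above is easy to sanity-check outside of MySQL. The timings below are made up, but they show how a single outlier dominates the arithmetic mean while barely moving the geometric one.]

```python
import math

# Load times in ms, with one huge outlier of the kind described above.
timings = [250, 300, 280, 320, 60000]

arithmetic_mean = sum(timings) / len(timings)

# Geometric mean: log each value, average the logs, exponentiate.
# (Note the order: mean of logs, not log of the mean.)
geometric_mean = math.exp(sum(math.log(t) for t in timings) / len(timings))

print(round(arithmetic_mean))  # → 12230, dominated by the outlier
print(round(geometric_mean))   # → 834, close to the typical values
```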
>> The ability to use more steps in the conversion means
>> that we'd be able to have simpler, more readable SQL queries. It would also
>> be a good opportunity to clean up the giant performance query with a
>> bazillion JOINs:
>> https://gitorious.org/analytics/multimedia/source/a949b1c8723c4c41700cedf6e9e48c3866e8b2f4:perf/template.sql
>> which can actually be divided into several data sources all used in the
>> same graph.
>>
>> Does that sound like a good idea, or is there a simpler solution out there
>> that someone can think of?
>>
>> Well, I think this sounds like we need to seriously evaluate how people are
>> using EventLogging data and provide this sort of analysis as a feature.
>> We'd have to hear from more people, but I bet it's the right thing to do
>> long term.
>>
>> Meanwhile, "simple" is highly subjective here. If it were me, I'd clean up
>> the indentation of that giant SQL query you have, then maybe figure out some
>> ways to make it faster, then be happy as a clam. So if SQLite is the tool
>> you feel happy as a clam with, then that sounds like a great solution.
>> Alternatives would be Python, PHP, etc. I forget whether pandas is allowed
>> where you're working, but that's a great Python library that would make what
>> you're talking about fairly easy.
>>
>> Another thing for us to seriously consider is PostgreSQL. It has proper
>> f-ing temporary tables and supports actual people doing actual work with
>> databases. We could dump data, especially really simple schemas like
>> EventLogging, into PostgreSQL for analysis.
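[Editor's note: the pipeline proposed above (import a slice of EventLogging-style data into a temporary SQLite database, trim the top and bottom percentiles per group, export mean and standard deviation to TSV) can be sketched roughly as follows. Table and column names are illustrative, not the real schema, and the percentile cut is crude on small samples.]

```python
import csv
import sqlite3
import statistics

# Illustrative sample data: (country, event_total in ms), with one outlier.
rows = [
    ("IN", 640), ("IN", 700), ("IN", 660), ("IN", 90000),
    ("HU", 900), ("HU", 950), ("HU", 870), ("HU", 910),
]

# Temporary in-memory SQLite database standing in for the imported slice.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE timings (country TEXT, event_total REAL)")
db.executemany("INSERT INTO timings VALUES (?, ?)", rows)

def trimmed_stats(country, lower=0.1, upper=0.9):
    """Mean and stddev after dropping the bottom/top percentiles per group."""
    values = sorted(
        v for (v,) in db.execute(
            "SELECT event_total FROM timings WHERE country = ?", (country,)
        )
    )
    lo, hi = int(len(values) * lower), int(len(values) * upper)
    kept = values[lo:hi] or values  # keep everything if the slice is empty
    stdev = statistics.pstdev(kept) if len(kept) > 1 else 0.0
    return statistics.mean(kept), stdev

# Export one TSV row per country, as in the existing SQL => TSV conversion.
with open("perf.tsv", "w", newline="") as f:
    writer = csv.writer(f, delimiter="\t")
    writer.writerow(["country", "mean", "stddev"])
    for (country,) in db.execute("SELECT DISTINCT country FROM timings"):
        mean, stdev = trimmed_stats(country)
        writer.writerow([country, round(mean, 1), round(stdev, 1)])
```

With the sample data above, the 90,000 ms outlier for "IN" falls outside the kept slice, so the trimmed mean stays near the typical values instead of being dragged up by it.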
>>
>> _______________________________________________
>> Analytics mailing list
>> [email protected]
>> https://lists.wikimedia.org/mailman/listinfo/analytics
>
> _______________________________
>
> Fabrice Florin
> Product Manager
> Wikimedia Foundation
>
> http://en.wikipedia.org/wiki/User:Fabrice_Florin_(WMF)
