Hi Dan,

Thanks for running these!
I'm struck by the figure of 12.8m pages in ns0 - it looks like this
includes redirects (there are ~7.6m ns0 redirects on enwiki, and ~5.2m
articles). This will probably skew things a lot, as most of those will
be edited once and never touched again, barring the target page being
moved.

Given they're ~60% of the pages, this will introduce a lot of extra
weight for "articles with very few edits" and "articles that get edited
very infrequently".

It might be worth trying to filter out redirects - I suspect this would
have a noticeable effect on both the distribution and the mean time
between edits.

Andrew.

On 14 September 2016 at 22:01, Dan Andreescu <[email protected]> wrote:
> Quick follow up 'cause I was curious. I calculated the average and standard
> deviation for edits per namespace 0 article on enwiki. I tried to do it on
> the research db replicas but it took forever so I did it on the hadoop
> cluster. Including archived pages isn't useful, doesn't change the results
> almost at all. Including pages outside namespace 0 increases the standard
> deviation and decreases the average. Here are the results:
>
> 484,170,218 edits on namespace 0
> 12,756,342 pages in namespace 0
>
> standard deviation for edits per page: 213.58
> average edits per page: 38.02
> average days between first and last edit per page: 1215.27
>
> So considering the standard deviation is much larger than the mean, I'm
> pretty confident to answer yes, I think the vast majority of articles in
> namespace 0 on enwiki get very few edits. The dataset we're working on
> releasing as part of wikistats 2.0 will allow these kinds of questions to be
> answered really easily and really quickly.
> Stay tuned over the next few
> quarters :)
>
> And the queries:
> https://gist.github.com/milimetric/8b5f447e3ef09b6fe4384e0f75cc0b34
>
> If you want to edit those queries to find something else out, I'm happy to
> run them one or two more times, but then I really have to get back to my
> real job :)
>
> On Wed, Sep 7, 2016 at 12:42 PM, Andrew Gray <[email protected]>
> wrote:
>>
>> Hi Reem,
>>
>> Here's some rough estimates.
>>
>> English - https://stats.wikimedia.org/EN/TablesWikipediaEN.htm
>>
>> English has ~5.2 million articles, with an average of ~92 edits per
>> article, not counting deleted edits (or deleted articles). Note that 80% of
>> those articles are more than three years old, so they've had plenty of time
>> to build up the 92 edits.
>>
>> [The page does not explicitly say that only article edits are counted in
>> the tables, but this is easy to confirm -
>> https://en.wikipedia.org/wiki/Wikipedia:Statistics has 847m edits]
>>
>> Arabic - https://stats.wikimedia.org/EN/TablesWikipediaAR.htm
>>
>> Arabic has ~437k articles, ~31 edits/article - but only half of these are
>> more than three years old, so they're on average a lot younger than the
>> English ones.
>>
>> As of July there are 3.3m edits/month in English - this is equal to an
>> average of 0.63 edits/article/month - and 226k edits/month in Arabic, equal
>> to 0.52 edits/article/month. July was a slow month for Arabic, and March had
>> more than twice as many edits, 487k, across 415k articles.
>>
>> These are plain averages. The distribution is going to be very skewed, so
>> high-edit articles get most of the attention, and the other articles easily
>> go months without attention. If we assume an 80:20 distribution - which is a
>> wild guess but sounds plausible - then the "long tail" of 80% of articles
>> would get 20% of the edits.
>> In this case, a plausible average would be:
>>
>> * English long tail, 4.16m articles and 660k edits/month = average of six
>>   months between each edit
>> * Arabic (July) long tail, 350k articles and 45k edits/month = average of
>>   seven or eight months between each edit
>> * Arabic (March) long tail, 332k articles and 97k edits/month = average of
>>   three and a half months between each edit
>>
>> This is a broad range, but it feels more or less right for all those
>> unloved pages...
>>
>> Andrew.
>>
>>
>> On 7 September 2016 at 14:52, Reem Al-Kashif <[email protected]>
>> wrote:
>> > Hi,
>> >
>> > I always hear people saying that most of the articles usually receive
>> > little to no edits (and that is used to encourage participants to make
>> > sure their articles are good enough). I would like to know if there are
>> > statistics that support this for the English and Arabic Wikipedia.
>> >
>> > Best,
>> > Reem
>> >
>> > --
>> > Kind regards,
>> > Reem Al-Kashif
>> >
>> > _______________________________________________
>> > Analytics mailing list
>> > [email protected]
>> > https://lists.wikimedia.org/mailman/listinfo/analytics
>> >
>>
>> --
>> - Andrew Gray
>> [email protected]

--
- Andrew Gray
[email protected]
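For anyone who wants to re-run the back-of-envelope numbers in this thread, here is a minimal Python sketch. It only reuses figures quoted in the emails above; the 80:20 split is the "wild guess" from the earlier message, not a measured distribution, and the function and variable names are made up for illustration.

```python
# Back-of-envelope check of the long-tail estimates quoted in this thread.
# Inputs are copied from the emails; the 80:20 split is an assumption
# (a stated guess), not something measured from the edit history.

def months_between_edits(articles, edits_per_month,
                         tail_articles=0.8, tail_edits=0.2):
    """Average months between edits for an article in the long tail."""
    tail_pages = articles * tail_articles            # e.g. 80% of articles
    tail_monthly_edits = edits_per_month * tail_edits  # e.g. 20% of edits
    return tail_pages / tail_monthly_edits

# (article count, edits per month) as quoted in the thread
cases = {
    "English (July)": (5_200_000, 3_300_000),
    "Arabic (July)": (437_000, 226_000),
    "Arabic (March)": (415_000, 487_000),
}

results = {name: months_between_edits(a, e) for name, (a, e) in cases.items()}
for name, months in results.items():
    print(f"{name}: about {months:.1f} months between edits")

# Share of ns0 pages that are redirects, using ~7.6m redirects out of
# ~12.8m ns0 pages as quoted above.
redirect_share = 7_600_000 / 12_800_000
print(f"redirect share of ns0 pages: {redirect_share:.0%}")
```

Changing tail_articles and tail_edits shows how sensitive the "months between edits" figure is to the assumed split, which is the main reason to treat these as rough estimates only.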
