Just to complicate this more... since Pine's question was "accounts... that have ever edited English Wikipedia", we might consider restricting our counts to namespace 0 only.
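For what it's worth, here is a rough Python sketch of what a namespace-0 restriction does to a count like this. The `Revision` record, the `count_editors` helper and the sample rows are all hypothetical, made up for illustration. This is not Andrew's Quarry query or Erik's Wikistats processing, and it assumes only that each edit carries a user id (absent for IP edits) and a namespace number, with 0 being the article namespace.

```python
from collections import Counter
from typing import Iterable, NamedTuple, Optional


class Revision(NamedTuple):
    # Hypothetical record shape, not the schema of any real table or dump.
    user_id: Optional[int]  # None for anonymous (IP) edits
    namespace: int          # 0 is the article namespace


def count_editors(
    revisions: Iterable[Revision],
    namespace: Optional[int] = None,
    min_edits: int = 1,
) -> int:
    """Count distinct registered accounts with at least min_edits edits.

    If namespace is given, only edits in that namespace are counted.
    """
    edits_per_user = Counter(
        rev.user_id
        for rev in revisions
        if rev.user_id is not None
        and (namespace is None or rev.namespace == namespace)
    )
    return sum(1 for n in edits_per_user.values() if n >= min_edits)


# Made-up sample data: three registered accounts and one IP edit.
sample = [
    Revision(1, 0), Revision(1, 0), Revision(1, 2),
    Revision(2, 2),      # this account only ever edited outside articles
    Revision(3, 0),
    Revision(None, 0),   # anonymous edit, ignored
]

print(count_editors(sample))               # any namespace: 3
print(count_editors(sample, namespace=0))  # articles only: 2
```

The same helper takes a `min_edits` threshold, so the "3 or more edits" style of count mentioned further down the thread can be compared on the same data.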
Either way, it's safe to say that the total number is in the millions.

J

On Tue, Oct 27, 2015 at 10:21 AM, Erik Zachte <[email protected]> wrote: > Yes we were talking about editor counts, then we moved on to countries, > that's what one calls an analogy ;-) > > > > Erik > > > > *From:* Analytics [mailto:[email protected]] *On > Behalf Of *Aaron Halfaker > *Sent:* Tuesday, October 27, 2015 18:19 > > *To:* A mailing list for the Analytics Team at WMF and everybody who has > an interest in Wikipedia and analytics. > *Subject:* Re: [Analytics] [Spam] Re: User statistics for video marking > ENWP 5m article milestone > > > > > 800+ wikis and 280+ Wikipedias > > I thought we were talking about editor counts here. > > > > On Tue, Oct 27, 2015 at 12:18 PM, Erik Zachte <[email protected]> > wrote: > > Aaron, the example of countries doesn't seem fitting for me. In many > bodies like UN all countries have one vote, and small countries are > disproportionately powerful. That's part of why 196 has meaning. > > > > Let me put it this way, if you think our 800+ wikis and 280+ Wikipedias > story is not misleading, then we're pretty far apart on what constitutes > meaningful communication. > > > > Erik > > > > > > *From:* Analytics [mailto:[email protected]] *On > Behalf Of *Leila Zia > *Sent:* Tuesday, October 27, 2015 18:11 > > > *To:* A mailing list for the Analytics Team at WMF and everybody who has > an interest in Wikipedia and analytics. > *Subject:* Re: [Analytics] [Spam] Re: User statistics for video marking > ENWP 5m article milestone > > > > > > On Tue, Oct 27, 2015 at 9:56 AM, Aaron Halfaker <[email protected]> > wrote: > > > > If we want to critique how we communicate about something, we can't do it > in such general terms as "use 5+ edits". We need to know what meaning is > intended to be expressed. Only within the context of "meaning" can we talk > about "deception" and "misunderstanding". 
As an empiricist, I'd like to > challenge the speculation about the low competencies of our audience. > > > > For the purpose of the 5M report, our audience is a very large audience > coming from very different walks of life, the report will be translated in > many languages and will be read worldwide. We are not challenging the > competency of our audience, instead we are trying to find a way to assist > more of our audience to hear a story closer to the real story. > > > > So, if we're going to communicate how people contribute to Wikipedia and > not "mislead", we're going to need to give people a primer on powerlaws of > participation and discuss the implications of the best fit pareto index > <https://en.wikipedia.org/wiki/Pareto_index> for Wikipedia edits. > > > > That's one option, but that's too hard to the extent that is impossible. > The suggestion is that we do better with the understanding that many people > will still not get the full picture, but many more will know a story that > is closer to the reality of Wikipedia. > > Leila > > > > > > > > -Aaron > > > > > > On Tue, Oct 27, 2015 at 11:03 AM, Erik Zachte <[email protected]> > wrote: > > I do agree that we reject good contributions. I also agree this is a messy > filter. > > > > The main point however is do we want to communicate to the general public > using such messy, fuzzy, inflated (partially), hard to not misunderstand > numbers? > > We have a history of using vanity metrics (800+ wikis, 280+ Wikipedias). > Not untrue in some very formal sense, but totally misleading in that they > play on expectations which are totally false. > > > > Erik > > > > *From:* Analytics [mailto:[email protected]] *On > Behalf Of *Aaron Halfaker > *Sent:* Tuesday, October 27, 2015 16:41 > > > *To:* A mailing list for the Analytics Team at WMF and everybody who has > an interest in Wikipedia and analytics. 
> *Subject:* Re: [Analytics] [Spam] Re: User statistics for video marking > ENWP 5m article milestone > > > > I don't agree. There are a lot of good-faith page creations that get > deleted every day. There are also many edits that get reverted. Arguably, > those edits aren't productive either, but they don't disappear from the > dumps like article drafts do. This is a messy filter at best. > > > > On Tue, Oct 27, 2015 at 10:28 AM, Erik Zachte <[email protected]> > wrote: > > As Aaron says. I'd like to add that if almost 3 million accounts > disappeared from the dumps alltogether (vandals? school kids?) that makes > the case for not using such a count even more convincing. > > > > Erik > > > > *From:* Analytics [mailto:[email protected]] *On > Behalf Of *Aaron Halfaker > *Sent:* Tuesday, October 27, 2015 15:48 > *To:* A mailing list for the Analytics Team at WMF and everybody who has > an interest in Wikipedia and analytics. > *Subject:* Re: [Analytics] [Spam] Re: User statistics for video marking > ENWP 5m article milestone > > > > user_editcount includes edits to deleted pages and revdeleted edits. > Erik's perl scripts use the XML dumps that do not include edits to deleted > pages. > > Strictly speaking, user_editcount is a better proxy for the number of > people who have "ever edited". Erik's is the number of people whose edits > appear in the history of a page at the time of an XML dump. > > -Aaron > > > > On Tue, Oct 27, 2015 at 9:34 AM, Jonathan Morgan <[email protected]> > wrote: > > I also wonder about this discrepancy. I ran a more explicit version of > Andrew query, trying to eliminate some possible edge cases, and came up > with the same number. > > > > Now I'm curious. Are there junk rows in our user table, retained for > legacy reasons maybe? Is user_editcount inaccurate? Erik, can you describe > the processing you perform to winnow down from 8.2 million? 
> > > > J > > > > On Tue, Oct 27, 2015 at 7:06 AM, Andrew Gray <[email protected]> > wrote: > > Interesting - wonder why my query's giving a higher number? > > I agree entirely that we should be very careful with quoting these > figures. I think you'd probably be safe to say that more than a > million people have edited... but even then I'd be cautious. > > Andrew. > > > On 27 October 2015 at 11:11, Erik Zachte <[email protected]> wrote: > > Wikistats has it that 5,644,681 registered accounts published at least > once till Oct 1, 2015, and 2,181,006 three or more times. > > It used to publish that on [1][2] but I just removed it. > > > > I'm campaigning against us publishing overly inflated counts since about > two years (Wikimania London). > > > > Since this thread is going on and on, I'll repost my (reworded) > reservations on this particular metric, for newcomers. > > > > Even if we state explicitly that this is not unique people, any audience > will think it may be close and we are overly correct by adding the caveat. > It may not be so close. For that reason imo such a metric would be of > questionable value, to put it mildly. > > > > Pine: > >> Is there a way to get counts for the number of accounts, including or > excluding IPs, that have ever edited English Wikipedia, ? > > > > First the anon contributors: when we'd count every ip address that shows > up in the dumps, we'd count *very* many people who were just vandalizing > willfully, or just pressing edit for fun, or forgot to login once, and also > moved from one ip address to another over the years. On top of that many > people get a new ip address (from a pool) on every session, depends on > provider policy. > > > > As for registered editors the number Wikistats used to publish may be a > rather empty metric for several reasons: > > - How many casual editors will have forgotten their password and just > created a new user id? 
Only veteran editors know about sockpuppeting and > how one is supposed not to do that. > > - How many people will have registered in good faith just out of habit, > or to tweak presentation preferences, and then played with the edit button > just to see what happens? Note that roughly 2 out of 3 accounts doesn't > even reach 3 edits. > > > > Cheers, > > Erik Zachte > > > > [1] > https://stats.wikimedia.org/EN/TablesWikipediaEN.htm#editdistribution > > [2] BTW I use the term wikipedians overly inclusive in that report. A > person who edited once or twice isn't a wikipedian in my book, just like a > person who writes two post-it notes per month and nothing else isn't called > a writer. Some terms only apply above some threshold. > > > > -----Original Message----- > > From: Analytics [mailto:[email protected]] On > Behalf Of Andrew Gray > > Sent: Tuesday, October 27, 2015 11:06 > > To: A mailing list for the Analytics Team at WMF and everybody who has > an interest in Wikipedia and analytics. > > Subject: Re: [Analytics] User statistics for video marking ENWP 5m > article milestone > > > > To a very crude approximation, there are approximately 8.2 million > accounts which have at least one edit on English Wikipedia - at least > assuming my SQL query is correct! http://quarry.wmflabs.org/query/1911 > > > > This is all user accounts with one or more edits in the contributions > record; it does not contain IPs, and it does not contain any accounts whose > sole contributions have since been deleted (which is probably quite a > substantial number). Conversely, it includes a vast panoply of single-use > vandalism accounts, sockpuppets, etc etc etc. And bots, of course. > > > > Andrew. > > > > On 27 October 2015 at 05:50, Pine W <[email protected]> wrote: > >> Is there a way to get counts for the number of accounts, including or > >> excluding IPs, that have ever edited English Wikipedia, ? 
It would be > >> preferable to know the number of unique people, but of course that's > >> impossible. > >> > >> Thanks, > >> Pine > >> > >> Aha, that is important for me to know. Thanks Andrew. > >> > >> Pine > >> > >> > >> On Thu, Sep 17, 2015 at 11:07 AM, Andrew Gray > >> <[email protected]> > >> wrote: > >>> > >>> On 11 September 2015 at 19:19, James Forrester > >>> <[email protected]> > >>> wrote: > >>> > >>> >> Does it include editors on all Wikimedia projects > >>> > > >>> > No. > >>> > > >>> >> or just those who have registered and/or edited on ENWP? > >>> > > >>> > Registered, regardless of having edited. > >>> > >>> James is of course correct, but one small caveat worth adding: > >>> because of SUL, a substantial proportion of these will be "autocreated" > >>> accounts from other projects - so even 'registration' may not mean > >>> what it seems. > >>> > >>> -- > >>> - Andrew Gray > >>> [email protected] > >>> > >>> _______________________________________________ > >>> Analytics mailing list > >>> [email protected] > >>> https://lists.wikimedia.org/mailman/listinfo/analytics > >> > >> > >> > >> _______________________________________________ > >> Analytics mailing list > >> [email protected] > >> https://lists.wikimedia.org/mailman/listinfo/analytics > >> > > > > > > > > -- > > - Andrew Gray > > [email protected] > > > > _______________________________________________ > > Analytics mailing list > > [email protected] > > https://lists.wikimedia.org/mailman/listinfo/analytics > > > > > > _______________________________________________ > > Analytics mailing list > > [email protected] > > https://lists.wikimedia.org/mailman/listinfo/analytics > > > > -- > - Andrew Gray > [email protected] > > _______________________________________________ > Analytics mailing list > [email protected] > https://lists.wikimedia.org/mailman/listinfo/analytics > > > > > > -- > > Jonathan T. 
Morgan > > Senior Design Researcher > > Wikimedia Foundation > > User:Jmorgan (WMF) <https://meta.wikimedia.org/wiki/User:Jmorgan_(WMF)>

--
Jonathan T. Morgan
Senior Design Researcher
Wikimedia Foundation
User:Jmorgan (WMF) <https://meta.wikimedia.org/wiki/User:Jmorgan_(WMF)>
