user_editcount includes edits to deleted pages and revdeleted edits. Erik's Perl scripts use the XML dumps, which do not include edits to deleted pages.
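To make the difference concrete, here is a toy sketch in Python. The data is made up, and every name in it is hypothetical apart from the user_editcount column; it is not Andrew's query or Erik's script, just the two counting rules side by side.

```python
# Toy data only: account names and counts are invented for illustration.
# Stored edit counter per account, as in the user_editcount column.
# It keeps counting an edit even if the page was later deleted.
user_editcount = {"Alice": 12, "Bob": 1, "Carol": 3, "Dave": 0}

# Revisions still visible in page histories at dump time: (page, user).
# Bob's only edit was to a page that has since been deleted, so it is absent.
dump_revisions = [
    ("Page A", "Alice"),
    ("Page B", "Alice"),
    ("Page B", "Carol"),
]


def count_by_editcount(editcounts):
    """Accounts whose stored edit counter is at least one."""
    return sum(1 for n in editcounts.values() if n > 0)


def count_by_dump(revisions):
    """Distinct accounts that appear in surviving page histories."""
    return len({user for _page, user in revisions})


print(count_by_editcount(user_editcount))  # 3
print(count_by_dump(dump_revisions))       # 2
```

Bob is counted by the first rule and not by the second, which is the kind of gap that would make a user_editcount-based figure come out higher than a dump-based one.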
Strictly speaking, user_editcount is a better proxy for the number of people who have "ever edited". Erik's is the number of people whose edits appear in the history of a page at the time of an XML dump.

-Aaron

On Tue, Oct 27, 2015 at 9:34 AM, Jonathan Morgan <[email protected]> wrote:

> I also wonder about this discrepancy. I ran a more explicit version of
> Andrew query, trying to eliminate some possible edge cases, and came up
> with the same number.
>
> Now I'm curious. Are there junk rows in our user table, retained for
> legacy reasons maybe? Is user_editcount inaccurate? Erik, can you describe
> the processing you perform to winnow down from 8.2 million?
>
> J
>
> On Tue, Oct 27, 2015 at 7:06 AM, Andrew Gray <[email protected]>
> wrote:
>
>> Interesting - wonder why my query's giving a higher number?
>>
>> I agree entirely that we should be very careful with quoting these
>> figures. I think you'd probably be safe to say that more than a
>> million people have edited... but even then I'd be cautious.
>>
>> Andrew.
>>
>> On 27 October 2015 at 11:11, Erik Zachte <[email protected]> wrote:
>> > Wikistats has it that 5,644,681 registered accounts published at least once till Oct 1, 2015, and 2,181,006 three or more times.
>> > It used to publish that on [1][2] but I just removed it.
>> >
>> > I'm campaigning against us publishing overly inflated counts since about two years (Wikimania London).
>> >
>> > Since this thread is going on and on, I'll repost my (reworded) reservations on this particular metric, for newcomers.
>> >
>> > Even if we state explicitly that this is not unique people, any audience will think it may be close and we are overly correct by adding the caveat. It may not be so close. For that reason imo such a metric would be of questionable value, to put it mildly.
>> >
>> > Pine:
>> >> Is there a way to get counts for the number of accounts, including or excluding IPs, that have ever edited English Wikipedia, ?
>> >
>> > First the anon contributors: when we'd count every ip address that shows up in the dumps, we'd count *very* many people who were just vandalizing willfully, or just pressing edit for fun, or forgot to login once, and also moved from one ip address to another over the years. On top of that many people get a new ip address (from a pool) on every session, depends on provider policy.
>> >
>> > As for registered editors the number Wikistats used to publish may be a rather empty metric for several reasons:
>> > - How many casual editors will have forgotten their password and just created a new user id? Only veteran editors know about sockpuppeting and how one is supposed not to do that.
>> > - How many people will have registered in good faith just out of habit, or to tweak presentation preferences, and then played with the edit button just to see what happens? Note that roughly 2 out of 3 accounts doesn't even reach 3 edits.
>> >
>> > Cheers,
>> > Erik Zachte
>> >
>> > [1] https://stats.wikimedia.org/EN/TablesWikipediaEN.htm#editdistribution
>> > [2] BTW I use the term wikipedians overly inclusive in that report. A person who edited once or twice isn't a wikipedian in my book, just like a person who writes two post-it notes per month and nothing else isn't called a writer. Some terms only apply above some threshold.
>> >
>> > -----Original Message-----
>> > From: Analytics [mailto:[email protected]] On Behalf Of Andrew Gray
>> > Sent: Tuesday, October 27, 2015 11:06
>> > To: A mailing list for the Analytics Team at WMF and everybody who has an interest in Wikipedia and analytics.
>> > Subject: Re: [Analytics] User statistics for video marking ENWP 5m article milestone
>> >
>> > To a very crude approximation, there are approximately 8.2 million accounts which have at least one edit on English Wikipedia - at least assuming my SQL query is correct!
>> >
>> > http://quarry.wmflabs.org/query/1911
>> >
>> > This is all user accounts with one or more edits in the contributions record; it does not contain IPs, and it does not contain any accounts whose sole contributions have since been deleted (which is probably quite a substantial number). Conversely, it includes a vast panoply of single-use vandalism accounts, sockpuppets, etc etc etc. And bots, of course.
>> >
>> > Andrew.
>> >
>> > On 27 October 2015 at 05:50, Pine W <[email protected]> wrote:
>> >> Is there a way to get counts for the number of accounts, including or
>> >> excluding IPs, that have ever edited English Wikipedia, ? It would be
>> >> preferable to know the number of unique people, but of course that's
>> >> impossible.
>> >>
>> >> Thanks,
>> >> Pine
>> >>
>> >> Aha, that is important for me to know. Thanks Andrew.
>> >>
>> >> Pine
>> >>
>> >>
>> >> On Thu, Sep 17, 2015 at 11:07 AM, Andrew Gray
>> >> <[email protected]>
>> >> wrote:
>> >>>
>> >>> On 11 September 2015 at 19:19, James Forrester
>> >>> <[email protected]>
>> >>> wrote:
>> >>>
>> >>> >> Does it include editors on all Wikimedia projects
>> >>> >
>> >>> > No.
>> >>> >
>> >>> >> or just those who have registered and/or edited on ENWP?
>> >>> >
>> >>> > Registered, regardless of having edited.
>> >>>
>> >>> James is of course correct, but one small caveat worth adding:
>> >>> because of SUL, a substantial proportion of these will be "autocreated"
>> >>> accounts from other projects - so even 'registration' may not mean
>> >>> what it seems.
>> >>>
>> >>> --
>> >>> - Andrew Gray
>> >>> [email protected]
>> >>>
>> >>> _______________________________________________
>> >>> Analytics mailing list
>> >>> [email protected]
>> >>> https://lists.wikimedia.org/mailman/listinfo/analytics
>> >
>> > --
>> > - Andrew Gray
>> > [email protected]
>>
>> --
>> - Andrew Gray
>> [email protected]
>
> --
> Jonathan T. Morgan
> Senior Design Researcher
> Wikimedia Foundation
> User:Jmorgan (WMF) <https://meta.wikimedia.org/wiki/User:Jmorgan_(WMF)>

_______________________________________________ Analytics mailing list [email protected] https://lists.wikimedia.org/mailman/listinfo/analytics
