Just to complicate this more...

since Pine's question was "accounts... that have ever edited English
Wikipedia", we might consider restricting our counts to namespace 0 only.

Either way, it's safe to say that the total number is in the millions.
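
To make the namespace point concrete, here is a minimal sketch of how the count changes when only namespace 0 (the article namespace in MediaWiki) is kept. The `Revision` record, the `count_editors` helper and the sample rows are all made up for illustration; this is not the query anyone in this thread actually ran.

```python
from typing import Iterable, NamedTuple, Optional, Set


class Revision(NamedTuple):
    user: str        # account name (hypothetical field)
    namespace: int   # 0 is the article namespace in MediaWiki


def count_editors(revisions: Iterable[Revision],
                  namespace: Optional[int] = None) -> int:
    """Count distinct accounts, optionally keeping a single namespace."""
    editors: Set[str] = set()
    for rev in revisions:
        if namespace is None or rev.namespace == namespace:
            editors.add(rev.user)
    return len(editors)


# Toy data, invented for illustration only.
SAMPLE = [
    Revision("Alice", 0),
    Revision("Alice", 1),
    Revision("Bob", 2),
    Revision("Carol", 0),
    Revision("Dave", 3),
]

all_namespaces = count_editors(SAMPLE)
articles_only = count_editors(SAMPLE, namespace=0)
print(all_namespaces, articles_only)
```

On the toy rows, accounts that only touched talk or user pages drop out of the namespace-0 count, which is the effect the restriction would have on the real figure.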

J

On Tue, Oct 27, 2015 at 10:21 AM, Erik Zachte <[email protected]> wrote:

> Yes we were talking about editor counts, then we moved on to countries,
> that's what one calls an analogy ;-)
>
>
>
> Erik
>
>
>
> *From:* Analytics [mailto:[email protected]] *On
> Behalf Of *Aaron Halfaker
> *Sent:* Tuesday, October 27, 2015 18:19
>
> *To:* A mailing list for the Analytics Team at WMF and everybody who has
> an interest in Wikipedia and analytics.
> *Subject:* Re: [Analytics] [Spam] Re: User statistics for video marking
> ENWP 5m article milestone
>
>
>
> > 800+ wikis and 280+ Wikipedias
>
> I thought we were talking about editor counts here.
>
>
>
> On Tue, Oct 27, 2015 at 12:18 PM, Erik Zachte <[email protected]>
> wrote:
>
> Aaron, the example of countries doesn't seem fitting for me. In many
> bodies like UN all countries have one vote, and small countries are
> disproportionately powerful. That's part of why 196 has meaning.
>
>
>
> Let me put it this way, if you think our 800+ wikis and 280+ Wikipedias
> story is not misleading, then we're pretty far apart on what constitutes
> meaningful communication.
>
>
>
> Erik
>
>
>
>
>
> *From:* Analytics [mailto:[email protected]] *On
> Behalf Of *Leila Zia
> *Sent:* Tuesday, October 27, 2015 18:11
>
>
> *To:* A mailing list for the Analytics Team at WMF and everybody who has
> an interest in Wikipedia and analytics.
> *Subject:* Re: [Analytics] [Spam] Re: User statistics for video marking
> ENWP 5m article milestone
>
>
>
>
>
> On Tue, Oct 27, 2015 at 9:56 AM, Aaron Halfaker <[email protected]>
> wrote:
>
>
>
> If we want to critique how we communicate about something, we can't do it
> in such general terms as "use 5+ edits".  We need to know what meaning is
> intended to be expressed.  Only within the context of "meaning" can we talk
> about "deception" and "misunderstanding".  As an empiricist, I'd like to
> challenge the speculation about the low competencies of our audience.
>
>
>
> For the purpose of the 5M report, our audience is a very large audience
> coming from very different walks of life, the report will be translated in
> many languages and will be read worldwide. We are not challenging the
> competency of our audience, instead we are trying to find a way to assist
> more of our audience to hear a story closer to the real story.
>
>
>
> So, if we're going to communicate how people contribute to Wikipedia and
> not "mislead", we're going to need to give people a primer on powerlaws of
> participation and discuss the implications of the best fit pareto index
> <https://en.wikipedia.org/wiki/Pareto_index> for Wikipedia edits.
>
>
>
> That's one option, but it is hard to the point of being impossible.
> The suggestion is that we do better with the understanding that many people
> will still not get the full picture, but many more will know a story that
> is closer to the reality of Wikipedia.
>
> Leila
>
>
>
>
>
>
>
> -Aaron
>
>
>
>
>
> On Tue, Oct 27, 2015 at 11:03 AM, Erik Zachte <[email protected]>
> wrote:
>
> I do agree that we reject good contributions. I also agree this is a messy
> filter.
>
>
>
> The main point however is do we want to communicate to the general public
> using such messy, fuzzy, inflated (partially), hard to not misunderstand
> numbers?
>
> We have a history of using vanity metrics (800+ wikis, 280+ Wikipedias).
> Not untrue in some very formal sense, but totally misleading in that they
> play on expectations which are totally false.
>
>
>
> Erik
>
>
>
> *From:* Analytics [mailto:[email protected]] *On
> Behalf Of *Aaron Halfaker
> *Sent:* Tuesday, October 27, 2015 16:41
>
>
> *To:* A mailing list for the Analytics Team at WMF and everybody who has
> an interest in Wikipedia and analytics.
> *Subject:* Re: [Analytics] [Spam] Re: User statistics for video marking
> ENWP 5m article milestone
>
>
>
> I don't agree.  There are a lot of good-faith page creations that get
> deleted every day.  There are also many edits that get reverted.  Arguably,
> those edits aren't productive either, but they don't disappear from the
> dumps like article drafts do.  This is a messy filter at best.
>
>
>
> On Tue, Oct 27, 2015 at 10:28 AM, Erik Zachte <[email protected]>
> wrote:
>
> As Aaron says. I'd like to add that if almost 3 million accounts
> disappeared from the dumps altogether (vandals? school kids?) that makes
> the case for not using such a count even more convincing.
>
>
>
> Erik
>
>
>
> *From:* Analytics [mailto:[email protected]] *On
> Behalf Of *Aaron Halfaker
> *Sent:* Tuesday, October 27, 2015 15:48
> *To:* A mailing list for the Analytics Team at WMF and everybody who has
> an interest in Wikipedia and analytics.
> *Subject:* Re: [Analytics] [Spam] Re: User statistics for video marking
> ENWP 5m article milestone
>
>
>
> user_editcount includes edits to deleted pages and revdeleted edits.
> Erik's perl scripts use the XML dumps that do not include edits to deleted
> pages.
>
> Strictly speaking, user_editcount is a better proxy for the number of
> people who have "ever edited".  Erik's is the number of people whose edits
> appear in the history of a page at the time of an XML dump.
>
> -Aaron
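
A rough sketch of the distinction Aaron describes, on invented data: one count treats an account as an editor if it has any edit at all (in the spirit of user_editcount), the other only if at least one edit is still on a live page (in the spirit of a dump-based count). The per-account tallies and function names are assumptions for illustration, not the real schema or Erik's scripts.

```python
# Toy data: hypothetical per-account tallies, not real figures.
# name: (edits on live pages, edits on since-deleted pages)
ACCOUNTS = {
    "Alice": (12, 0),
    "Bob": (0, 3),    # only contributions were to pages later deleted
    "Carol": (1, 1),
    "Dave": (0, 0),   # registered, never edited
}


def ever_edited(accounts):
    """Editcount-style: any edit counts, even on deleted pages."""
    return sum(1 for live, deleted in accounts.values() if live + deleted > 0)


def visible_in_dump(accounts):
    """Dump-style: only accounts with an edit on a page that still exists."""
    return sum(1 for live, _deleted in accounts.values() if live > 0)


by_editcount = ever_edited(ACCOUNTS)
by_dump = visible_in_dump(ACCOUNTS)
print(by_editcount, by_dump)
```

The gap between the two numbers is made up of accounts like "Bob" in the toy data, whose every edit was to a page that has since been deleted.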
>
>
>
> On Tue, Oct 27, 2015 at 9:34 AM, Jonathan Morgan <[email protected]>
> wrote:
>
> I also wonder about this discrepancy. I ran a more explicit version of
> Andrew's query, trying to eliminate some possible edge cases, and came up
> with the same number.
>
>
>
> Now I'm curious. Are there junk rows in our user table, retained for
> legacy reasons maybe? Is user_editcount inaccurate? Erik, can you describe
> the processing you perform to winnow down from 8.2 million?
>
>
>
> J
>
>
>
> On Tue, Oct 27, 2015 at 7:06 AM, Andrew Gray <[email protected]>
> wrote:
>
> Interesting - wonder why my query's giving a higher number?
>
> I agree entirely that we should be very careful with quoting these
> figures. I think you'd probably be safe to say that more than a
> million people have edited... but even then I'd be cautious.
>
> Andrew.
>
>
> On 27 October 2015 at 11:11, Erik Zachte <[email protected]> wrote:
> > Wikistats has it that 5,644,681 registered accounts published at least
> once till Oct 1, 2015, and 2,181,006 three or more times.
> > It used to publish that on [1][2] but I just removed it.
> >
> > I've been campaigning against us publishing overly inflated counts for about
two years (since Wikimania London).
> >
> > Since this thread is going on and on, I'll repost my (reworded)
> reservations on this particular metric, for newcomers.
> >
> > Even if we state explicitly that this is not unique people, any audience
> will think it may be close and we are overly correct by adding the caveat.
> It may not be so close. For that reason imo such a metric would be of
> questionable value, to put it mildly.
> >
> > Pine:
> >> Is there a way to get counts for the number of accounts, including or
excluding IPs, that have ever edited English Wikipedia?
> >
> > First the anon contributors: when we'd count every ip address that shows
> up in the dumps, we'd count *very* many people who were just vandalizing
> willfully, or just pressing edit for fun, or forgot to login once, and also
> moved from one ip address to another over the years. On top of that many
> people get a new ip address (from a pool) on every session, depends on
> provider policy.
> >
> > As for registered editors the number Wikistats used to publish may be a
> rather empty metric for several reasons:
> > - How many casual editors will have forgotten their password and just
> created a new user id? Only veteran editors know about sockpuppeting and
> how one is supposed not to do that.
> > - How many people will have registered in good faith just out of habit,
> or to tweak presentation preferences, and then played with the edit button
> just to see what happens? Note that roughly 2 out of 3 accounts doesn't
> even reach 3 edits.
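
As a quick check of the "roughly 2 out of 3" remark, the arithmetic below uses only the two Wikistats figures quoted earlier in this message; the variable names are mine.

```python
# Figures quoted from Wikistats in this message (registered accounts, to Oct 1, 2015).
at_least_one_edit = 5_644_681
three_or_more_edits = 2_181_006

# Accounts that published at least once but fewer than three times.
fewer_than_three = at_least_one_edit - three_or_more_edits
share_fewer_than_three = fewer_than_three / at_least_one_edit

print(fewer_than_three, round(share_fewer_than_three, 3))
```

That works out to about 61% of accounts with at least one edit, a little under two thirds.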
> >
> > Cheers,
> > Erik Zachte
> >
> > [1]
> https://stats.wikimedia.org/EN/TablesWikipediaEN.htm#editdistribution
> > [2] BTW I use the term wikipedians overly inclusively in that report. A
> person who edited once or twice isn't a wikipedian in my book, just like a
> person who writes two post-it notes per month and nothing else isn't called
> a writer. Some terms only apply above some threshold.
> >
> > -----Original Message-----
> > From: Analytics [mailto:[email protected]] On
> Behalf Of Andrew Gray
> > Sent: Tuesday, October 27, 2015 11:06
> > To: A mailing list for the Analytics Team at WMF and everybody who has
> an interest in Wikipedia and analytics.
> > Subject: Re: [Analytics] User statistics for video marking ENWP 5m
> article milestone
> >
> > To a very crude approximation, there are 8.2 million
> accounts which have at least one edit on English Wikipedia - at least
> assuming my SQL query is correct! http://quarry.wmflabs.org/query/1911
> >
> > This is all user accounts with one or more edits in the contributions
> record; it does not contain IPs, and it does not contain any accounts whose
> sole contributions have since been deleted (which is probably quite a
> substantial number). Conversely, it includes a vast panoply of single-use
> vandalism accounts, sockpuppets, etc etc etc. And bots, of course.
> >
> > Andrew.
> >
> > On 27 October 2015 at 05:50, Pine W <[email protected]> wrote:
> >> Is there a way to get counts for the number of accounts, including or
> >> excluding IPs, that have ever edited English Wikipedia? It would be
> >> preferable to know the number of unique people, but of course that's
> >> impossible.
> >>
> >> Thanks,
> >> Pine
> >>
> >> Aha, that is important for me to know. Thanks Andrew.
> >>
> >> Pine
> >>
> >>
> >> On Thu, Sep 17, 2015 at 11:07 AM, Andrew Gray
> >> <[email protected]>
> >> wrote:
> >>>
> >>> On 11 September 2015 at 19:19, James Forrester
> >>> <[email protected]>
> >>> wrote:
> >>>
> >>> >> Does it include editors on all Wikimedia projects
> >>> >
> >>> > No.
> >>> >
> >>> >> or just those who have registered and/or edited on ENWP?
> >>> >
> >>> > Registered, regardless of having edited.
> >>>
> >>> James is of course correct, but one small caveat worth adding:
> >>> because of SUL, a substantial proportion of these will be "autocreated"
> >>> accounts from other projects - so even 'registration' may not mean
> >>> what it seems.
> >>>
> >>> --
> >>> - Andrew Gray
> >>>   [email protected]
> >>>
> >>> _______________________________________________
> >>> Analytics mailing list
> >>> [email protected]
> >>> https://lists.wikimedia.org/mailman/listinfo/analytics
> >>
> >>
> >>
> >>
> >
> >
> >
> > --
> > - Andrew Gray
> >   [email protected]
> >
>
>
>
> --
> - Andrew Gray
>   [email protected]
>
>
>
>
>
>
> --
>
> Jonathan T. Morgan
>
> Senior Design Researcher
>
> Wikimedia Foundation
>
> User:Jmorgan (WMF) <https://meta.wikimedia.org/wiki/User:Jmorgan_(WMF)>
>
>
>
>
>
>


-- 
Jonathan T. Morgan
Senior Design Researcher
Wikimedia Foundation
User:Jmorgan (WMF) <https://meta.wikimedia.org/wiki/User:Jmorgan_(WMF)>