Yes, we were talking about editor counts; then we moved on to countries. That's 
what one calls an analogy ;-)

 

Erik

 

From: Analytics [mailto:[email protected]] On Behalf Of 
Aaron Halfaker
Sent: Tuesday, October 27, 2015 18:19
To: A mailing list for the Analytics Team at WMF and everybody who has an 
interest in Wikipedia and analytics.
Subject: Re: [Analytics] [Spam] Re: User statistics for video marking ENWP 5m 
article milestone

 

> 800+ wikis and 280+ Wikipedias

I thought we were talking about editor counts here.

 

On Tue, Oct 27, 2015 at 12:18 PM, Erik Zachte <[email protected]> wrote:

Aaron, the example of countries doesn't seem fitting to me. In many bodies 
like the UN, all countries have one vote, and small countries are 
disproportionately powerful. That's part of why 196 has meaning.

 

Let me put it this way, if you think our 800+ wikis and 280+ Wikipedias story 
is not misleading, then we're pretty far apart on what constitutes meaningful 
communication.

 

Erik 

 

 

From: Analytics [mailto:[email protected]] On Behalf Of 
Leila Zia
Sent: Tuesday, October 27, 2015 18:11


To: A mailing list for the Analytics Team at WMF and everybody who has an 
interest in Wikipedia and analytics.
Subject: Re: [Analytics] [Spam] Re: User statistics for video marking ENWP 5m 
article milestone

 

 

On Tue, Oct 27, 2015 at 9:56 AM, Aaron Halfaker <[email protected]> wrote:

 

If we want to critique how we communicate about something, we can't do it in 
such general terms as "use 5+ edits".  We need to know what meaning is intended 
to be expressed.  Only within the context of "meaning" can we talk about 
"deception" and "misunderstanding".  As an empiricist, I'd like to challenge 
the speculation about the low competencies of our audience.

 

For the purpose of the 5M report, our audience is very large and comes from 
very different walks of life; the report will be translated into many 
languages and will be read worldwide. We are not challenging the competency of 
our audience; instead, we are trying to find a way to help more of our 
audience hear a story closer to the real story.

 

So, if we're going to communicate how people contribute to Wikipedia and not 
"mislead", we're going to need to give people a primer on power laws of 
participation and discuss the implications of the best-fit Pareto index 
<https://en.wikipedia.org/wiki/Pareto_index> for Wikipedia edits.
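
For readers unfamiliar with the term, here is a minimal sketch of how a Pareto index could be estimated from per-account edit counts. It uses the standard maximum-likelihood estimator for a continuous Pareto distribution, alpha = n / sum(ln(x_i / x_min)). The function name, the `x_min` default, and the sample data are illustrative assumptions, not anything used in this thread, and edit counts are discrete, so this is only an approximation.

```python
import math

def pareto_index(edit_counts, x_min=1.0):
    """Maximum-likelihood estimate of the Pareto index (alpha).

    Only values >= x_min are treated as part of the power-law tail.
    """
    tail = [x for x in edit_counts if x >= x_min]
    log_sum = sum(math.log(x / x_min) for x in tail)
    if log_sum == 0:
        raise ValueError("need at least one value above x_min")
    return len(tail) / log_sum

# Hypothetical edit counts: many accounts with few edits, a few with many.
sample_counts = [1, 1, 1, 2, 2, 3, 5, 8, 40, 1200]
alpha = pareto_index(sample_counts, x_min=1.0)
print(f"estimated Pareto index: {alpha:.2f}")
```

A smaller index means a heavier tail, that is, a larger share of all edits coming from a small number of accounts.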

 

That's one option, but it is so hard as to be impossible. The 
suggestion is that we do better, with the understanding that many people will 
still not get the full picture, but many more will know a story that is closer 
to the reality of Wikipedia.

Leila

 

 

 

-Aaron 

 

 

On Tue, Oct 27, 2015 at 11:03 AM, Erik Zachte <[email protected]> wrote:

I do agree that we reject good contributions. I also agree this is a messy 
filter.

 

The main point, however, is: do we want to communicate to the general public 
using such messy, fuzzy, (partially) inflated, easy-to-misunderstand numbers? 

We have a history of using vanity metrics (800+ wikis, 280+ Wikipedias). Not 
untrue in some very formal sense, but totally misleading in that they play on 
expectations which are totally false.

 

Erik

 

From: Analytics [mailto:[email protected]] On Behalf Of 
Aaron Halfaker
Sent: Tuesday, October 27, 2015 16:41


To: A mailing list for the Analytics Team at WMF and everybody who has an 
interest in Wikipedia and analytics.
Subject: Re: [Analytics] [Spam] Re: User statistics for video marking ENWP 5m 
article milestone

 

I don't agree.  There are a lot of good-faith page creations that get deleted 
every day.  There are also many edits that get reverted.  Arguably, those edits 
aren't productive either, but they don't disappear from the dumps like article 
drafts do.  This is a messy filter at best. 

 

On Tue, Oct 27, 2015 at 10:28 AM, Erik Zachte <[email protected]> wrote:

As Aaron says. I'd like to add that if almost 3 million accounts disappeared 
from the dumps altogether (vandals? school kids?) that makes the case for not 
using such a count even more convincing. 

 

Erik
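
As a quick check on the figures quoted in this thread, the sketch below works out the gap between the two account counts and the share of accounts that never reach three edits. The 8.2 million value is Andrew's rounded Quarry result, so the gap is only approximate; the variable names are illustrative.

```python
# Figures as quoted in this thread.
QUARRY_ACCOUNTS = 8_200_000     # approximate: accounts with user_editcount >= 1
WIKISTATS_ACCOUNTS = 5_644_681  # Wikistats: accounts with 1+ edits up to Oct 1, 2015
WIKISTATS_3PLUS = 2_181_006     # Wikistats: accounts with 3+ edits

# Accounts counted by the query but absent from the dump-based count.
gap = QUARRY_ACCOUNTS - WIKISTATS_ACCOUNTS

# Share of dump-visible accounts that never reach three edits.
share_under_3 = 1 - WIKISTATS_3PLUS / WIKISTATS_ACCOUNTS

print(f"gap: about {gap / 1e6:.1f} million accounts")
print(f"accounts with fewer than 3 edits: {share_under_3:.0%}")
```

On these numbers the gap is a little over 2.5 million accounts, and about 61% of the accounts Wikistats counts have fewer than three edits.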

 

From: Analytics [mailto:[email protected]] On Behalf Of 
Aaron Halfaker
Sent: Tuesday, October 27, 2015 15:48
To: A mailing list for the Analytics Team at WMF and everybody who has an 
interest in Wikipedia and analytics.
Subject: Re: [Analytics] [Spam] Re: User statistics for video marking ENWP 5m 
article milestone

 

user_editcount includes edits to deleted pages and revdeleted edits.  Erik's 
Perl scripts use the XML dumps that do not include edits to deleted pages.  

Strictly speaking, user_editcount is a better proxy for the number of people 
who have "ever edited".  Erik's is the number of people whose edits appear in 
the history of a page at the time of an XML dump.  

-Aaron
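
The difference between the two counts can be illustrated with a toy model. This is only a sketch: the records and field names below are hypothetical, not the MediaWiki schema or the actual Wikistats processing. One count looks at every edit an account ever made; the other only sees edits to pages that still exist at dump time.

```python
# Toy revision records; field names are hypothetical, not MediaWiki's schema.
revisions = [
    {"user": "Alice", "page_deleted": False},
    {"user": "Alice", "page_deleted": True},
    {"user": "Bob", "page_deleted": True},    # only ever edited a deleted page
    {"user": "Carol", "page_deleted": False},
]

def editors_ever(revs):
    """Accounts with any edit at all (roughly what user_editcount > 0 captures)."""
    return {r["user"] for r in revs}

def editors_in_dump(revs):
    """Accounts whose edits still appear in page histories at dump time."""
    return {r["user"] for r in revs if not r["page_deleted"]}

# Accounts that the dump-based count cannot see.
missing = editors_ever(revisions) - editors_in_dump(revisions)
print(sorted(missing))
```

In this toy data, Bob has edited but vanishes from the dump-based count because his only edit was to a page that was later deleted.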

 

On Tue, Oct 27, 2015 at 9:34 AM, Jonathan Morgan <[email protected]> wrote:

I also wonder about this discrepancy. I ran a more explicit version of Andrew's 
query, trying to eliminate some possible edge cases, and came up with the same 
number. 

 

Now I'm curious. Are there junk rows in our user table, retained for legacy 
reasons maybe? Is user_editcount inaccurate? Erik, can you describe the 
processing you perform to winnow down from 8.2 million? 

 

J

 

On Tue, Oct 27, 2015 at 7:06 AM, Andrew Gray <[email protected]> wrote:

Interesting - wonder why my query's giving a higher number?

I agree entirely that we should be very careful with quoting these
figures. I think you'd probably be safe to say that more than a
million people have edited... but even then I'd be cautious.

Andrew.


On 27 October 2015 at 11:11, Erik Zachte <[email protected]> wrote:
> Wikistats has it that 5,644,681 registered accounts published at least once 
> till Oct 1, 2015, and 2,181,006 three or more times.
> It used to publish that on [1][2] but I just removed it.
>
> I've been campaigning against us publishing overly inflated counts for about 
> two years (since Wikimania London).
>
> Since this thread is going on and on, I'll repost my (reworded) reservations 
> on this particular metric, for newcomers.
>
> Even if we state explicitly that this is not unique people, any audience will 
> think it may be close and we are overly correct by adding the caveat. It may 
> not be so close. For that reason imo such a metric would be of questionable 
> value, to put it mildly.
>
> Pine:
>> Is there a way to get counts for the number of accounts, including or 
>> excluding IPs, that have ever edited English Wikipedia?
>
> First the anon contributors: if we counted every IP address that shows up 
> in the dumps, we'd count *very* many people who were just vandalizing 
> willfully, or just pressing edit for fun, or forgot to log in once, and also 
> moved from one IP address to another over the years. On top of that, many 
> people get a new IP address (from a pool) on every session, depending on 
> provider policy.
>
> As for registered editors the number Wikistats used to publish may be a 
> rather empty metric for several reasons:
> - How many casual editors will have forgotten their password and just created 
> a new user id? Only veteran editors know about sockpuppeting and how one is 
> supposed not to do that.
> - How many people will have registered in good faith just out of habit, or to 
> tweak presentation preferences, and then played with the edit button just to 
> see what happens? Note that roughly 2 out of 3 accounts don't even reach 3 
> edits.
>
> Cheers,
> Erik Zachte
>
> [1] https://stats.wikimedia.org/EN/TablesWikipediaEN.htm#editdistribution
> [2] BTW I use the term wikipedians overly inclusively in that report. A person 
> who edited once or twice isn't a wikipedian in my book, just like a person 
> who writes two post-it notes per month and nothing else isn't called a 
> writer. Some terms only apply above some threshold.
>
> -----Original Message-----
> From: Analytics [mailto:[email protected]] On Behalf Of 
> Andrew Gray
> Sent: Tuesday, October 27, 2015 11:06
> To: A mailing list for the Analytics Team at WMF and everybody who has an 
> interest in Wikipedia and analytics.
> Subject: Re: [Analytics] User statistics for video marking ENWP 5m article 
> milestone
>
> To a very crude approximation, there are 8.2 million accounts 
> which have at least one edit on English Wikipedia - at least assuming my SQL 
> query is correct! http://quarry.wmflabs.org/query/1911
>
> This is all user accounts with one or more edits in the contributions record; 
> it does not contain IPs, and it does not contain any accounts whose sole 
> contributions have since been deleted (which is probably quite a substantial 
> number). Conversely, it includes a vast panoply of single-use vandalism 
> accounts, sockpuppets, etc etc etc. And bots, of course.
>
> Andrew.
>
> On 27 October 2015 at 05:50, Pine W <[email protected]> wrote:
>> Is there a way to get counts for the number of accounts, including or
>> excluding IPs, that have ever edited English Wikipedia? It would be
>> preferable to know the number of unique people, but of course that's
>> impossible.
>>
>> Thanks,
>> Pine
>>
>> Aha, that is important for me to know. Thanks Andrew.
>>
>> Pine
>>
>>
>> On Thu, Sep 17, 2015 at 11:07 AM, Andrew Gray
>> <[email protected]>
>> wrote:
>>>
>>> On 11 September 2015 at 19:19, James Forrester
>>> <[email protected]>
>>> wrote:
>>>
>>> >> Does it include editors on all Wikimedia projects
>>> >
>>> > No.
>>> >
>>> >> or just those who have registered and/or edited on ENWP?
>>> >
>>> > Registered, regardless of having edited.
>>>
>>> James is of course correct, but one small caveat worth adding:
>>> because of SUL, a substantial proportion of these will be "autocreated"
>>> accounts from other projects - so even 'registration' may not mean
>>> what it seems.
>>>
>>> --
>>> - Andrew Gray
>>>   [email protected]
>>>
>>> _______________________________________________
>>> Analytics mailing list
>>> [email protected]
>>> https://lists.wikimedia.org/mailman/listinfo/analytics



--
- Andrew Gray
  [email protected]






 

-- 

Jonathan T. Morgan

Senior Design Researcher

Wikimedia Foundation

User:Jmorgan (WMF) <https://meta.wikimedia.org/wiki/User:Jmorgan_(WMF)> 

 

