Zainan:

Labs is our cloud environment for volunteers. You can direct questions
about it to the cloud e-mail list.

https://wikitech.wikimedia.org/wiki/Help:Cloud_Services_Introduction

Thanks,

Nuria

On Mon, Apr 2, 2018 at 7:44 PM, Zainan Zhou (a.k.a Victor) <[email protected]>
wrote:

> Thanks Dan, that's very helpful. I asked two follow-up questions inline
> below.
>
>
> Zainan Zhou (周载南), a.k.a. "Victor" <http://who/zzn>
> Software Engineer, Data Engine
> Google Inc.
> [email protected] - 650.336.5691
> 1600 Amphitheathre Pkwy, LDAP zzn, Mountain View 94043
>
> On Sat, Mar 31, 2018 at 12:34 AM, Dan Andreescu <[email protected]>
> wrote:
>
>> Thanks to Tilman for pointing out that this data is still being worked
>> on.  So, yes, there are lots of subtleties in how we count articles,
>> redirects, content vs. non-content, etc.  I don't have the answer to all of
>> the discrepancies that Tilman found, but if you need a very accurate
>> answer, the only way is to get an account on labs and start digging into
>> how exactly you want to count the articles.
>>
>
> What's the best way to sign up for a Labs account? (Does it require certain
> qualifications?)
> And could you point us to the code, or to the entry point of the code
> repository?
>
>
>
>> As our datasets and APIs get more mature, we're hoping to give as much
>> flexibility as everyone needs, but not so much as to drive people crazy.
>> Until then, we're slowly improving our docs.
>>
>> And yes, don't read some of this stuff alone at night, the buddy system
>> works well for data analysis, lol
>>
>> On Fri, Mar 30, 2018 at 6:43 AM, Zainan Zhou (a.k.a Victor) <
>> [email protected]> wrote:
>>
>>> Thank you very much, Dan. This turns out to be very helpful. My teammates
>>> have started looking into it.
>>>
>>>
>>> On Fri, Mar 30, 2018 at 5:12 AM, Dan Andreescu <[email protected]>
>>> wrote:
>>>
>>>> Forwarding this question to the public Analytics list, where it's good
>>>> to have these kinds of discussions.  If you're interested in this data and
>>>> how it changes over time, do subscribe and watch for updates, notices of
>>>> outages, etc.
>>>>
>>>> Ok, so on to your question.  You'd like the *total # of articles for
>>>> each wiki*.  I think the simplest way right now is to query the AQS
>>>> (Analytics Query Service) API, documented here:
>>>> https://wikitech.wikimedia.org/wiki/Analytics/AQS/Wikistats_2
>>>>
>>>> To get the # of articles for a wiki, let's say en.wikipedia.org, you
>>>> can get the timeseries of new articles per month since the beginning of
>>>> time:
>>>>
>>>> https://wikimedia.org/api/rest_v1/metrics/edited-pages/new/en.wikipedia.org/all-editor-types/all-page-types/monthly/2001010100/2018032900
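
A rough Python sketch of the query described above, in case it helps. It builds the same URL for any project and sums the monthly values. The response layout (`items`, `results`, `new_pages`) and the function names are assumptions for illustration, not something confirmed in this thread, so check them against a real response. Note also that a sum of new pages is only an approximation of the current article count, since it ignores deletions and the redirect/content subtleties mentioned elsewhere in this thread.

```python
import json
import urllib.request

AQS_BASE = "https://wikimedia.org/api/rest_v1/metrics/edited-pages/new"


def new_pages_url(project, start="2001010100", end="2018032900"):
    """Build the monthly new-pages URL quoted in the message above."""
    return (f"{AQS_BASE}/{project}/all-editor-types/all-page-types/"
            f"monthly/{start}/{end}")


def total_new_pages(payload):
    """Sum the monthly counts in a parsed response.

    Assumed layout: {"items": [{"results": [{"new_pages": N}, ...]}]}.
    """
    return sum(
        point.get("new_pages", 0)
        for item in payload.get("items", [])
        for point in item.get("results", [])
    )


def fetch_total(project):
    """Fetch the time series over HTTP and return the summed count."""
    request = urllib.request.Request(
        new_pages_url(project),
        # Hypothetical User-Agent string; replace with your own contact info.
        headers={"User-Agent": "article-count-sketch/0.1 (example)"},
    )
    with urllib.request.urlopen(request) as response:
        return total_new_pages(json.load(response))
```

Calling `fetch_total("en.wikipedia.org")` would hit the live API; `total_new_pages` can be tried offline on a saved response.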
>>>>
>>>> And to get a list of all wikis, to plug into that URL instead of
>>>> "en.wikipedia.org", the most up-to-date information is here:
>>>> https://meta.wikimedia.org/wiki/Special:SiteMatrix in table form, or
>>>> via the MediaWiki API:
>>>> https://meta.wikimedia.org/w/api.php?action=sitematrix&formatversion=2&format=json&maxage=3600&smaxage=3600
>>>> Sometimes new sites won't have data in the AQS API for a month or two
>>>> until we add them and start crunching their stats.
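
A minimal sketch of pulling the Wikipedia hostnames out of a sitematrix response, so they can be plugged into the AQS URL. It assumes the `formatversion=2` layout as I understand it: numbered language groups that each carry a `site` list, Wikipedia entries marked with `code` equal to `wiki`, and a boolean `closed` flag. The function name `wikipedia_hosts` is hypothetical; verify the field names against a real response before relying on them.

```python
from urllib.parse import urlparse


def wikipedia_hosts(payload, include_closed=False):
    """Return sorted Wikipedia hostnames from a parsed sitematrix response."""
    matrix = payload.get("sitematrix", {})
    hosts = []
    for group in matrix.values():
        # Language groups are dicts; skip the integer "count" and the
        # "specials" list, which hold no per-language Wikipedia entries.
        if not isinstance(group, dict):
            continue
        for site in group.get("site", []):
            # Assumption: Wikipedia sites carry the family code "wiki".
            if site.get("code") != "wiki":
                continue
            # Assumption: closed wikis have closed == True in formatversion=2.
            if site.get("closed") and not include_closed:
                continue
            hosts.append(urlparse(site["url"]).netloc)
    return sorted(hosts)
```

The length of the returned list gives the number of Wikipedia sites, and each hostname can be passed as the project in the new-pages URL.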
>>>>
>>>> The way I figured this out is to look at how our UI uses the API:
>>>> https://stats.wikimedia.org/v2/#/en.wikipedia.org/contributing/new-pages
>>>> So if you were interested in something else, you can browse around
>>>> there and take a look at the XHR requests in the browser console.
>>>> Have fun!
>>>>
>>>> On Thu, Mar 29, 2018 at 12:54 AM, Zainan Zhou (a.k.a Victor) <
>>>> [email protected]> wrote:
>>>>
>>>>> Hi Dan,
>>>>>
>>>>> How are you? This is Victor. It's been a while since we met at the
>>>>> 2018 Wikimedia Dev Summit. I hope you are doing great.
>>>>>
>>>>> As I mentioned to you, my team works on extracting knowledge from
>>>>> Wikipedia. We are currently running a project to expand language
>>>>> coverage. My teammate Yuan Gao (cc'ed here) is the tech lead of this
>>>>> project. She plans to *monitor the list of all currently available
>>>>> Wikipedia sites and the number of articles for each language*, so
>>>>> that we can compare them with our extraction system's output to
>>>>> sanity-check whether there is a massive breakage of the extraction
>>>>> logic, or whether we need to add or remove languages when a Wikipedia
>>>>> site is added to or removed from the Wikipedia family.
>>>>>
>>>>> I think your Analytics team at Wikimedia probably knows best where
>>>>> we can find this data. Here are 4 places we already know of, but they
>>>>> don't seem to have the data.
>>>>>
>>>>>
>>>>>    - https://en.wikipedia.org/wiki/List_of_Wikipedias has the
>>>>>    information we need, but the list is manually edited, not automatic
>>>>>    - https://stats.wikimedia.org/EN/Sitemap.htm has the full list,
>>>>>    but the information seems pretty out of date (last updated almost a
>>>>>    month ago)
>>>>>    - StatsV2 UI: https://stats.wikimedia.org/v2/#/all-projects, where I
>>>>>    can't find the full list or the number of articles
>>>>>    - API https://wikimedia.org/api/rest_v1/, suggested by elukey on the
>>>>>    #wikimedia-analytics channel, which doesn't seem to have the number
>>>>>    of articles
>>>>>
>>>>> Do you know a good place to find this information? Thank you!
>>>>>
>>>>> Victor
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> ---------- Forwarded message ----------
>>>>> From: Yuan Gao <[email protected]>
>>>>> Date: Wed, Mar 28, 2018 at 4:15 PM
>>>>> Subject: Monitor the number of Wikipedia sites and the number of
>>>>> articles in each site
>>>>> To: Zainan Victor Zhou <[email protected]>
>>>>> Cc: Wenjie Song <[email protected]>, WikiData <[email protected]>
>>>>>
>>>>>
>>>>> Hi Victor,
>>>>> as we discussed in the meeting, I'd like to monitor:
>>>>> 1) the number of Wikipedia sites
>>>>> 2) the number of articles in each site
>>>>>
>>>>> Can you help us contact WMF to get a real-time, or at least
>>>>> daily, update of these numbers? What we can find now is
>>>>> https://en.wikipedia.org/wiki/List_of_Wikipedias, but the number of
>>>>> Wikipedia sites is manually updated, and possibly out of date.
>>>>>
>>>>>
>>>>> The monitor can help us catch such bugs.
>>>>>
>>>>> --
>>>>> Yuan Gao
>>>>>
>>>>>
>>>>
>>>
>>
>
> _______________________________________________
> Analytics mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/analytics
>
>