On Wed, Aug 1, 2018 at 3:07 PM Yuan Gao <gaoy...@google.com> wrote:

> Hi Tilman,
> our team, i.e., the team working on extracting the knowledge from
> Wikipedia in Google, has just compared our crawled data with
> https://meta.wikimedia.org/wiki/List_of_Wikipedias/Table. In the
> following sites, we have quite significant diffs:
>

The statistics special page for bo.wikipedia shows the following counts as
of today:

Content pages
<https://bo.wikipedia.org/w/index.php?title=Special:AllPages&hideredirects=1>:
5,818
Pages <https://bo.wikipedia.org/wiki/Special:AllPages> (all pages in the
wiki, including talk pages, redirects, etc.): 16,498
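
If it helps to pull these two numbers programmatically instead of reading
them off the special page, the MediaWiki API exposes them through the
siteinfo statistics query. The following is only a minimal sketch: the helper
names and the User-Agent string are made up for illustration, and
fetch_counts needs network access.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def statistics_url(host):
    """Build the siteinfo statistics URL for a wiki host, e.g. 'bo.wikipedia.org'."""
    params = {
        "action": "query",
        "meta": "siteinfo",
        "siprop": "statistics",
        "format": "json",
    }
    return "https://" + host + "/w/api.php?" + urlencode(params)


def extract_counts(payload):
    """Return (content_pages, all_pages) from a parsed API response."""
    stats = payload["query"]["statistics"]
    return stats["articles"], stats["pages"]


def fetch_counts(host):
    """Fetch the live counts for a wiki (hypothetical helper, needs network)."""
    # The User-Agent value is a placeholder; use one that identifies your client.
    request = Request(statistics_url(host),
                      headers={"User-Agent": "stats-check-sketch/0.1"})
    with urlopen(request) as response:
        return extract_counts(json.load(response))
```

Calling fetch_counts("bo.wikipedia.org") should return the content page count
first and the total page count second, which can then be compared against the
crawled totals.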

An article (content page), according to the software documentation, is
defined as follows: "The automatic definition used by the software at
Special:Statistics <https://en.wikipedia.org/wiki/Special:Statistics> is:
*any page that is in the article namespace, is not a redirect page
<https://en.wikipedia.org/wiki/Wikipedia:Redirect> and contains at least
one wiki link*."
https://en.wikipedia.org/wiki/Wikipedia:What_is_an_article%3F#Lists_of_articles_and_statistics

Could your definition be broader than the MediaWiki one? Another thing I
would check is whether Google may be including duplicate results.
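
To make the comparison concrete, the quoted definition and the duplicate
check can be sketched as a filter over crawled pages. This is an illustration
only, not what MediaWiki actually runs: the record fields (title, ns,
redirect, wikitext) are hypothetical, namespace 0 is assumed to be the
article namespace, and the link test is a rough regular expression.

```python
import re

# Rough test for at least one [[wiki link]] in the page source (an assumption,
# not the parser MediaWiki uses).
WIKILINK = re.compile(r"\[\[[^\[\]]+\]\]")


def is_content_page(page):
    """Apply the quoted definition to one crawled page record."""
    return (
        page["ns"] == 0                      # article namespace
        and not page["redirect"]             # not a redirect page
        and WIKILINK.search(page["wikitext"]) is not None  # has a wiki link
    )


def count_content_pages(pages):
    """Count content pages, skipping records whose title was already seen."""
    seen_titles = set()
    count = 0
    for page in pages:
        if page["title"] in seen_titles:
            continue  # duplicate crawl result
        seen_titles.add(page["title"])
        if is_content_page(page):
            count += 1
    return count
```

If the crawled total drops close to the Special:Statistics figure after
applying a filter like this, the difference is most likely one of definition
or duplication rather than missing data on the wiki side.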

There could be some amount of caching in both the statistics calculation
and the rendering of those pages, although probably not enough to double
the number of articles.

-- 
Jaime Crespo
<http://wikimedia.org>
_______________________________________________
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics
