>The good source for Recent pageview data is hadoop, going back a bit the well-loved webstatscollector files provide that info:
Sorry, I meant to send two links:

http://dumps.wikimedia.org/other/pagecounts-all-sites/ -> this is data from Hadoop
http://dumps.wikimedia.org/other/pagecounts-raw/ -> this is data from webstatscollector

On Thu, Jan 8, 2015 at 7:34 AM, Nuria Ruiz <[email protected]> wrote:

> >It uses 1:1000 random sampling, so I have to count the log events and
> multiply by 1000 to get a good estimation. Am I missing something?
> Quite a bit actually. Mostly that reporting is only available to "some"
> browsers (the majority but not all) but also only the main document is
> counted and a pageview is more than the request of the main document. For
> example, you will not get all 301s/302s or images and there are many, many
> other details.
>
> See pageview definition:
> https://meta.wikimedia.org/wiki/Research:Page_view
>
> The good source for Recent pageview data is hadoop, going back a bit the
> well-loved webstatscollector files provide that info:
> http://dumps.wikimedia.org/other/pagecounts-all-sites/
>
>
> On Wed, Jan 7, 2015 at 11:12 PM, Gergo Tisza <[email protected]> wrote:
>
>> On Wed, Jan 7, 2015 at 5:59 PM, Nuria Ruiz <[email protected]> wrote:
>>
>>> >Back when MediaViewer was launched, I added a namespace parameter to
>>> NavigationTiming to be able to track per-namespace pageviews,
>>> Navigation timing is heavily sampled so I am not sure you could estimate
>>> pageviews with the scarce dataset it provides, I would say it is not
>>> possible.
>>>
>>
>> It uses 1:1000 random sampling, so I have to count the log events and
>> multiply by 1000 to get a good estimation. Am I missing something?
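
For reference, the count-and-multiply estimate discussed in the quoted thread can be sketched in Python as below. This is only a rough illustration: the function name and the normal-approximation error margin are my own assumptions, not anything NavigationTiming provides. It also only estimates the number of events the sampled log could see, which, as noted in the thread, is not the same thing as pageviews under the pageview definition.

```python
import math

# 1:1000 random sampling, as stated in the thread.
SAMPLING_FACTOR = 1000


def estimate_total(sampled_events, sampling_factor=SAMPLING_FACTOR):
    """Scale a sampled event count up to an estimated total.

    Returns (estimate, low, high). The interval is a rough 95% range that
    assumes the sampled count behaves like a Poisson variable (standard
    deviation about sqrt(count)); that is an assumption of this sketch.
    """
    estimate = sampled_events * sampling_factor
    margin = 1.96 * math.sqrt(sampled_events) * sampling_factor
    return estimate, max(0.0, estimate - margin), estimate + margin


if __name__ == "__main__":
    est, low, high = estimate_total(400)
    print(f"estimated events: {est} (roughly {low:.0f} to {high:.0f})")
```

The fewer events land in the sample, the wider that range gets relative to the estimate, which is why small per-namespace counts are hard to scale up reliably.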
_______________________________________________
Analytics mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/analytics
