Marcel, thanks for your quick answer. My main issue with the dumps (unless I am missing something) is:
I need to download them first to be able to aggregate and filter. For the year 2016 that would be:

40 MB (average file size) * 24 h * 30 d * 12 m = about 350 GB

As I am not sitting directly at DE-CIX but in my private office, I will have a pretty hard time with that :-)

So my idea is that somebody "closer" to the raw data would basically do the aggregation and filtering for me...

Would somebody do that (please)?

Thanks, JJ

On 06.03.2017 at 11:14, Marcel Ruiz Forns wrote:
> Hi Jörg, :]
>
> Do you mean the top 250K most viewed *articles* in de.wikipedia.org?
>
> If so, I think you can get that from the dumps indeed. You can find 2016
> hourly pageview stats by article for all wikis here:
> https://dumps.wikimedia.org/other/pageviews/2016/
>
> Note that the wiki codes (first column) you're interested in are:
> de, de.m and de.zero.
> The third column holds the number of pageviews you're after.
> Also, this data set does not include bot traffic as recognized by the
> pageview definition <https://meta.wikimedia.org/wiki/Research:Page_view>.
> As files are hourly and contain data for all wikis, you'll need some
> aggregation and filtering.
>
> Cheers!
>
> On Mon, Mar 6, 2017 at 2:59 AM, Jörg Jung <[email protected]> wrote:
>
> > Ladies, gents,
> >
> > for a project I am planning I need the following data:
> >
> > Top 250K sites for 2016 in project de.wikipedia.org, user-access.
> >
> > I only need the name of the site and the corresponding number of
> > user accesses (all channels) for 2016 (sum over the year).
> >
> > As far as I can see, I can't get that data via REST or by aggregating
> > dumps.
> >
> > So I'd like to ask here whether someone would like to help out.
> >
> > Thanks, cheers, JJ
> >
> > --
> > Jörg Jung, Dipl. Inf. (FH)
> > Hasendriesch 2
> > D-53639 Königswinter
> > E-Mail: [email protected]
> > Web: www.retevastum.de
> >      www.datengraphie.de
> >      www.digitaletat.de
> >      www.olfaktum.de
> >
> > _______________________________________________
> > Analytics mailing list
> > [email protected]
> > https://lists.wikimedia.org/mailman/listinfo/analytics
>
> --
> *Marcel Ruiz Forns*
> Analytics Developer
> Wikimedia Foundation
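
The aggregation and filtering Marcel describes could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not a tested pipeline: it assumes the hourly files have already been downloaded into a local directory, that they are gzip-compressed and named like pageviews-*.gz, and that each line is space-separated with the wiki code in the first column, the article title in the second and the pageview count in the third. The function names add_hourly_counts and top_articles are made up for illustration.

```python
import gzip
from collections import Counter
from pathlib import Path
from typing import Iterable

# Wiki codes named in the thread: desktop, mobile web and Zero traffic.
WIKI_CODES = {"de", "de.m", "de.zero"}


def add_hourly_counts(lines: Iterable[str], totals: Counter,
                      codes: set = WIKI_CODES) -> None:
    """Add the pageviews of one hourly file to the running totals.

    Assumed line layout (space-separated): wiki_code title views ...
    Lines for other wikis and malformed lines are skipped.
    """
    for line in lines:
        parts = line.split()
        if len(parts) < 3 or parts[0] not in codes:
            continue
        try:
            totals[parts[1]] += int(parts[2])
        except ValueError:
            continue  # third column was not a number


def top_articles(dump_dir: str, n: int = 250_000) -> list:
    """Sum views per title over all hourly files below dump_dir."""
    totals = Counter()
    # Hypothetical local layout: any pageviews-*.gz file below dump_dir.
    for path in sorted(Path(dump_dir).rglob("pageviews-*.gz")):
        with gzip.open(path, "rt", encoding="utf-8", errors="replace") as fh:
            add_hourly_counts(fh, totals)
    return totals.most_common(n)
```

Because each file is read once and only the per-title totals are kept in memory, the hourly files do not all have to be on disk at the same time: a file can be fetched, counted and deleted before the next one is processed. The download volume itself stays the same.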
