Jörg, take a look at https://dumps.wikimedia.org/other/pagecounts-ez/ which has the same data compressed without losing granularity. You can get monthly files there and download a lot less data.
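For example, here is a rough sketch of what pulling and filtering one of those monthly files could look like in Python. The exact file name under merged/, the "de.z" project code and the column layout are assumptions on my part, so please check the format notes on that page before relying on them:

#!/usr/bin/env python3
# Rough sketch only: the file name, the "de.z" project code and the column
# layout are assumptions; verify them against the pagecounts-ez format notes.
import bz2
import urllib.request

URL = ("https://dumps.wikimedia.org/other/pagecounts-ez/merged/"
       "pagecounts-2016-01-views-ge-5.bz2")      # assumed name for Jan 2016
LOCAL = "pagecounts-2016-01.bz2"

urllib.request.urlretrieve(URL, LOCAL)           # one file per month

totals = {}
with bz2.open(LOCAL, "rt", encoding="utf-8", errors="replace") as fh:
    for line in fh:
        parts = line.split(" ")
        # assumed layout: <project code> <article> <monthly total> <detail>
        if len(parts) < 3 or parts[0] != "de.z":
            continue
        totals[parts[1]] = totals.get(parts[1], 0) + int(parts[2])

# quick sanity check: ten most viewed articles in that month
print(sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:10])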
On Mon, Mar 6, 2017 at 5:40 AM, Jörg Jung <[email protected]> wrote:

> Marcel,
>
> thanks for your quick answer.
> My main issue with the dumps (or maybe I'm missing something) is:
>
> I need to download them first to be able to aggregate and filter.
> Which for the year 2016 would be: 40 MB (average) * 24 h * 30 d * 12 m ≈ 350 GB.
>
> As I am not sitting directly at DE-CIX but in my private office, I will
> face a pretty hard time with that :-)
>
> So my idea is that somebody "closer" to the raw data would basically do
> the aggregation and filtering for me...
>
> Will somebody (please)?
>
> Thanks, JJ
>
> On 06.03.2017 at 11:14, Marcel Ruiz Forns wrote:
> > Hi Jörg, :]
> >
> > Do you mean the top 250K most viewed *articles* in de.wikipedia.org?
> >
> > If so, I think you can get that from the dumps indeed. You can find 2016
> > hourly pageview stats by article for all wikis here:
> > https://dumps.wikimedia.org/other/pageviews/2016/
> >
> > Note that the wiki codes (first column) you're interested in are: /de/,
> > /de.m/ and /de.zero/.
> > The third column holds the number of pageviews you're after.
> > Also, this data set does not include bot traffic as recognized by the
> > pageview definition <https://meta.wikimedia.org/wiki/Research:Page_view>.
> > As files are hourly and contain data for all wikis, you'll need some
> > aggregation and filtering.
> >
> > Cheers!
> >
> > On Mon, Mar 6, 2017 at 2:59 AM, Jörg Jung <[email protected]> wrote:
> >
> > Ladies, gents,
> >
> > for a project I plan I'd need the following data:
> >
> > Top 250K sites for 2016 in the project de.wikipedia.org, user access.
> >
> > I only need the name of the site and the corresponding number of
> > user accesses (all channels) for 2016 (sum over the year).
> >
> > As far as I can see, I can't get that data via REST or by aggregating
> > dumps.
> >
> > So I'd like to ask here if someone would like to help out.
> >
> > Thanks, cheers, JJ
> >
> > --
> > Jörg Jung, Dipl. Inf. (FH)
> > Hasendriesch 2
> > D-53639 Königswinter
> > E-Mail: [email protected]
> > Web: www.retevastum.de
> >      www.datengraphie.de
> >      www.digitaletat.de
> >      www.olfaktum.de
> >
> >
> > --
> > *Marcel Ruiz Forns*
> > Analytics Developer
> > Wikimedia Foundation
>
> --
> Jörg Jung, Dipl. Inf. (FH)
> Hasendriesch 2
> D-53639 Königswinter
> E-Mail: [email protected]
> Web: www.retevastum.de
>      www.datengraphie.de
>      www.digitaletat.de
>      www.olfaktum.de
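P.S. If you do end up going the hourly-dumps route Marcel describes above, the aggregation and filtering step itself is small. Here is a minimal sketch, assuming the hourly .gz files are already on disk; the local paths and file names below are only illustrative, while the column meanings (first column wiki code, third column pageviews) follow Marcel's description:

#!/usr/bin/env python3
# Minimal sketch of the aggregation/filtering step: keep only the de, de.m
# and de.zero rows (first column), sum the pageview counts (third column)
# per article, and write out the top 250K. Local paths and file names are
# illustrative assumptions.
import glob
import gzip
from collections import Counter
from heapq import nlargest

WIKI_CODES = {"de", "de.m", "de.zero"}
totals = Counter()

for path in sorted(glob.glob("pageviews/2016/*/pageviews-2016*.gz")):
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as fh:
        for line in fh:
            parts = line.split(" ")
            if len(parts) < 3 or parts[0] not in WIKI_CODES:
                continue
            totals[parts[1]] += int(parts[2])    # article title -> summed views

with open("de_top250k_2016.tsv", "w", encoding="utf-8") as out:
    for article, views in nlargest(250_000, totals.items(), key=lambda kv: kv[1]):
        out.write(article + "\t" + str(views) + "\n")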
_______________________________________________
Analytics mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/analytics
