Jörg, the project abbreviations are explained in depth here:
https://wikitech.wikimedia.org/wiki/Analytics/Data/Pageviews

On Mon, Mar 6, 2017 at 11:15 AM, Jörg Jung <[email protected]> wrote:

> Yeah, Dan, that will work, thanks.
>
> Just out of curiosity: why are there three projects for "de", and what is
> the difference between them? (/de/, /de.m/ and /de.zero/)
>
> Cheers, JJ
>
> On 06.03.2017 at 15:45, Dan Andreescu wrote:
> > Jörg, take a look at https://dumps.wikimedia.org/other/pagecounts-ez/
> > which has compressed data without losing granularity.  You can get
> > monthly files there and download a lot less data.
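Fetching the monthly files can be scripted. A minimal sketch in Python follows; note that the "merged" subdirectory and the exact filename pattern are assumptions based on the dumps.wikimedia.org layout, so verify them against the live directory listing before downloading:

```python
# Sketch: build download URLs for the twelve monthly pagecounts-ez files
# for 2016.  The "merged" subdirectory and the filename pattern below are
# assumptions -- check them against the directory listing at
# https://dumps.wikimedia.org/other/pagecounts-ez/ before relying on them.
BASE = "https://dumps.wikimedia.org/other/pagecounts-ez/merged"

def monthly_urls(year):
    """Return the assumed URLs of the 12 monthly files for a given year."""
    return [
        f"{BASE}/pagecounts-{year}-{month:02d}-views-ge-5.bz2"
        for month in range(1, 13)
    ]

for url in monthly_urls(2016):
    print(url)
```

Each URL can then be fetched with any HTTP client (curl, wget, requests); at 12 files instead of ~8,700 hourly ones, this is far friendlier to a home connection.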
> >
> > On Mon, Mar 6, 2017 at 5:40 AM, Jörg Jung <[email protected]> wrote:
> >
> >     Marcel,
> >
> >     thanks for your quick answer.
> >     My main issue with the dumps (or I am missing something) is:
> >
> >     I need to download them first to be able to aggregate and filter.
> >     For the year 2016 that would be roughly: 40 MB (average per hourly
> >     file) * 24 h * 30 d * 12 m = about 350 GB.
> >
> >     As I am not sitting directly at DE-CIX but in my private office, I
> >     will face a pretty hard time with that :-)
> >
> >     So my idea is that somebody "closer" to the raw data would basically
> >     do the aggregation and filtering for me...
> >
> >     Will somebody (please) ?
> >
> >     Thanx, JJ
> >
> >     On 06.03.2017 at 11:14, Marcel Ruiz Forns wrote:
> >     > Hi Jörg, :]
> >     >
> >     > Do you mean the top 250K most viewed *articles* in
> >     > de.wikipedia.org?
> >     >
> >     > If so, I think you can indeed get that from the dumps. You can
> >     > find 2016 hourly pageview stats by article for all wikis here:
> >     > https://dumps.wikimedia.org/other/pageviews/2016/
> >     >
> >     > Note that the wiki codes (first column) you're interested in are:
> >     > /de/, /de.m/ and /de.zero/.
> >     > The third column holds the number of pageviews you're after.
> >     > Also, this data set does not include bot traffic as recognized by
> >     > the pageview definition
> >     > <https://meta.wikimedia.org/wiki/Research:Page_view>.
> >     > As the files are hourly and contain data for all wikis, you'll
> >     > need some aggregation and filtering.
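That aggregation and filtering step can be sketched in a few lines of Python. This assumes the space-separated line format of the hourly pageview files described above (`domain_code page_title count_views total_response_size`); the function names are mine, not part of any official tooling:

```python
import gzip
from collections import Counter

# The three codes Marcel names: desktop, mobile web, and Wikipedia Zero.
WIKI_CODES = {"de", "de.m", "de.zero"}

def aggregate(lines, wiki_codes=WIKI_CODES):
    """Sum pageviews per article, keeping only the requested wiki codes.

    Each line of an hourly dump file is assumed to look like:
        domain_code page_title count_views total_response_size
    """
    totals = Counter()
    for line in lines:
        parts = line.split(" ")
        if len(parts) != 4:
            continue  # skip malformed lines
        code, title, views, _size = parts
        if code in wiki_codes:
            totals[title] += int(views)
    return totals

def top_articles(filenames, n=250_000):
    """Aggregate over many hourly .gz files and return the top-n articles."""
    totals = Counter()
    for name in filenames:
        with gzip.open(name, mode="rt", encoding="utf-8",
                       errors="replace") as f:
            totals.update(aggregate(f))
    return totals.most_common(n)
```

Run over all of 2016's hourly files, `top_articles` yields the (title, yearly total) pairs the original request asks for; `Counter` keeps the per-title sums and `most_common` does the final ranking.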
> >     >
> >     > Cheers!
> >     >
> >     > On Mon, Mar 6, 2017 at 2:59 AM, Jörg Jung
> >     > <[email protected]> wrote:
> >     >
> >     >     Ladies, gents,
> >     >
> >     >     For a project I am planning, I need the following data:
> >     >
> >     >     the top 250K pages for 2016 in the project de.wikipedia.org,
> >     >     user access.
> >     >
> >     >     I only need the name of the page and the corresponding number
> >     >     of user accesses (all channels) for 2016 (summed over the
> >     >     year).
> >     >
> >     >     As far as I can see, I can't get that data via REST or by
> >     >     aggregating the dumps.
> >     >
> >     >     So I'd like to ask here if someone would like to help out.
> >     >
> >     >     Thanks, cheers, JJ
> >     >
> >     >     --
> >     >     Jörg Jung, Dipl. Inf. (FH)
> >     >     Hasendriesch 2
> >     >     D-53639 Königswinter
> >     >     E-Mail:     [email protected]
> >     >     Web:        www.retevastum.de
> >     >                 www.datengraphie.de
> >     >                 www.digitaletat.de
> >     >                 www.olfaktum.de
> >     >
> >     >
> >     >
> >     >
> >     >
> >     > --
> >     > *Marcel Ruiz Forns*
> >     > Analytics Developer
> >     > Wikimedia Foundation
> >     >
> >     >
> >     >
> >
> >
> >
> >
> >
> >
> >
>
>
_______________________________________________
Analytics mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/analytics
