Cool. In that case, I will generate a dump for all the data we have,
report back when done, and if there are no issues with releasing it,
tarball it up and put it on figshare :)
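
As a rough illustration of the check described later in this thread (count pageviews per one-second bucket for enwiki, then look at the smallest bucket), here is a minimal Python sketch. It assumes a hypothetical input of (timestamp, project) pairs; the function names and record layout are illustrative only and are not the actual log schema or pipeline.

```python
from collections import Counter
from datetime import datetime, timedelta


def pageviews_per_second(records, project="enwiki"):
    """Count requests per one-second bucket for a single project.

    `records` is an iterable of (timestamp, project) pairs; this layout is
    an assumption for illustration, not the real log schema.
    """
    counts = Counter()
    for timestamp, record_project in records:
        if record_project == project:
            # Drop sub-second precision so each key is a one-second bucket.
            counts[timestamp.replace(microsecond=0)] += 1
    return counts


def smallest_bucket(counts):
    """Return the (second, count) pair with the fewest pageviews."""
    return min(counts.items(), key=lambda item: item[1])


if __name__ == "__main__":
    # Tiny made-up sample, only to show the call pattern.
    start = datetime(2015, 4, 12, 6, 0, 0)
    sample = [
        (start, "enwiki"),
        (start + timedelta(milliseconds=400), "enwiki"),
        (start + timedelta(seconds=1), "enwiki"),
        (start + timedelta(seconds=1), "dewiki"),
    ]
    per_second = pageviews_per_second(sample)
    print(smallest_bucket(per_second))
```

Note that a second with no requests at all never appears as a key in this sketch, so a real check would also need to compare the number of buckets against the length of the time window.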

On 15 April 2015 at 13:27, Dario Taraborelli <[email protected]> wrote:
> thanks, both. Let's go ahead with English only and no spiders filtered or
> mobile/desktop breakdown, per Oliver.
>
> Michelle – given the aggregation level I am fine moving forward with this
> release, but let me know off-thread if you have any questions.
>
> Dario
>
> On Wed, Apr 15, 2015 at 9:53 AM, Oliver Keyes <[email protected]> wrote:
>>
>> Dario,
>>
>> No spider filtering, and no split between mobile and desktop; mobile
>> and desktop are grouped.
>>
>> On 15 April 2015 at 12:46, Hirav Gandhi <[email protected]> wrote:
>> > e.g. German*
>> >
>> > I need more coffee.
>> >
>> >
>> >
>> > On Wed, Apr 15, 2015 at 9:35 AM, Hirav Gandhi <[email protected]>
>> > wrote:
>> >>
>> >> Dario - we just want a representative sample of traffic for a popular
>> >> site like Wikipedia. We thought limiting to the English Wikipedia would
>> >> be easier.
>> >>
>> >> If we get aggregated data across all language Wikipedia sites, we would
>> >> need some way to tease out which language is being queried when. Some
>> >> languages (for e.g. German) we would hypothesize would have more daily
>> >> seasonality than languages like English.
>> >>
>> >>
>> >>
>> >> On Wed, Apr 15, 2015 at 9:32 AM, Dario Taraborelli
>> >> <[email protected]> wrote:
>> >>>
>> >>> Hirav, Bharath – I also want to hear from you if there's a specific
>> >>> reason to ask for English Wikipedia only or if a dataset encompassing
>> >>> aggregate pageviews across all Wikimedia properties would do the job.
>> >>>
>> >>> Dario
>> >>>
>> >>> On Wed, Apr 15, 2015 at 9:09 AM, Dario Taraborelli
>> >>> <[email protected]> wrote:
>> >>>>
>> >>>> Oliver -- thanks for running a preliminary check, I'm fine releasing
>> >>>> this data in aggregate under CC0, I believe it would be valuable for
>> >>>> this and other research projects (copying Michelle from Legal).
>> >>>>
>> >>>> Before we do so, though, I want to confirm the specs: aggregate
>> >>>> pageviews per second to English Wikipedia, excluding bot traffic,
>> >>>> broken down by access method (mobile web vs desktop site, not apps)
>> >>>> for a 60-day period. Oliver – are these the filters you used to
>> >>>> identify the data point with the smallest number of observations?
>> >>>>
>> >>>> Obviously, we will need to take into account this release when we
>> >>>> start working on projects such as
>> >>>> https://meta.wikimedia.org/wiki/Research:Geo-aggregation_of_Wikipedia_edits
>> >>>> and
>> >>>> https://meta.wikimedia.org/wiki/Research:Geo-aggregation_of_Wikipedia_pageviews
>> >>>>
>> >>>> Dario
>> >>>>
>> >>>> On Mon, Apr 13, 2015 at 9:37 PM, Oliver Keyes <[email protected]>
>> >>>> wrote:
>> >>>>>
>> >>>>> Bumping for Dario, per Pine's excellent example :)
>> >>>>>
>> >>>>> On 13 April 2015 at 22:18, Hirav Gandhi <[email protected]>
>> >>>>> wrote:
>> >>>>> > Oliver: Two months is fine. Thank you so much for your help!
>> >>>>> >
>> >>>>> >> On Apr 13, 2015, at 4:40 PM,
>> >>>>> >> [email protected]
>> >>>>> >> wrote:
>> >>>>> >>
>> >>>>> >> Send Analytics mailing list submissions to
>> >>>>> >>       [email protected]
>> >>>>> >>
>> >>>>> >> To subscribe or unsubscribe via the World Wide Web, visit
>> >>>>> >>       https://lists.wikimedia.org/mailman/listinfo/analytics
>> >>>>> >> or, via email, send a message with subject or body 'help' to
>> >>>>> >>       [email protected]
>> >>>>> >>
>> >>>>> >> You can reach the person managing the list at
>> >>>>> >>       [email protected]
>> >>>>> >>
>> >>>>> >> When replying, please edit your Subject line so it is more
>> >>>>> >> specific
>> >>>>> >> than "Re: Contents of Analytics digest..."
>> >>>>> >>
>> >>>>> >>
>> >>>>> >> Today's Topics:
>> >>>>> >>
>> >>>>> >>   1. Re: Page views on a more frequent than hourly basis (Pine W)
>> >>>>> >>   2. Re: Page views on a more frequent than hourly basis (Hirav
>> >>>>> >> Gandhi)
>> >>>>> >>   3. Re: Page views on a more frequent than hourly basis (Oliver
>> >>>>> >> Keyes)
>> >>>>> >>
>> >>>>> >>
>> >>>>> >>
>> >>>>> >>
>> >>>>> >> ----------------------------------------------------------------------
>> >>>>> >>
>> >>>>> >> Message: 1
>> >>>>> >> Date: Mon, 13 Apr 2015 13:34:23 -0700
>> >>>>> >> From: Pine W <[email protected]>
>> >>>>> >> To: "A mailing list for the Analytics Team at WMF and everybody
>> >>>>> >> who
>> >>>>> >>       has an  interest in Wikipedia and analytics."
>> >>>>> >>       <[email protected]>
>> >>>>> >> Subject: Re: [Analytics] Page views on a more frequent than
>> >>>>> >> hourly
>> >>>>> >>       basis
>> >>>>> >> Message-ID:
>> >>>>> >>
>> >>>>> >>
>> >>>>> >> <CAF=dyjjzmdfthz+0+lwnhb9m8xuod4wetgcfuxyb9qyf7cy...@mail.gmail.com>
>> >>>>> >> Content-Type: text/plain; charset="utf-8"
>> >>>>> >>
>> >>>>> >> Hi Oliver, re ccing people who are on list, this is the protocol we
>> >>>>> >> followed in IEGCom to ping people who are subscribed and mentioned
>> >>>>> >> in certain emails but, like many of us, may automatically move
>> >>>>> >> emails from lists directly to folders where they may be unread for
>> >>>>> >> days. So there is a reason to do this.
>> >>>>> >>
>> >>>>> >> Thanks,
>> >>>>> >>
>> >>>>> >> Pine
>> >>>>> >> ------------------------------
>> >>>>> >>
>> >>>>> >> Message: 2
>> >>>>> >> Date: Mon, 13 Apr 2015 16:30:43 -0700
>> >>>>> >> From: Hirav Gandhi <[email protected]>
>> >>>>> >> To: [email protected]
>> >>>>> >> Subject: Re: [Analytics] Page views on a more frequent than
>> >>>>> >> hourly
>> >>>>> >>       basis
>> >>>>> >> Message-ID:
>> >>>>> >>
>> >>>>> >>
>> >>>>> >> <CANzC_EOvi4MP7G_SsxvW=uojpt2vxbnfmhcipqn1pumace-...@mail.gmail.com>
>> >>>>> >> Content-Type: text/plain; charset="utf-8"
>> >>>>> >>
>> >>>>> >> Thanks Oliver!
>> >>>>> >>
>> >>>>> >> We would like this data for as broad of a time period as you can
>> >>>>> >> muster. The more days, months and years represented in the dataset,
>> >>>>> >> the better.
>> >>>>> >>
>> >>>>> >>
>> >>>>> >>> Okay, so:
>> >>>>> >>>
>> >>>>> >>> I took an hour from the pageviews logs,[0] and aggregated pageviews
>> >>>>> >>> to enwiki (mobile and desktop both) by timestamp, down to
>> >>>>> >>> one-second resolution levels. The lowest number of pageviews to
>> >>>>> >>> enwiki per second was 2,981.
>> >>>>> >>>
>> >>>>> >>> So, I don't personally have a problem with generating a release of:
>> >>>>> >>>
>> >>>>> >>> 1. Pageviews per second;
>> >>>>> >>> 2. To enwiki;
>> >>>>> >>> 3. Over $TIME_PERIOD;
>> >>>>> >>> 4. Grouping the mobile and desktop site.
>> >>>>> >>>
>> >>>>> >>> But Dario or someone should chip in before I touch anything ;p
>> >>>>> >>>
>> >>>>> >>> [0] 6am yesterday. 6am because it should be low-traffic, right? At
>> >>>>> >>> least given our biases towards North America and Europe.
>> >>>>> >>>
>> >>>>> >>> On 13 April 2015 at 11:54, Oliver Keyes <[email protected]>
>> >>>>> >>> wrote:
>> >>>>> >>>> Then that sounds much more viable. I'll run a quick test now to see
>> >>>>> >>>> how much clustering we'd see at, say, the one-second resolution
>> >>>>> >>>> level, and throw it out here so we can make more informed decisions
>> >>>>> >>>> about a data release on this.
>> >>>>> >>>>
>> >>>>> >>>> On 13 April 2015 at 08:08, Hirav Gandhi
>> >>>>> >>>> <[email protected]>
>> >>>>> >>>> wrote:
>> >>>>> >>>>> Hi Oliver,
>> >>>>> >>>>>
>> >>>>> >>>>> Re: Hirav: would you be looking for temporally /and/ contextually
>> >>>>> >>>>> granular pageviews, i.e. "a view to X page at Y time", or just
>> >>>>> >>>>> temporally granular, so "a view to a page on enwiki at X time"? If
>> >>>>> >>>>> the latter you've got more of a shot, I suspect.
>> >>>>> >>>>>
>> >>>>> >>>>> I only want the latter - I am not concerned with the context so
>> >>>>> >>>>> much as just “a view to a page on enwiki at X time.”
>> >>>>> >>>>>
>> >>>>> >>>>> Hirav
>> >>>>> >>>>>
>> >>>>> >>>>>
>> >>>>> >>>>> On Apr 13, 2015, at 5:00 AM,
>> >>>>> >>>>> [email protected] wrote:
>> >>>>> >>>>>
>> >>>>> >>>>> Message: 1
>> >>>>> >>>>> Date: Mon, 13 Apr 2015 00:47:31 -0700
>> >>>>> >>>>> From: Pine W <[email protected]>
>> >>>>> >>>>> To: "A mailing list for the Analytics Team at WMF and
>> >>>>> >>>>> everybody
>> >>>>> >>>>> who
>> >>>>> >>>>> has an interest in Wikipedia and analytics."
>> >>>>> >>>>> <[email protected]>
>> >>>>> >>>>> Cc: Bharath Sitaraman <[email protected]>
>> >>>>> >>>>> Subject: Re: [Analytics] Page views on a more frequent than
>> >>>>> >>>>> hourly
>> >>>>> >>>>> basis
>> >>>>> >>>>> Message-ID:
>> >>>>> >>>>>
>> >>>>> >>>>>
>> >>>>> >>>>> <CAF=dyjgnut+t6n6mujq16duyiwp7et6ruht3_-tzdnsep+2...@mail.gmail.com>
>> >>>>> >>>>> Content-Type: text/plain; charset="utf-8"
>> >>>>> >>>>>
>> >>>>> >>>>>
>> >>>>> >>>>> Hi,
>> >>>>> >>>>>
>> >>>>> >>>>> This issue of pageview data granularity has been discussed before,
>> >>>>> >>>>> and the answer has been that hourly is the smallest increment
>> >>>>> >>>>> allowed to be revealed publicly, for privacy reasons.
>> >>>>> >>>>>
>> >>>>> >>>>> I believe that the person you will want to discuss your request
>> >>>>> >>>>> with is Toby, who I have cc'd here.
>> >>>>> >>>>>
>> >>>>> >>>>> Pine
>> >>>>> >>>>> On Apr 13, 2015 12:11 AM, "Hirav Gandhi"
>> >>>>> >>>>> <[email protected]> wrote:
>> >>>>> >>>>>
>> >>>>> >>>>> Hi Wikimedia Analytics Team,
>> >>>>> >>>>>
>> >>>>> >>>>> My colleague Bharath and I are doing research on dynamic server
>> >>>>> >>>>> allocation algorithms and we were looking for suitable datasets
>> >>>>> >>>>> to test our predictive algorithm on. We noticed that Wikimedia has
>> >>>>> >>>>> an amazing data set of hourly page views, but we were looking for
>> >>>>> >>>>> something a bit more granular, such as aggregated page requests to
>> >>>>> >>>>> English Wikipedia on a minute by minute basis or second by second
>> >>>>> >>>>> basis if possible.
>> >>>>> >>>>>
>> >>>>> >>>>> We are more than happy to pore through any raw data you might have
>> >>>>> >>>>> that would help us calculate page requests at this granular level.
>> >>>>> >>>>> Please let us know if it would be possible to get such data and if
>> >>>>> >>>>> so how. Thank you in advance for your help.
>> >>>>> >>>>>
>> >>>>> >>>>> Best,
>> >>>>> >>>>>
>> >>>>> >>>>> Hirav Gandhi
>> >>>>> >>>>> _______________________________________________
>> >>>>> >>>>> Analytics mailing list
>> >>>>> >>>>> [email protected]
>> >>>>> >>>>> https://lists.wikimedia.org/mailman/listinfo/analytics
>> >>>>> >>>>>
>> >>>>> >>>>> ------------------------------
>> >>>>> >>>>>
>> >>>>> >>>>> Message: 2
>> >>>>> >>>>> Date: Mon, 13 Apr 2015 06:39:45 -0400
>> >>>>> >>>>> From: Oliver Keyes <[email protected]>
>> >>>>> >>>>> To: "A mailing list for the Analytics Team at WMF and
>> >>>>> >>>>> everybody
>> >>>>> >>>>> who
>> >>>>> >>>>> has an interest in Wikipedia and analytics."
>> >>>>> >>>>> <[email protected]>
>> >>>>> >>>>> Cc: Bharath Sitaraman <[email protected]>
>> >>>>> >>>>> Subject: Re: [Analytics] Page views on a more frequent than
>> >>>>> >>>>> hourly
>> >>>>> >>>>> basis
>> >>>>> >>>>> Message-ID:
>> >>>>> >>>>>
>> >>>>> >>>>>
>> >>>>> >>>>> <CAAUQgdDsnHd8s+ACL-XBtXBz6OO-T04CcJfnGfqwrYAV-=h...@mail.gmail.com>
>> >>>>> >>>>> Content-Type: text/plain; charset=UTF-8
>> >>>>> >>>>>
>> >>>>> >>>>>
>> >>>>> >>>>> Preeetty sure that Toby is on the analytics list, Pine. He's the
>> >>>>> >>>>> director of analytics.
>> >>>>> >>>>>
>> >>>>> >>>>> Hirav: would you be looking for temporally /and/ contextually
>> >>>>> >>>>> granular pageviews, i.e. "a view to X page at Y time", or just
>> >>>>> >>>>> temporally granular, so "a view to a page on enwiki at X time"? If
>> >>>>> >>>>> the latter you've got more of a shot, I suspect.
>> >>>>> >>>>>
>> >>>>> >>>>> --
>> >>>>> >>>>> Oliver Keyes
>> >>>>> >>>>> Research Analyst
>> >>>>> >>>>> Wikimedia Foundation
>> >>>>> >>>>>
>> >>>>> >>>>>
>> >>>>> >>>>>
>> >>>>> >>>>> ------------------------------
>> >>>>> >>>>>
>> >>>>> >>>>> End of Analytics Digest, Vol 38, Issue 21
>> >>>>> >>>>> *****************************************
>> >>>>> >> ------------------------------
>> >>>>> >>
>> >>>>> >> Message: 3
>> >>>>> >> Date: Mon, 13 Apr 2015 19:40:04 -0400
>> >>>>> >> From: Oliver Keyes <[email protected]>
>> >>>>> >> To: "A mailing list for the Analytics Team at WMF and everybody
>> >>>>> >> who
>> >>>>> >>       has an  interest in Wikipedia and analytics."
>> >>>>> >>       <[email protected]>
>> >>>>> >> Subject: Re: [Analytics] Page views on a more frequent than
>> >>>>> >> hourly
>> >>>>> >>       basis
>> >>>>> >> Message-ID:
>> >>>>> >>
>> >>>>> >>
>> >>>>> >> <caauqgdd6z5ussu11vw49fdmbsrhyejxku9yopyserib79j-...@mail.gmail.com>
>> >>>>> >> Content-Type: text/plain; charset=UTF-8
>> >>>>> >>
>> >>>>> >> ....
>> >>>>> >>
>> >>>>> >>
>> >>>>> >> ...years?
>> >>>>> >>
>> >>>>> >> We have unsampled logs for, ah. 2 months.
>> >>>>> >>
>> >>>>> >> End of Analytics Digest, Vol 38, Issue 24
>> >>>>> >> *****************************************
>> >>>>> >
>> >>>> --
>> >>>> Dario Taraborelli
>> >>>> Senior Research Scientist, Research and Data Lead
>> >>>> Wikimedia Foundation
>> >>>> http://wikimediafoundation.org
>> >>>> http://nitens.org/taraborelli



-- 
Oliver Keyes
Research Analyst
Wikimedia Foundation

_______________________________________________
Analytics mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/analytics
