Hi Dario,
One last question: would it be possible to break it out into mobile vs.
desktop? We are also concerned that there might be seasonality effects in the
data. Please let us know.
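To make the breakdown concrete, here is a minimal sketch of the per-second aggregation described further down this thread. It adds an optional mobile/desktop split and a crude hour-of-day profile as a first look at daily seasonality. It is only an illustration under assumptions. The (timestamp, access_method) record layout, the "desktop"/"mobile web" labels and the function names are hypothetical. They are not the actual pageview log schema or any Wikimedia tooling, and the sample records are invented.

```python
from collections import Counter, defaultdict
from datetime import datetime, timezone


def per_second_counts(records, split_by_access=False):
    """Count pageviews in one-second buckets.

    `records` is an iterable of (timestamp, access_method) pairs. Each
    timestamp is a datetime, and each access_method is a label such as
    "desktop" or "mobile web". This layout is a hypothetical stand-in for
    the real logs.

    With split_by_access=False, mobile and desktop are grouped and each key
    is a second. With split_by_access=True, each key is (second, access).
    Seconds with zero pageviews never appear in the result.
    """
    counts = Counter()
    for ts, access in records:
        second = ts.replace(microsecond=0)  # truncate to one-second resolution
        key = (second, access) if split_by_access else second
        counts[key] += 1
    return counts


def smallest_bucket(counts):
    """Return (key, count) for the bucket with the fewest pageviews.

    The minimum is taken only over buckets that had at least one pageview.
    """
    return min(counts.items(), key=lambda item: item[1])


def hourly_profile(grouped_counts):
    """Mean pageviews per observed second for each hour of day (0-23).

    This is a crude first look at daily seasonality, not a seasonality
    model. It expects the grouped output, where keys are datetimes rather
    than tuples.
    """
    totals = defaultdict(int)
    seconds = defaultdict(int)
    for second, n in grouped_counts.items():
        totals[second.hour] += n
        seconds[second.hour] += 1
    return {hour: totals[hour] / seconds[hour] for hour in sorted(totals)}


if __name__ == "__main__":
    # Invented sample records; UTC is assumed for illustration only.
    t0 = datetime(2015, 4, 12, 6, 0, 0, tzinfo=timezone.utc)
    logs = [
        (t0.replace(microsecond=120_000), "desktop"),
        (t0.replace(microsecond=870_000), "mobile web"),
        (t0.replace(second=1, microsecond=50_000), "desktop"),
    ]
    grouped = per_second_counts(logs)
    split = per_second_counts(logs, split_by_access=True)
    print(smallest_bucket(grouped))  # the 06:00:01 bucket, with 1 pageview
    print(len(split))                # 3 (second, access_method) buckets
    print(hourly_profile(grouped))   # {6: 1.5}
```

Seconds with zero pageviews never enter the counts, so the minimum only covers seconds that saw traffic. At the grouped per-second volumes reported in the thread, that is unlikely to matter. It could matter for a thinner split, such as one access method on its own.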
Best,
Hirav
On Wed, Apr 15, 2015 at 10:27 AM, Dario Taraborelli
<[email protected]> wrote:
> Thanks, both. Let's go ahead with English only, with no spider filtering and
> no mobile/desktop breakdown, per Oliver.
> Michelle – given the aggregation level, I am fine moving forward with this
> release, but let me know off-thread if you have any questions.
> Dario
> On Wed, Apr 15, 2015 at 9:53 AM, Oliver Keyes <[email protected]> wrote:
>> Dario,
>>
>> No spider filtering, and no split between mobile and desktop; mobile
>> and desktop are grouped.
>>
>> On 15 April 2015 at 12:46, Hirav Gandhi <[email protected]> wrote:
>> > e.g. German*
>> >
>> > I need more coffee.
>> >
>> >
>> >
>> > On Wed, Apr 15, 2015 at 9:35 AM, Hirav Gandhi <[email protected]>
>> > wrote:
>> >>
>> >> Dario - we just want a representative sample of traffic for a popular
>> >> site like Wikipedia. We thought limiting it to English Wikipedia would be
>> >> easier.
>> >>
>> >> If we get aggregated data across all language Wikipedia sites, we would
>> >> need some way to tease out which language is being queried when. Some
>> >> languages (for e.g. German) we would hypothesize would have more daily
>> >> seasonality than languages like English.
>> >>
>> >>
>> >>
>> >> On Wed, Apr 15, 2015 at 9:32 AM, Dario Taraborelli
>> >> <[email protected]> wrote:
>> >>>
>> >>> Hirav, Bharath – I also want to hear from you whether there's a specific
>> >>> reason to ask for English Wikipedia only, or whether a dataset of
>> >>> aggregate pageviews across all Wikimedia properties would do the job.
>> >>>
>> >>> Dario
>> >>>
>> >>> On Wed, Apr 15, 2015 at 9:09 AM, Dario Taraborelli
>> >>> <[email protected]> wrote:
>> >>>>
>> >>>> Oliver -- thanks for running a preliminary check. I'm fine releasing
>> >>>> this data in aggregate under CC0; I believe it would be valuable for
>> >>>> this and other research projects (copying Michelle from Legal).
>> >>>>
>> >>>> Before we do so, though, I want to confirm the specs: aggregate
>> >>>> pageviews per second to English Wikipedia, excluding bot traffic,
>> >>>> broken down by access method (mobile web vs desktop site, not apps),
>> >>>> for a 60-day period. Oliver – are these the filters you used to
>> >>>> identify the data point with the smallest number of observations?
>> >>>>
>> >>>> Obviously, we will need to take this release into account when we
>> >>>> start working on projects such as
>> >>>> https://meta.wikimedia.org/wiki/Research:Geo-aggregation_of_Wikipedia_edits
>> >>>> and
>> >>>> https://meta.wikimedia.org/wiki/Research:Geo-aggregation_of_Wikipedia_pageviews
>> >>>>
>> >>>> Dario
>> >>>>
>> >>>> On Mon, Apr 13, 2015 at 9:37 PM, Oliver Keyes <[email protected]>
>> >>>> wrote:
>> >>>>>
>> >>>>> Bumping for Dario, per Pine's excellent example :)
>> >>>>>
>> >>>>> On 13 April 2015 at 22:18, Hirav Gandhi <[email protected]>
>> >>>>> wrote:
>> >>>>> > Oliver: Two months is fine. Thank you so much for your help!
>> >>>>> >
>> >>>>> >> On Apr 13, 2015, at 4:40 PM, [email protected]
>> >>>>> >> wrote:
>> >>>>> >>
>> >>>>> >> Send Analytics mailing list submissions to
>> >>>>> >> [email protected]
>> >>>>> >>
>> >>>>> >> To subscribe or unsubscribe via the World Wide Web, visit
>> >>>>> >> https://lists.wikimedia.org/mailman/listinfo/analytics
>> >>>>> >> or, via email, send a message with subject or body 'help' to
>> >>>>> >> [email protected]
>> >>>>> >>
>> >>>>> >> You can reach the person managing the list at
>> >>>>> >> [email protected]
>> >>>>> >>
>> >>>>> >> When replying, please edit your Subject line so it is more specific
>> >>>>> >> than "Re: Contents of Analytics digest..."
>> >>>>> >>
>> >>>>> >>
>> >>>>> >> Today's Topics:
>> >>>>> >>
>> >>>>> >> 1. Re: Page views on a more frequent than hourly basis (Pine W)
>> >>>>> >> 2. Re: Page views on a more frequent than hourly basis (Hirav
>> >>>>> >> Gandhi)
>> >>>>> >> 3. Re: Page views on a more frequent than hourly basis (Oliver
>> >>>>> >> Keyes)
>> >>>>> >>
>> >>>>> >>
>> >>>>> >>
>> >>>>> >>
>> >>>>> >> ----------------------------------------------------------------------
>> >>>>> >>
>> >>>>> >> Message: 1
>> >>>>> >> Date: Mon, 13 Apr 2015 13:34:23 -0700
>> >>>>> >> From: Pine W <[email protected]>
>> >>>>> >> To: "A mailing list for the Analytics Team at WMF and everybody who
>> >>>>> >> has an interest in Wikipedia and analytics."
>> >>>>> >> <[email protected]>
>> >>>>> >> Subject: Re: [Analytics] Page views on a more frequent than hourly
>> >>>>> >> basis
>> >>>>> >> Message-ID:
>> >>>>> >> <CAF=[email protected]>
>> >>>>> >> Content-Type: text/plain; charset="utf-8"
>> >>>>> >>
>> >>>>> >> Hi Oliver, regarding cc'ing people who are on the list, this is the
>> >>>>> >> protocol we followed in IEGCom to ping people who are subscribed and
>> >>>>> >> mentioned in certain emails but who, like many of us, may
>> >>>>> >> automatically move list emails directly into folders where they can
>> >>>>> >> sit unread for days. So there is a reason to do this.
>> >>>>> >>
>> >>>>> >> Thanks,
>> >>>>> >>
>> >>>>> >> Pine
>> >>>>> >>
>> >>>>> >> ------------------------------
>> >>>>> >>
>> >>>>> >> Message: 2
>> >>>>> >> Date: Mon, 13 Apr 2015 16:30:43 -0700
>> >>>>> >> From: Hirav Gandhi <[email protected]>
>> >>>>> >> To: [email protected]
>> >>>>> >> Subject: Re: [Analytics] Page views on a more frequent than hourly
>> >>>>> >> basis
>> >>>>> >> Message-ID:
>> >>>>> >> <CANzC_EOvi4MP7G_SsxvW=[email protected]>
>> >>>>> >> Content-Type: text/plain; charset="utf-8"
>> >>>>> >>
>> >>>>> >> Thanks Oliver!
>> >>>>> >>
>> >>>>> >> We would like this data for as broad a time period as you can muster.
>> >>>>> >> The more days, months and years represented in the dataset, the
>> >>>>> >> better.
>> >>>>> >>
>> >>>>> >>
>> >>>>> >>> Okay, so:
>> >>>>> >>>
>> >>>>> >>> I took an hour from the pageviews logs,[0] and aggregated pageviews
>> >>>>> >>> to enwiki (mobile and desktop both) by timestamp, down to one-second
>> >>>>> >>> resolution. The lowest number of pageviews to enwiki in any single
>> >>>>> >>> second was 2,981.
>> >>>>> >>>
>> >>>>> >>> So, I don't personally have a problem with generating a release of:
>> >>>>> >>>
>> >>>>> >>> 1. Pageviews per second;
>> >>>>> >>> 2. To enwiki;
>> >>>>> >>> 3. Over $TIME_PERIOD;
>> >>>>> >>> 4. Grouping the mobile and desktop site.
>> >>>>> >>>
>> >>>>> >>> But Dario or someone should chip in before I touch anything ;p
>> >>>>> >>>
>> >>>>> >>> [0] 6am yesterday. 6am because it should be low-traffic, right? At
>> >>>>> >>> least given our biases towards North America and Europe.
>> >>>>> >>>
>> >>>>> >>> On 13 April 2015 at 11:54, Oliver Keyes <[email protected]>
>> >>>>> >>> wrote:
>> >>>>> >>>> Then that sounds much more viable. I'll run a quick test now to see
>> >>>>> >>>> how much clustering we'd see at, say, the one-second resolution
>> >>>>> >>>> level, and throw it out here so we can make more informed decisions
>> >>>>> >>>> about a data release on this.
>> >>>>> >>>>
>> >>>>> >>>> On 13 April 2015 at 08:08, Hirav Gandhi <[email protected]>
>> >>>>> >>>> wrote:
>> >>>>> >>>>> Hi Oliver,
>> >>>>> >>>>>
>> >>>>> >>>>> Re: Hirav: would you be looking for temporally /and/ contextually
>> >>>>> >>>>> granular pageviews, i.e. "a view to X page at Y time", or just
>> >>>>> >>>>> temporally granular, so "a view to a page on enwiki at X time"? If
>> >>>>> >>>>> the latter you've got more of a shot, I suspect.
>> >>>>> >>>>>
>> >>>>> >>>>> I only want the latter - I am not concerned with the context so
>> >>>>> >>>>> much as just “a view to a page on enwiki at X time.”
>> >>>>> >>>>>
>> >>>>> >>>>> Hirav
>> >>>>> >>>>>
>> >>>>> >>>>>
>> >>>>> >>>>> On Apr 13, 2015, at 5:00 AM, [email protected]
>> >>>>> >>>>> wrote:
>> >>>>> >>>>>
>> >>>>> >>>>>
>> >>>>> >>>>> Today's Topics:
>> >>>>> >>>>>
>> >>>>> >>>>> 1. Re: Page views on a more frequent than hourly basis (Pine W)
>> >>>>> >>>>> 2. Re: Page views on a more frequent than hourly basis (Oliver
>> >>>>> >>>>> Keyes)
>> >>>>> >>>>>
>> >>>>> >>>>>
>> >>>>> >>>>>
>> >>>>> >>>>>
>> >>>>> >>>>> ----------------------------------------------------------------------
>> >>>>> >>>>>
>> >>>>> >>>>> Message: 1
>> >>>>> >>>>> Date: Mon, 13 Apr 2015 00:47:31 -0700
>> >>>>> >>>>> From: Pine W <[email protected]>
>> >>>>> >>>>> To: "A mailing list for the Analytics Team at WMF and everybody who
>> >>>>> >>>>> has an interest in Wikipedia and analytics."
>> >>>>> >>>>> <[email protected]>
>> >>>>> >>>>> Cc: Bharath Sitaraman <[email protected]>
>> >>>>> >>>>> Subject: Re: [Analytics] Page views on a more frequent than hourly
>> >>>>> >>>>> basis
>> >>>>> >>>>> Message-ID:
>> >>>>> >>>>> <CAF=[email protected]>
>> >>>>> >>>>> Content-Type: text/plain; charset="utf-8"
>> >>>>> >>>>>
>> >>>>> >>>>>
>> >>>>> >>>>> Hi,
>> >>>>> >>>>>
>> >>>>> >>>>> This issue of pageview data granularity has been discussed before,
>> >>>>> >>>>> and the answer has been that hourly is the smallest increment
>> >>>>> >>>>> allowed to be revealed publicly, for privacy reasons.
>> >>>>> >>>>>
>> >>>>> >>>>> I believe that the person you will want to discuss your request
>> >>>>> >>>>> with is Toby, who I have cc'd here.
>> >>>>> >>>>>
>> >>>>> >>>>> Pine
>> >>>>> >>>>> On Apr 13, 2015 12:11 AM, "Hirav Gandhi" <[email protected]>
>> >>>>> >>>>> wrote:
>> >>>>> >>>>>
>> >>>>> >>>>> Hi Wikimedia Analytics Team,
>> >>>>> >>>>>
>> >>>>> >>>>> My colleague Bharath and I are doing research on dynamic server
>> >>>>> >>>>> allocation algorithms, and we were looking for a suitable dataset
>> >>>>> >>>>> to test our predictive algorithm on. We noticed that Wikimedia has
>> >>>>> >>>>> an amazing dataset of hourly page views, but we were looking for
>> >>>>> >>>>> something a bit more granular, such as aggregated page requests to
>> >>>>> >>>>> English Wikipedia on a minute-by-minute or, if possible,
>> >>>>> >>>>> second-by-second basis.
>> >>>>> >>>>>
>> >>>>> >>>>> We are more than happy to pore through any raw data you might have
>> >>>>> >>>>> that would help us calculate page requests at this granular level.
>> >>>>> >>>>> Please let us know if it would be possible to get such data, and if
>> >>>>> >>>>> so, how. Thank you in advance for your help.
>> >>>>> >>>>>
>> >>>>> >>>>> Best,
>> >>>>> >>>>>
>> >>>>> >>>>> Hirav Gandhi
>> >>>>> >>>>> _______________________________________________
>> >>>>> >>>>> Analytics mailing list
>> >>>>> >>>>> [email protected]
>> >>>>> >>>>> https://lists.wikimedia.org/mailman/listinfo/analytics
>> >>>>> >>>>>
>> >>>>> >>>>>
>> >>>>> >>>>> ------------------------------
>> >>>>> >>>>>
>> >>>>> >>>>> Message: 2
>> >>>>> >>>>> Date: Mon, 13 Apr 2015 06:39:45 -0400
>> >>>>> >>>>> From: Oliver Keyes <[email protected]>
>> >>>>> >>>>> To: "A mailing list for the Analytics Team at WMF and everybody who
>> >>>>> >>>>> has an interest in Wikipedia and analytics."
>> >>>>> >>>>> <[email protected]>
>> >>>>> >>>>> Cc: Bharath Sitaraman <[email protected]>
>> >>>>> >>>>> Subject: Re: [Analytics] Page views on a more frequent than hourly
>> >>>>> >>>>> basis
>> >>>>> >>>>> Message-ID:
>> >>>>> >>>>> <CAAUQgdDsnHd8s+ACL-XBtXBz6OO-T04CcJfnGfqwrYAV-=[email protected]>
>> >>>>> >>>>> Content-Type: text/plain; charset=UTF-8
>> >>>>> >>>>>
>> >>>>> >>>>>
>> >>>>> >>>>> Preeetty sure that Toby is on the analytics list, Pine. He's the
>> >>>>> >>>>> director of analytics.
>> >>>>> >>>>>
>> >>>>> >>>>> Hirav: would you be looking for temporally /and/ contextually
>> >>>>> >>>>> granular pageviews, i.e. "a view to X page at Y time", or just
>> >>>>> >>>>> temporally granular, so "a view to a page on enwiki at X time"? If
>> >>>>> >>>>> the latter you've got more of a shot, I suspect.
>> >>>>> >>>>>
>> >>>>> >>>>>
>> >>>>> >>>>> --
>> >>>>> >>>>> Oliver Keyes
>> >>>>> >>>>> Research Analyst
>> >>>>> >>>>> Wikimedia Foundation
>> >>>>> >>>>>
>> >>>>> >>>>>
>> >>>>> >>>>>
>> >>>>> >>>>> ------------------------------
>> >>>>> >>>>>
>> >>>>> >>>>>
>> >>>>> >>>>> End of Analytics Digest, Vol 38, Issue 21
>> >>>>> >>>>> *****************************************
>> >>>>> >>>>>
>> >>>>> >>
>> >>>>> >> ------------------------------
>> >>>>> >>
>> >>>>> >> Message: 3
>> >>>>> >> Date: Mon, 13 Apr 2015 19:40:04 -0400
>> >>>>> >> From: Oliver Keyes <[email protected]>
>> >>>>> >> To: "A mailing list for the Analytics Team at WMF and everybody who
>> >>>>> >> has an interest in Wikipedia and analytics."
>> >>>>> >> <[email protected]>
>> >>>>> >> Subject: Re: [Analytics] Page views on a more frequent than hourly
>> >>>>> >> basis
>> >>>>> >> Message-ID:
>> >>>>> >> <caauqgdd6z5ussu11vw49fdmbsrhyejxku9yopyserib79j-...@mail.gmail.com>
>> >>>>> >> Content-Type: text/plain; charset=UTF-8
>> >>>>> >>
>> >>>>> >> ....
>> >>>>> >>
>> >>>>> >>
>> >>>>> >> ...years?
>> >>>>> >>
>> >>>>> >> We have unsampled logs for, ah. 2 months.
>> >>>>> >>
>> >>>>> >> ------------------------------
>> >>>>> >>
>> >>>>> >>
>> >>>>> >>
>> >>>>> >> End of Analytics Digest, Vol 38, Issue 24
>> >>>>> >> *****************************************
>> >>>>> >
>> >>>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>> --
>> >>>> Dario Taraborelli
>> >>>> Senior Research Scientist, Research and Data Lead
>> >>>> Wikimedia Foundation
>> >>>> http://wikimediafoundation.org
>> >>>> http://nitens.org/taraborelli
>> >>>
>> >>
>> >>
>> >
>>