And thanks for doing all of this for us! We do greatly appreciate it!

Cheers,
Bharath

On Wed, Apr 15, 2015 at 10:40 AM, Bharath Sitaraman <[email protected]> wrote:

> Interested in an Erlang book? :P Pretty sure I have one of those lying
> around here...
>
> Cheers,
> Bharath
>
> On Wed, Apr 15, 2015 at 10:38 AM, Oliver Keyes <[email protected]>
> wrote:
>
>> I accept payment in books, pull requests and speaking invitations ;p.
>>
>> (Updated check-the-minimum query running now!)
>>
>> On 15 April 2015 at 13:35, Hirav Gandhi <[email protected]> wrote:
>> > Sorry Oliver. Let me know where I can send the beer/coffee money to
>> > compensate you for the hard work :)
>> >
>> >
>> >
>> > On Wed, Apr 15, 2015 at 10:34 AM, Oliver Keyes <[email protected]> wrote:
>> >>
>> >> /This/ you say 2.5 seconds after I've launched the query ;p. Yes, it
>> >> is possible, but I'll have to recalculate the likely minimum and check
>> >> that it's still okay.
>> >>
>> >> > On 15 April 2015 at 13:32, Hirav Gandhi <[email protected]> wrote:
>> >> > Hi Dario,
>> >> >
>> >> > One last question - would it be possible to break it out into mobile vs
>> >> > desktop? We are also concerned there might be seasonality effects in
>> >> > there as well. Please let us know.
>> >> >
>> >> > Best,
>> >> >
>> >> > Hirav
>> >> >
>> >> >
>> >> >
>> >> > On Wed, Apr 15, 2015 at 10:27 AM, Dario Taraborelli
>> >> > <[email protected]> wrote:
>> >> >>
>> >> >> thanks, both. Let's go ahead with English only and no spiders filtered
>> >> >> or mobile/desktop breakdown, per Oliver.
>> >> >>
>> >> >> Michelle – given the aggregation level I am fine moving forward with
>> >> >> this release, but let me know off-thread if you have any questions.
>> >> >>
>> >> >> Dario
>> >> >>
>> >> >> On Wed, Apr 15, 2015 at 9:53 AM, Oliver Keyes <[email protected]> wrote:
>> >> >>>
>> >> >>> Dario,
>> >> >>>
>> >> >>> No spider filtering, and no split between mobile and desktop;
>> >> >>> mobile and desktop are grouped.
>> >> >>>
>> >> >>> On 15 April 2015 at 12:46, Hirav Gandhi <[email protected]>
>> >> >>> wrote:
>> >> >>> > e.g. German*
>> >> >>> >
>> >> >>> > I need more coffee.
>> >> >>> >
>> >> >>> >
>> >> >>> >
>> >> >>> > On Wed, Apr 15, 2015 at 9:35 AM, Hirav Gandhi
>> >> >>> > <[email protected]>
>> >> >>> > wrote:
>> >> >>> >>
>> >> >>> >> Dario - we just want a representative sample of traffic for a
>> >> >>> >> popular site like Wikipedia. We thought limiting to the English
>> >> >>> >> Wikipedia would be easier.
>> >> >>> >>
>> >> >>> >> If we get aggregated data across all language Wikipedia sites, we
>> >> >>> >> would need some way to tease out which language is being queried
>> >> >>> >> when. Some languages (for e.g. German) we would hypothesize would
>> >> >>> >> have more daily seasonality than languages like English.
>> >> >>> >>
>> >> >>> >>
>> >> >>> >>
>> >> >>> >> On Wed, Apr 15, 2015 at 9:32 AM, Dario Taraborelli
>> >> >>> >> <[email protected]> wrote:
>> >> >>> >>>
>> >> >>> >>> Hirav, Bharath – I also want to hear from you if there's a
>> >> >>> >>> specific reason to ask for English Wikipedia only or if a dataset
>> >> >>> >>> encompassing aggregate pageviews across all Wikimedia properties
>> >> >>> >>> would do the job.
>> >> >>> >>>
>> >> >>> >>> Dario
>> >> >>> >>>
>> >> >>> >>> On Wed, Apr 15, 2015 at 9:09 AM, Dario Taraborelli
>> >> >>> >>> <[email protected]> wrote:
>> >> >>> >>>>
>> >> >>> >>>> Oliver -- thanks for running a preliminary check. I'm fine
>> >> >>> >>>> releasing this data in aggregate under CC0; I believe it would be
>> >> >>> >>>> valuable for this and other research projects (copying Michelle
>> >> >>> >>>> from Legal).
>> >> >>> >>>>
>> >> >>> >>>> Before we do so, though, I want to confirm the specs: aggregate
>> >> >>> >>>> pageviews per second to English Wikipedia, excluding bot traffic,
>> >> >>> >>>> broken down by access method (mobile web vs desktop site, not
>> >> >>> >>>> apps) for a 60-day period. Oliver – are these the filters you
>> >> >>> >>>> used to identify the data point with the smallest number of
>> >> >>> >>>> observations?
>> >> >>> >>>>
>> >> >>> >>>> Obviously, we will need to take into account this release when we
>> >> >>> >>>> start working on projects such as
>> >> >>> >>>> https://meta.wikimedia.org/wiki/Research:Geo-aggregation_of_Wikipedia_edits
>> >> >>> >>>> and
>> >> >>> >>>> https://meta.wikimedia.org/wiki/Research:Geo-aggregation_of_Wikipedia_pageviews
>> >> >>> >>>>
>> >> >>> >>>> Dario
>> >> >>> >>>>
>> >> >>> >>>> On Mon, Apr 13, 2015 at 9:37 PM, Oliver Keyes
>> >> >>> >>>> <[email protected]>
>> >> >>> >>>> wrote:
>> >> >>> >>>>>
>> >> >>> >>>>> Bumping for Dario, per Pine's excellent example :)
>> >> >>> >>>>>
>> >> >>> >>>>> On 13 April 2015 at 22:18, Hirav Gandhi <[email protected]> wrote:
>> >> >>> >>>>> > Oliver: Two months is fine. Thank you so much for your help!
>> >> >>> >>>>> >
>> >> >>> >>>>> >> On Apr 13, 2015, at 4:40 PM,
>> >> >>> >>>>> >> [email protected]
>> >> >>> >>>>> >> wrote:
>> >> >>> >>>>> >>
>> >> >>> >>>>> >> Send Analytics mailing list submissions to
>> >> >>> >>>>> >> [email protected]
>> >> >>> >>>>> >>
>> >> >>> >>>>> >> To subscribe or unsubscribe via the World Wide Web, visit
>> >> >>> >>>>> >> https://lists.wikimedia.org/mailman/listinfo/analytics
>> >> >>> >>>>> >> or, via email, send a message with subject or body 'help'
>> to
>> >> >>> >>>>> >> [email protected]
>> >> >>> >>>>> >>
>> >> >>> >>>>> >> You can reach the person managing the list at
>> >> >>> >>>>> >> [email protected]
>> >> >>> >>>>> >>
>> >> >>> >>>>> >> When replying, please edit your Subject line so it is more
>> >> >>> >>>>> >> specific
>> >> >>> >>>>> >> than "Re: Contents of Analytics digest..."
>> >> >>> >>>>> >>
>> >> >>> >>>>> >>
>> >> >>> >>>>> >> Today's Topics:
>> >> >>> >>>>> >>
>> >> >>> >>>>> >> 1. Re: Page views on a more frequent than hourly basis
>> (Pine
>> >> >>> >>>>> >> W)
>> >> >>> >>>>> >> 2. Re: Page views on a more frequent than hourly basis
>> (Hirav
>> >> >>> >>>>> >> Gandhi)
>> >> >>> >>>>> >> 3. Re: Page views on a more frequent than hourly basis
>> >> >>> >>>>> >> (Oliver
>> >> >>> >>>>> >> Keyes)
>> >> >>> >>>>> >>
>> >> >>> >>>>> >>
>> >> >>> >>>>> >>
>> >> >>> >>>>> >>
>> >> >>> >>>>> >>
>> >> >>> >>>>> >>
>> ----------------------------------------------------------------------
>> >> >>> >>>>> >>
>> >> >>> >>>>> >> Message: 1
>> >> >>> >>>>> >> Date: Mon, 13 Apr 2015 13:34:23 -0700
>> >> >>> >>>>> >> From: Pine W <[email protected]>
>> >> >>> >>>>> >> To: "A mailing list for the Analytics Team at WMF and
>> >> >>> >>>>> >> everybody
>> >> >>> >>>>> >> who
>> >> >>> >>>>> >> has an interest in Wikipedia and analytics."
>> >> >>> >>>>> >> <[email protected]>
>> >> >>> >>>>> >> Subject: Re: [Analytics] Page views on a more frequent
>> than
>> >> >>> >>>>> >> hourly
>> >> >>> >>>>> >> basis
>> >> >>> >>>>> >> Message-ID:
>> >> >>> >>>>> >>
>> >> >>> >>>>> >>
>> >> >>> >>>>> >>
>> >> >>> >>>>> >> <CAF=
>> [email protected]>
>> >> >>> >>>>> >> Content-Type: text/plain; charset="utf-8"
>> >> >>> >>>>> >>
>> >> >>> >>>>> >> Hi Oliver, re ccing people who are on list, this is the protocol
>> >> >>> >>>>> >> we followed in IEGCom to ping people who are subscribed and
>> >> >>> >>>>> >> mentioned in certain emails but, like many of us, may
>> >> >>> >>>>> >> automatically move emails from lists directly to folders where
>> >> >>> >>>>> >> they may be unread for days. So there is a reason to do this.
>> >> >>> >>>>> >>
>> >> >>> >>>>> >> Thanks,
>> >> >>> >>>>> >>
>> >> >>> >>>>> >> Pine
>> >> >>> >>>>> >>
>> >> >>> >>>>> >> ------------------------------
>> >> >>> >>>>> >>
>> >> >>> >>>>> >> Message: 2
>> >> >>> >>>>> >> Date: Mon, 13 Apr 2015 16:30:43 -0700
>> >> >>> >>>>> >> From: Hirav Gandhi <[email protected]>
>> >> >>> >>>>> >> To: [email protected]
>> >> >>> >>>>> >> Subject: Re: [Analytics] Page views on a more frequent
>> than
>> >> >>> >>>>> >> hourly
>> >> >>> >>>>> >> basis
>> >> >>> >>>>> >> Message-ID:
>> >> >>> >>>>> >>
>> >> >>> >>>>> >>
>> >> >>> >>>>> >>
>> >> >>> >>>>> >> <CANzC_EOvi4MP7G_SsxvW=
>> [email protected]>
>> >> >>> >>>>> >> Content-Type: text/plain; charset="utf-8"
>> >> >>> >>>>> >>
>> >> >>> >>>>> >> Thanks Oliver!
>> >> >>> >>>>> >>
>> >> >>> >>>>> >> We would like this data for as broad of a time period as you can
>> >> >>> >>>>> >> muster. The more days, months and years represented in the
>> >> >>> >>>>> >> dataset, the better.
>> >> >>> >>>>> >>
>> >> >>> >>>>> >>
>> >> >>> >>>>> >>> Okay, so:
>> >> >>> >>>>> >>>
>> >> >>> >>>>> >>> I took an hour from the pageviews logs,[0] and aggregated
>> >> >>> >>>>> >>> pageviews to enwiki (mobile and desktop both) by timestamp, down
>> >> >>> >>>>> >>> to one-second resolution levels. The lowest number of pageviews
>> >> >>> >>>>> >>> to enwiki per second was 2,981.
>> >> >>> >>>>> >>>
>> >> >>> >>>>> >>> So, I don't personally have a problem with generating a release
>> >> >>> >>>>> >>> of:
>> >> >>> >>>>> >>>
>> >> >>> >>>>> >>> 1. Pageviews per second;
>> >> >>> >>>>> >>> 2. To enwiki;
>> >> >>> >>>>> >>> 3. Over $TIME_PERIOD;
>> >> >>> >>>>> >>> 4. grouping the mobile and desktop site
>> >> >>> >>>>> >>>
>> >> >>> >>>>> >>> But Dario or someone should chip in before I touch anything ;p
>> >> >>> >>>>> >>>
>> >> >>> >>>>> >>> [0] 6am yesterday. 6am because it should be low-traffic, right?
>> >> >>> >>>>> >>> At least given our biases towards North America and Europe.
>> >> >>> >>>>> >>>
>> >> >>> >>>>> >>> On 13 April 2015 at 11:54, Oliver Keyes
>> >> >>> >>>>> >>> <[email protected]>
>> >> >>> >>>>> >>> wrote:
>> >> >>> >>>>> >>>> Then that sounds much more viable. I'll run a quick test now to
>> >> >>> >>>>> >>>> see how much clustering we'd see at, say, the one-second
>> >> >>> >>>>> >>>> resolution level, and throw it out here so we can make more
>> >> >>> >>>>> >>>> informed decisions about a data release on this.
>> >> >>> >>>>> >>>>
>> >> >>> >>>>> >>>> On 13 April 2015 at 08:08, Hirav Gandhi
>> >> >>> >>>>> >>>> <[email protected]>
>> >> >>> >>>>> >>>> wrote:
>> >> >>> >>>>> >>>>> Hi Oliver,
>> >> >>> >>>>> >>>>>
>> >> >>> >>>>> >>>>> Re: Hirav: would you be looking for temporally /and/ contextually
>> >> >>> >>>>> >>>>> granular pageviews, i.e. "a view to X page at Y time", or just
>> >> >>> >>>>> >>>>> temporally granular, so "a view to a page on enwiki at X time"?
>> >> >>> >>>>> >>>>> If the latter you've got more of a shot, I suspect.
>> >> >>> >>>>> >>>>>
>> >> >>> >>>>> >>>>> I only want the latter - I am not concerned with the context so
>> >> >>> >>>>> >>>>> much as just “a view to a page on enwiki at X time.”
>> >> >>> >>>>> >>>>>
>> >> >>> >>>>> >>>>> Hirav
>> >> >>> >>>>> >>>>>
>> >> >>> >>>>> >>>>>
>> >> >>> >>>>> >>>>> On Apr 13, 2015, at 5:00 AM,
>> >> >>> >>>>> >>>>> [email protected]
>> >> >>> >>>>> >>> wrote:
>> >> >>> >>>>> >>>>>
>> >> >>> >>>>> >>>>>
>> >> >>> >>>>> >>>>>
>> >> >>> >>>>> >>>>> Today's Topics:
>> >> >>> >>>>> >>>>>
>> >> >>> >>>>> >>>>> 1. Re: Page views on a more frequent than hourly basis
>> >> >>> >>>>> >>>>> (Pine
>> >> >>> >>>>> >>>>> W)
>> >> >>> >>>>> >>>>> 2. Re: Page views on a more frequent than hourly basis
>> >> >>> >>>>> >>>>> (Oliver
>> >> >>> >>>>> >>>>> Keyes)
>> >> >>> >>>>> >>>>>
>> >> >>> >>>>> >>>>>
>> >> >>> >>>>> >>>>>
>> >> >>> >>>>> >>>>>
>> >> >>> >>>>> >>>>>
>> >> >>> >>>>> >>>>>
>> ----------------------------------------------------------------------
>> >> >>> >>>>> >>>>>
>> >> >>> >>>>> >>>>> Message: 1
>> >> >>> >>>>> >>>>> Date: Mon, 13 Apr 2015 00:47:31 -0700
>> >> >>> >>>>> >>>>> From: Pine W <[email protected]>
>> >> >>> >>>>> >>>>> To: "A mailing list for the Analytics Team at WMF and
>> >> >>> >>>>> >>>>> everybody
>> >> >>> >>>>> >>>>> who
>> >> >>> >>>>> >>>>> has an interest in Wikipedia and analytics."
>> >> >>> >>>>> >>>>> <[email protected]>
>> >> >>> >>>>> >>>>> Cc: Bharath Sitaraman <[email protected]>
>> >> >>> >>>>> >>>>> Subject: Re: [Analytics] Page views on a more frequent
>> >> >>> >>>>> >>>>> than
>> >> >>> >>>>> >>>>> hourly
>> >> >>> >>>>> >>>>> basis
>> >> >>> >>>>> >>>>> Message-ID:
>> >> >>> >>>>> >>>>>
>> >> >>> >>>>> >>>>>
>> >> >>> >>>>> >>>>>
>> >> >>> >>>>> >>>>> <CAF=
>> [email protected]>
>> >> >>> >>>>> >>>>> Content-Type: text/plain; charset="utf-8"
>> >> >>> >>>>> >>>>>
>> >> >>> >>>>> >>>>>
>> >> >>> >>>>> >>>>> Hi,
>> >> >>> >>>>> >>>>>
>> >> >>> >>>>> >>>>> This issue of pageview data granularity has been discussed
>> >> >>> >>>>> >>>>> before, and the answer has been that hourly is the smallest
>> >> >>> >>>>> >>>>> increment allowed to be revealed publicly, for privacy reasons.
>> >> >>> >>>>> >>>>>
>> >> >>> >>>>> >>>>> I believe that the person you will want to discuss your request
>> >> >>> >>>>> >>>>> with is Toby, who I have cc'd here.
>> >> >>> >>>>> >>>>>
>> >> >>> >>>>> >>>>> Pine
>> >> >>> >>>>> >>>>> On Apr 13, 2015 12:11 AM, "Hirav Gandhi"
>> >> >>> >>>>> >>>>> <[email protected]>
>> >> >>> >>>>> >>> wrote:
>> >> >>> >>>>> >>>>>
>> >> >>> >>>>> >>>>> Hi Wikimedia Analytics Team,
>> >> >>> >>>>> >>>>>
>> >> >>> >>>>> >>>>> My colleague Bharath and I are doing research on dynamic server
>> >> >>> >>>>> >>>>> allocation algorithms and we were looking for a suitable dataset
>> >> >>> >>>>> >>>>> to test our predictive algorithm on. We noticed that Wikimedia
>> >> >>> >>>>> >>>>> has an amazing data set of hourly page views, but we were looking
>> >> >>> >>>>> >>>>> for something a bit more granular, such as aggregated page
>> >> >>> >>>>> >>>>> requests to English Wikipedia on a minute by minute basis or
>> >> >>> >>>>> >>>>> second by second basis if possible.
>> >> >>> >>>>> >>>>>
>> >> >>> >>>>> >>>>> We are more than happy to pore through any raw data you might
>> >> >>> >>>>> >>>>> have that would help us calculate page requests at this granular
>> >> >>> >>>>> >>>>> level. Please let us know if it would be possible to get such
>> >> >>> >>>>> >>>>> data and if so how. Thank you in advance for your help.
>> >> >>> >>>>> >>>>>
>> >> >>> >>>>> >>>>> Best,
>> >> >>> >>>>> >>>>>
>> >> >>> >>>>> >>>>> Hirav Gandhi
>> >> >>> >>>>> >>>>> _______________________________________________
>> >> >>> >>>>> >>>>> Analytics mailing list
>> >> >>> >>>>> >>>>> [email protected]
>> >> >>> >>>>> >>>>> https://lists.wikimedia.org/mailman/listinfo/analytics
>> >> >>> >>>>> >>>>>
>> >> >>> >>>>> >>>>>
>> >> >>> >>>>> >>>>> ------------------------------
>> >> >>> >>>>> >>>>>
>> >> >>> >>>>> >>>>> Message: 2
>> >> >>> >>>>> >>>>> Date: Mon, 13 Apr 2015 06:39:45 -0400
>> >> >>> >>>>> >>>>> From: Oliver Keyes <[email protected]>
>> >> >>> >>>>> >>>>> To: "A mailing list for the Analytics Team at WMF and
>> >> >>> >>>>> >>>>> everybody
>> >> >>> >>>>> >>>>> who
>> >> >>> >>>>> >>>>> has an interest in Wikipedia and analytics."
>> >> >>> >>>>> >>>>> <[email protected]>
>> >> >>> >>>>> >>>>> Cc: Bharath Sitaraman <[email protected]>
>> >> >>> >>>>> >>>>> Subject: Re: [Analytics] Page views on a more frequent
>> >> >>> >>>>> >>>>> than
>> >> >>> >>>>> >>>>> hourly
>> >> >>> >>>>> >>>>> basis
>> >> >>> >>>>> >>>>> Message-ID:
>> >> >>> >>>>> >>>>>
>> >> >>> >>>>> >>>>>
>> >> >>> >>>>> >>>>>
>> >> >>> >>>>> >>>>> <CAAUQgdDsnHd8s+ACL-XBtXBz6OO-T04CcJfnGfqwrYAV-=
>> [email protected]>
>> >> >>> >>>>> >>>>> Content-Type: text/plain; charset=UTF-8
>> >> >>> >>>>> >>>>>
>> >> >>> >>>>> >>>>>
>> >> >>> >>>>> >>>>> Preeetty sure that Toby is on the analytics list, Pine.
>> >> >>> >>>>> >>>>> He's
>> >> >>> >>>>> >>>>> the
>> >> >>> >>>>> >>>>> director of analytics.
>> >> >>> >>>>> >>>>>
>> >> >>> >>>>> >>>>> Hirav: would you be looking for temporally /and/
>> >> >>> >>>>> >>>>> contextually
>> >> >>> >>>>> >>>>> granular
>> >> >>> >>>>> >>>>> pageviews, i.e. "a view to X page at Y time", or just
>> >> >>> >>>>> >>>>> temporally
>> >> >>> >>>>> >>>>> granular, so "a view to a page on enwiki at X time"? If
>> >> >>> >>>>> >>>>> the
>> >> >>> >>>>> >>>>> latter
>> >> >>> >>>>> >>>>> you've got more of a shot, I suspect.
>> >> >>> >>>>> >>>>>
>> >> >>> >>>>> >>>>> --
>> >> >>> >>>>> >>>>> Oliver Keyes
>> >> >>> >>>>> >>>>> Research Analyst
>> >> >>> >>>>> >>>>> Wikimedia Foundation
>> >> >>> >>>>> >>>>>
>> >> >>> >>>>> >>>>>
>> >> >>> >>>>> >>>>>
>> >> >>> >>>>> >>>>> ------------------------------
>> >> >>> >>>>> >>>>>
>> >> >>> >>>>> >>>>> _______________________________________________
>> >> >>> >>>>> >>>>> Analytics mailing list
>> >> >>> >>>>> >>>>> [email protected]
>> >> >>> >>>>> >>>>> https://lists.wikimedia.org/mailman/listinfo/analytics
>> >> >>> >>>>> >>>>>
>> >> >>> >>>>> >>>>>
>> >> >>> >>>>> >>>>> End of Analytics Digest, Vol 38, Issue 21
>> >> >>> >>>>> >>>>> *****************************************
>> >> >>> >>>>> >>>>>
>> >> >>> >>>>> >>>>>
>> >> >>> >>>>> >>>>>
>> >> >>> >>>>> >>>>> _______________________________________________
>> >> >>> >>>>> >>>>> Analytics mailing list
>> >> >>> >>>>> >>>>> [email protected]
>> >> >>> >>>>> >>>>> https://lists.wikimedia.org/mailman/listinfo/analytics
>> >> >>> >>>>> >>>>>
>> >> >>> >>>>> >>>>
>> >> >>> >>>>> >>>>
>> >> >>> >>>>> >>>>
>> >> >>> >>>>> >>>> --
>> >> >>> >>>>> >>>> Oliver Keyes
>> >> >>> >>>>> >>>> Research Analyst
>> >> >>> >>>>> >>>> Wikimedia Foundation
>> >> >>> >>>>> >>>
>> >> >>> >>>>> >>>
>> >> >>> >>>>> >>>
>> >> >>> >>>>> >>> --
>> >> >>> >>>>> >>> Oliver Keyes
>> >> >>> >>>>> >>> Research Analyst
>> >> >>> >>>>> >>> Wikimedia Foundation
>> >> >>> >>>>> >>>
>> >> >>> >>>>> >>>
>> >> >>> >>>>> >>>
>> >> >>> >>>>> >>> ------------------------------
>> >> >>> >>>>> >>>
>> >> >>> >>>>> >>> _______________________________________________
>> >> >>> >>>>> >>> Analytics mailing list
>> >> >>> >>>>> >>> [email protected]
>> >> >>> >>>>> >>> https://lists.wikimedia.org/mailman/listinfo/analytics
>> >> >>> >>>>> >>>
>> >> >>> >>>>> >>
>> >> >>> >>>>> >> ------------------------------
>> >> >>> >>>>> >>
>> >> >>> >>>>> >> Message: 3
>> >> >>> >>>>> >> Date: Mon, 13 Apr 2015 19:40:04 -0400
>> >> >>> >>>>> >> From: Oliver Keyes <[email protected]>
>> >> >>> >>>>> >> To: "A mailing list for the Analytics Team at WMF and
>> >> >>> >>>>> >> everybody
>> >> >>> >>>>> >> who
>> >> >>> >>>>> >> has an interest in Wikipedia and analytics."
>> >> >>> >>>>> >> <[email protected]>
>> >> >>> >>>>> >> Subject: Re: [Analytics] Page views on a more frequent
>> than
>> >> >>> >>>>> >> hourly
>> >> >>> >>>>> >> basis
>> >> >>> >>>>> >> Message-ID:
>> >> >>> >>>>> >>
>> >> >>> >>>>> >>
>> >> >>> >>>>> >>
>> >> >>> >>>>> >> <
>> caauqgdd6z5ussu11vw49fdmbsrhyejxku9yopyserib79j-...@mail.gmail.com>
>> >> >>> >>>>> >> Content-Type: text/plain; charset=UTF-8
>> >> >>> >>>>> >>
>> >> >>> >>>>> >> ....
>> >> >>> >>>>> >>
>> >> >>> >>>>> >>
>> >> >>> >>>>> >> ...years?
>> >> >>> >>>>> >>
>> >> >>> >>>>> >> We have unsampled logs for, ah. 2 months.
>> >> >>> >>>>> >>
>> >> >>> >>>>> >> On 13 April 2015 at 19:30, Hirav Gandhi
>> >> >>> >>>>> >> <[email protected]>
>> >> >>> >>>>> >> wrote:
>> >> >>> >>>>> >>> Thanks Oliver!
>> >> >>> >>>>> >>>
>> >> >>> >>>>> >>> We would like this data for as broad of a time period as you can
>> >> >>> >>>>> >>> muster. The more days, months and years represented in the
>> >> >>> >>>>> >>> dataset, the better.
>> >> >>> >>>>> >>>
>> >> >>> >>>>> >>>>
>> >> >>> >>>>> >>>> Okay, so:
>> >> >>> >>>>> >>>>
>> >> >>> >>>>> >>>> I took an hour from the pageviews logs,[0] and
>> aggregated
>> >> >>> >>>>> >>>> pageviews to
>> >> >>> >>>>> >>>> enwiki (mobile and desktop both) by timestamp, down to
>> >> >>> >>>>> >>>> one-second
>> >> >>> >>>>> >>>> resolution levels. The lowest number of pageviews to
>> enwiki
>> >> >>> >>>>> >>>> per
>> >> >>> >>>>> >>>> second
>> >> >>> >>>>> >>>> was 2,981
>> >> >>> >>>>> >>>>
>> >> >>> >>>>> >>>> So, I don't personally have a problem with generating a
>> >> >>> >>>>> >>>> release
>> >> >>> >>>>> >>>> of:
>> >> >>> >>>>> >>>>
>> >> >>> >>>>> >>>> 1. Pageviews per second;
>> >> >>> >>>>> >>>> 2. To enwiki;
>> >> >>> >>>>> >>>> 3. Over $TIME_PERIOD;
>> >> >>> >>>>> >>>> 4. grouping the mobile and desktop site
>> >> >>> >>>>> >>>>
>> >> >>> >>>>> >>>> But Dario or someone should chip in before I touch
>> anything
>> >> >>> >>>>> >>>> ;p
>> >> >>> >>>>> >>>>
>> >> >>> >>>>> >>>> 6am yesterday. 6am because it should be low-traffic,
>> right?
>> >> >>> >>>>> >>>> At
>> >> >>> >>>>> >>>> least
>> >> >>> >>>>> >>>> given our biases towards north america and europe
>> >> >>> >>>>> >>>>
>> >> >>> >>>>> >>>> On 13 April 2015 at 11:54, Oliver Keyes
>> >> >>> >>>>> >>>> <[email protected]>
>> >> >>> >>>>> >>>> wrote:
>> >> >>> >>>>> >>>>> Then that sounds much more viable. I'll run a quick
>> test
>> >> >>> >>>>> >>>>> now
>> >> >>> >>>>> >>>>> to
>> >> >>> >>>>> >>>>> see
>> >> >>> >>>>> >>>>> how much clustering we'd see at, say, the one-second
>> >> >>> >>>>> >>>>> resolution
>> >> >>> >>>>> >>>>> level,
>> >> >>> >>>>> >>>>> and throw it out here so we can make more informed
>> >> >>> >>>>> >>>>> decisions
>> >> >>> >>>>> >>>>> about a
>> >> >>> >>>>> >>>>> data release on this.
>> >> >>> >>>>> >>>>>
>> >> >>> >>>>> >>>>> On 13 April 2015 at 08:08, Hirav Gandhi
>> >> >>> >>>>> >>>>> <[email protected]>
>> >> >>> >>>>> >>>>> wrote:
>> >> >>> >>>>> >>>>>> Hi Oliver,
>> >> >>> >>>>> >>>>>>
>> >> >>> >>>>> >>>>>> Re: Hirav: would you be looking for temporally /and/
>> >> >>> >>>>> >>>>>> contextually
>> >> >>> >>>>> >>>>>> granular
>> >> >>> >>>>> >>>>>> pageviews, i.e. "a view to X page at Y time", or just
>> >> >>> >>>>> >>>>>> temporally
>> >> >>> >>>>> >>>>>> granular,
>> >> >>> >>>>> >>>>>> so "a view to a page on enwiki at X time"? If the
>> latter
>> >> >>> >>>>> >>>>>> you've
>> >> >>> >>>>> >>>>>> got
>> >> >>> >>>>> >>>>>> more of
>> >> >>> >>>>> >>>>>> a shot, I suspect.
>> >> >>> >>>>> >>>>>>
>> >> >>> >>>>> >>>>>> I only want the latter - I am not concerned with the
>> >> >>> >>>>> >>>>>> context
>> >> >>> >>>>> >>>>>> so
>> >> >>> >>>>> >>>>>> much as
>> >> >>> >>>>> >>>>>> just
>> >> >>> >>>>> >>>>>> “a view to a page on enwiki at X time.”
>> >> >>> >>>>> >>>>>>
>> >> >>> >>>>> >>>>>> Hirav
>> >> >>> >>>>> >>>>>>
>> >> >>> >>>>> >>>>>>
>> >> >>> >>>>> >>>>>> On Apr 13, 2015, at 5:00 AM,
>> >> >>> >>>>> >>>>>> [email protected]
>> >> >>> >>>>> >>>>>> wrote:
>> >> >>> >>>>> >>>>>>
>> >> >>> >>>>> >>>>>>
>> >> >>> >>>>> >>>>>>
>> >> >>> >>>>> >>>>>> Today's Topics:
>> >> >>> >>>>> >>>>>>
>> >> >>> >>>>> >>>>>> 1. Re: Page views on a more frequent than hourly basis
>> >> >>> >>>>> >>>>>> (Pine W)
>> >> >>> >>>>> >>>>>> 2. Re: Page views on a more frequent than hourly basis
>> >> >>> >>>>> >>>>>> (Oliver
>> >> >>> >>>>> >>>>>> Keyes)
>> >> >>> >>>>> >>>>>>
>> >> >>> >>>>> >>>>>>
>> >> >>> >>>>> >>>>>>
>> >> >>> >>>>> >>>>>>
>> >> >>> >>>>> >>>>>>
>> >> >>> >>>>> >>>>>>
>> ----------------------------------------------------------------------
>> >> >>> >>>>> >>>>>>
>> >> >>> >>>>> >>>>>> Message: 1
>> >> >>> >>>>> >>>>>> Date: Mon, 13 Apr 2015 00:47:31 -0700
>> >> >>> >>>>> >>>>>> From: Pine W <[email protected]>
>> >> >>> >>>>> >>>>>> To: "A mailing list for the Analytics Team at WMF and
>> >> >>> >>>>> >>>>>> everybody
>> >> >>> >>>>> >>>>>> who
>> >> >>> >>>>> >>>>>> has an interest in Wikipedia and analytics."
>> >> >>> >>>>> >>>>>> <[email protected]>
>> >> >>> >>>>> >>>>>> Cc: Bharath Sitaraman <[email protected]>
>> >> >>> >>>>> >>>>>> Subject: Re: [Analytics] Page views on a more frequent
>> >> >>> >>>>> >>>>>> than
>> >> >>> >>>>> >>>>>> hourly
>> >> >>> >>>>> >>>>>> basis
>> >> >>> >>>>> >>>>>> Message-ID:
>> >> >>> >>>>> >>>>>>
>> >> >>> >>>>> >>>>>>
>> >> >>> >>>>> >>>>>>
>> >> >>> >>>>> >>>>>> <CAF=
>> [email protected]>
>> >> >>> >>>>> >>>>>> Content-Type: text/plain; charset="utf-8"
>> >> >>> >>>>> >>>>>>
>> >> >>> >>>>> >>>>>>
>> >> >>> >>>>> >>>>>> Hi,
>> >> >>> >>>>> >>>>>>
>> >> >>> >>>>> >>>>>> This issue of pageview data granularity has been
>> >> >>> >>>>> >>>>>> discussed
>> >> >>> >>>>> >>>>>> before, and
>> >> >>> >>>>> >>>>>> the
>> >> >>> >>>>> >>>>>> answer has been that hourly is the smallest increment
>> >> >>> >>>>> >>>>>> allowed to
>> >> >>> >>>>> >>>>>> be
>> >> >>> >>>>> >>>>>> revealed publicly, for privacy reasons.
>> >> >>> >>>>> >>>>>>
>> >> >>> >>>>> >>>>>> I believe that the person you will want to discuss
>> your
>> >> >>> >>>>> >>>>>> request
>> >> >>> >>>>> >>>>>> with is
>> >> >>> >>>>> >>>>>> Toby, who I have cc'd here.
>> >> >>> >>>>> >>>>>>
>> >> >>> >>>>> >>>>>> Pine
>> >> >>> >>>>> >>>>>> On Apr 13, 2015 12:11 AM, "Hirav Gandhi"
>> >> >>> >>>>> >>>>>> <[email protected]>
>> >> >>> >>>>> >>>>>> wrote:
>> >> >>> >>>>> >>>>>>
>> >> >>> >>>>> >>>>>> Hi Wikimedia Analytics Team,
>> >> >>> >>>>> >>>>>>
>> >> >>> >>>>> >>>>>> My colleague Bharath and I are doing research on
>> dynamic
>> >> >>> >>>>> >>>>>> server
>> >> >>> >>>>> >>>>>> allocation
>> >> >>> >>>>> >>>>>> algorithms and we were looking for suitable
>> datasets to
>> >> >>> >>>>> >>>>>> test
>> >> >>> >>>>> >>>>>> our
>> >> >>> >>>>> >>>>>> predictive algorithm on. We noticed that Wikimedia
>> has an
>> >> >>> >>>>> >>>>>> amazing data
>> >> >>> >>>>> >>>>>> set
>> >> >>> >>>>> >>>>>> of hourly page views, but we were looking for
>> something a
>> >> >>> >>>>> >>>>>> bit
>> >> >>> >>>>> >>>>>> more
>> >> >>> >>>>> >>>>>> granular, such as aggregated page requests to English
>> >> >>> >>>>> >>>>>> Wikipedia
>> >> >>> >>>>> >>>>>> on a
>> >> >>> >>>>> >>>>>> minute
>> >> >>> >>>>> >>>>>> by minute basis or second by second basis if possible.
>> >> >>> >>>>> >>>>>>
>> >> >>> >>>>> >>>>>> We are more than happy to pore through any raw data
>> you
>> >> >>> >>>>> >>>>>> might
>> >> >>> >>>>> >>>>>> have that
>> >> >>> >>>>> >>>>>> would help us calculate page requests at this granular
>> >> >>> >>>>> >>>>>> level.
>> >> >>> >>>>> >>>>>> Please
>> >> >>> >>>>> >>>>>> let us
>> >> >>> >>>>> >>>>>> know if it would be possible to get such data and if
>> so
>> >> >>> >>>>> >>>>>> how.
>> >> >>> >>>>> >>>>>> Thank you
>> >> >>> >>>>> >>>>>> in
>> >> >>> >>>>> >>>>>> advance for your help.
>> >> >>> >>>>> >>>>>>
>> >> >>> >>>>> >>>>>> Best,
>> >> >>> >>>>> >>>>>>
>> >> >>> >>>>> >>>>>> Hirav Gandhi
>> >> >>> >>>>> >>>>>> _______________________________________________
>> >> >>> >>>>> >>>>>> Analytics mailing list
>> >> >>> >>>>> >>>>>> [email protected]
>> >> >>> >>>>> >>>>>>
>> https://lists.wikimedia.org/mailman/listinfo/analytics
>> >> >>> >>>>> >>>>>>
>> >> >>> >>>>> >>>>>> -------------- next part --------------
>> >> >>> >>>>> >>>>>> An HTML attachment was scrubbed...
>> >> >>> >>>>> >>>>>> URL:
>> >> >>> >>>>> >>>>>>
>> >> >>> >>>>> >>>>>>
>> >> >>> >>>>> >>>>>>
>> >> >>> >>>>> >>>>>>
>> >> >>> >>>>> >>>>>> <
>> https://lists.wikimedia.org/pipermail/analytics/attachments/20150413/a88287b6/attachment-0001.html
>> >
>> >> >>> >>>>> >>>>>>
>> >> >>> >>>>> >>>>>> ------------------------------
>> >> >>> >>>>> >>>>>>
>> >> >>> >>>>> >>>>>> Message: 2
>> >> >>> >>>>> >>>>>> Date: Mon, 13 Apr 2015 06:39:45 -0400
>> >> >>> >>>>> >>>>>> From: Oliver Keyes <[email protected]>
>> >> >>> >>>>> >>>>>> To: "A mailing list for the Analytics Team at WMF and
>> >> >>> >>>>> >>>>>> everybody
>> >> >>> >>>>> >>>>>> who
>> >> >>> >>>>> >>>>>> has an interest in Wikipedia and analytics."
>> >> >>> >>>>> >>>>>> <[email protected]>
>> >> >>> >>>>> >>>>>> Cc: Bharath Sitaraman <[email protected]>
>> >> >>> >>>>> >>>>>> Subject: Re: [Analytics] Page views on a more frequent
>> >> >>> >>>>> >>>>>> than
>> >> >>> >>>>> >>>>>> hourly
>> >> >>> >>>>> >>>>>> basis
>> >> >>> >>>>> >>>>>> Message-ID:
>> >> >>> >>>>> >>>>>>
>> >> >>> >>>>> >>>>>>
>> >> >>> >>>>> >>>>>>
>> >> >>> >>>>> >>>>>> <CAAUQgdDsnHd8s+ACL-XBtXBz6OO-T04CcJfnGfqwrYAV-=
>> [email protected]>
>> >> >>> >>>>> >>>>>> Content-Type: text/plain; charset=UTF-8
>> >> >>> >>>>> >>>>>>
>> >> >>> >>>>> >>>>>>
>> >> >>> >>>>> >>>>>> Preeetty sure that Toby is on the analytics list,
>> Pine.
>> >> >>> >>>>> >>>>>> He's
>> >> >>> >>>>> >>>>>> the
>> >> >>> >>>>> >>>>>> director of analytics.
>> >> >>> >>>>> >>>>>>
>> >> >>> >>>>> >>>>>> Hirav: would you be looking for temporally /and/
>> >> >>> >>>>> >>>>>> contextually
>> >> >>> >>>>> >>>>>> granular
>> >> >>> >>>>> >>>>>> pageviews, i.e. "a view to X page at Y time", or just
>> >> >>> >>>>> >>>>>> temporally
>> >> >>> >>>>> >>>>>> granular, so "a view to a page on enwiki at X time"?
>> If
>> >> >>> >>>>> >>>>>> the
>> >> >>> >>>>> >>>>>> latter
>> >> >>> >>>>> >>>>>> you've got more of a shot, I suspect.
>> >> >>> >>>>> >>>>>>
>> >> >>> >>>>> >>>>>> On 13 April 2015 at 03:47, Pine W <
>> [email protected]>
>> >> >>> >>>>> >>>>>> wrote:
>> >> >>> >>>>> >>>>>>
>> >> >>> >>>>> >>>>>> Hi,
>> >> >>> >>>>> >>>>>>
>> >> >>> >>>>> >>>>>> This issue of pageview data granularity has been
>> >> >>> >>>>> >>>>>> discussed
>> >> >>> >>>>> >>>>>> before, and
>> >> >>> >>>>> >>>>>> the
>> >> >>> >>>>> >>>>>> answer has been that hourly is the smallest increment
>> >> >>> >>>>> >>>>>> allowed to
>> >> >>> >>>>> >>>>>> be
>> >> >>> >>>>> >>>>>> revealed
>> >> >>> >>>>> >>>>>> publicly, for privacy reasons.
>> >> >>> >>>>> >>>>>>
>> >> >>> >>>>> >>>>>> I believe that the person you will want to discuss
>> your
>> >> >>> >>>>> >>>>>> request
>> >> >>> >>>>> >>>>>> with is
>> >> >>> >>>>> >>>>>> Toby, who I have cc'd here.
>> >> >>> >>>>> >>>>>>
>> >> >>> >>>>> >>>>>> Pine
>> >> >>> >>>>> >>>>>>
>> >> >>> >>>>> >>>>>> On Apr 13, 2015 12:11 AM, "Hirav Gandhi"
>> >> >>> >>>>> >>>>>> <[email protected]>
>> >> >>> >>>>> >>>>>> wrote:
>> >> >>> >>>>> >>>>>>
>> >> >>> >>>>> >>>>>>
>> >> >>> >>>>> >>>>>> Hi Wikimedia Analytics Team,
>> >> >>> >>>>> >>>>>>
>> >> >>> >>>>> >>>>>> My colleague Bharath and I are doing research on
>> dynamic
>> >> >>> >>>>> >>>>>> server
>> >> >>> >>>>> >>>>>> allocation
>> >> >>> >>>>> >>>>>> algorithms and we were looking for suitable
>> datasets to
>> >> >>> >>>>> >>>>>> test
>> >> >>> >>>>> >>>>>> our
>> >> >>> >>>>> >>>>>> predictive algorithm on. We noticed that Wikimedia
>> has an
>> >> >>> >>>>> >>>>>> amazing data
>> >> >>> >>>>> >>>>>> set
>> >> >>> >>>>> >>>>>> of hourly page views, but we were looking for
>> something a
>> >> >>> >>>>> >>>>>> bit
>> >> >>> >>>>> >>>>>> more
>> >> >>> >>>>> >>>>>> granular,
>> >> >>> >>>>> >>>>>> such as aggregated page requests to English Wikipedia
>> on
>> >> >>> >>>>> >>>>>> a
>> >> >>> >>>>> >>>>>> minute by
>> >> >>> >>>>> >>>>>> minute
>> >> >>> >>>>> >>>>>> basis or second by second basis if possible.
>> >> >>> >>>>> >>>>>>
>> >> >>> >>>>> >>>>>> We are more than happy to pore through any raw data
>> you
>> >> >>> >>>>> >>>>>> might
>> >> >>> >>>>> >>>>>> have that
>> >> >>> >>>>> >>>>>> would help us calculate page requests at this granular
>> >> >>> >>>>> >>>>>> level.
>> >> >>> >>>>> >>>>>> Please
>> >> >>> >>>>> >>>>>> let us
>> >> >>> >>>>> >>>>>> know if it would be possible to get such data and if
>> so
>> >> >>> >>>>> >>>>>> how.
>> >> >>> >>>>> >>>>>> Thank you
>> >> >>> >>>>> >>>>>> in
>> >> >>> >>>>> >>>>>> advance for your help.
>> >> >>> >>>>> >>>>>>
>> >> >>> >>>>> >>>>>> Best,
>> >> >>> >>>>> >>>>>>
>> >> >>> >>>>> >>>>>> Hirav Gandhi
>> >> >>> >>>>> >>>>>> _______________________________________________
>> >> >>> >>>>> >>>>>> Analytics mailing list
>> >> >>> >>>>> >>>>>> [email protected]
>> >> >>> >>>>> >>>>>>
>> https://lists.wikimedia.org/mailman/listinfo/analytics
>> >> >>> >>>>> >>>>>>
>> >> >>> >>>>> >>>>>>
>> >> >>> >>>>> >>>>>>
>> >> >>> >>>>> >>>>>> _______________________________________________
>> >> >>> >>>>> >>>>>> Analytics mailing list
>> >> >>> >>>>> >>>>>> [email protected]
>> >> >>> >>>>> >>>>>>
>> https://lists.wikimedia.org/mailman/listinfo/analytics
>> >> >>> >>>>> >>>>>>
>> >> >>> >>>>> >>>>>>
>> >> >>> >>>>> >>>>>>
>> >> >>> >>>>> >>>>>>
>> >> >>> >>>>> >>>>>> --
>> >> >>> >>>>> >>>>>> Oliver Keyes
>> >> >>> >>>>> >>>>>> Research Analyst
>> >> >>> >>>>> >>>>>> Wikimedia Foundation
>> >> >>> >>>>> >>>>>>
>> >> >>> >>>>> >>>>>>
>> >> >>> >>>>> >>>>>>
>> >> >>> >>>>> >>>>>> ------------------------------
>> >> >>> >>>>> >>>>>>
>> >> >>> >>>>> >>>>>> _______________________________________________
>> >> >>> >>>>> >>>>>> Analytics mailing list
>> >> >>> >>>>> >>>>>> [email protected]
>> >> >>> >>>>> >>>>>>
>> https://lists.wikimedia.org/mailman/listinfo/analytics
>> >> >>> >>>>> >>>>>>
>> >> >>> >>>>> >>>>>>
>> >> >>> >>>>> >>>>>> End of Analytics Digest, Vol 38, Issue 21
>> >> >>> >>>>> >>>>>> *****************************************
>> >> >>> >>>>> >>>>>>
>> >> >>> >>>>> >>>>>>
>> >> >>> >>>>> >>>>>>
>> >> >>> >>>>> >>>>>> _______________________________________________
>> >> >>> >>>>> >>>>>> Analytics mailing list
>> >> >>> >>>>> >>>>>> [email protected]
>> >> >>> >>>>> >>>>>>
>> https://lists.wikimedia.org/mailman/listinfo/analytics
>> >> >>> >>>>> >>>>>>
>> >> >>> >>>>> >>>>>
>> >> >>> >>>>> >>>>>
>> >> >>> >>>>> >>>>>
>> >> >>> >>>>> >>>>> --
>> >> >>> >>>>> >>>>> Oliver Keyes
>> >> >>> >>>>> >>>>> Research Analyst
>> >> >>> >>>>> >>>>> Wikimedia Foundation
>> >> >>> >>>>> >>>>
>> >> >>> >>>>> >>>>
>> >> >>> >>>>> >>>>
>> >> >>> >>>>> >>>> --
>> >> >>> >>>>> >>>> Oliver Keyes
>> >> >>> >>>>> >>>> Research Analyst
>> >> >>> >>>>> >>>> Wikimedia Foundation
>> >> >>> >>>>> >>>>
>> >> >>> >>>>> >>>>
>> >> >>> >>>>> >>>>
>> >> >>> >>>>> >>>> ------------------------------
>> >> >>> >>>>> >>>>
>> >> >>> >>>>> >>>> _______________________________________________
>> >> >>> >>>>> >>>> Analytics mailing list
>> >> >>> >>>>> >>>> [email protected]
>> >> >>> >>>>> >>>> https://lists.wikimedia.org/mailman/listinfo/analytics
>> >> >>> >>>>> >>>
>> >> >>> >>>>> >>>
>> >> >>> >>>>> >>> _______________________________________________
>> >> >>> >>>>> >>> Analytics mailing list
>> >> >>> >>>>> >>> [email protected]
>> >> >>> >>>>> >>> https://lists.wikimedia.org/mailman/listinfo/analytics
>> >> >>> >>>>> >>>
>> >> >>> >>>>> >>
>> >> >>> >>>>> >>
>> >> >>> >>>>> >>
>> >> >>> >>>>> >> --
>> >> >>> >>>>> >> Oliver Keyes
>> >> >>> >>>>> >> Research Analyst
>> >> >>> >>>>> >> Wikimedia Foundation
>> >> >>> >>>>> >>
>> >> >>> >>>>> >>
>> >> >>> >>>>> >>
>> >> >>> >>>>> >> ------------------------------
>> >> >>> >>>>> >>
>> >> >>> >>>>> >> _______________________________________________
>> >> >>> >>>>> >> Analytics mailing list
>> >> >>> >>>>> >> [email protected]
>> >> >>> >>>>> >> https://lists.wikimedia.org/mailman/listinfo/analytics
>> >> >>> >>>>> >>
>> >> >>> >>>>> >>
>> >> >>> >>>>> >> End of Analytics Digest, Vol 38, Issue 24
>> >> >>> >>>>> >> *****************************************
>> >> >>> >>>>> >
>> >> >>> >>>>> >
>> >> >>> >>>>> > _______________________________________________
>> >> >>> >>>>> > Analytics mailing list
>> >> >>> >>>>> > [email protected]
>> >> >>> >>>>> > https://lists.wikimedia.org/mailman/listinfo/analytics
>> >> >>> >>>>>
>> >> >>> >>>>>
>> >> >>> >>>>>
>> >> >>> >>>>> --
>> >> >>> >>>>> Oliver Keyes
>> >> >>> >>>>> Research Analyst
>> >> >>> >>>>> Wikimedia Foundation
>> >> >>> >>>>
>> >> >>> >>>>
>> >> >>> >>>>
>> >> >>> >>>>
>> >> >>> >>>> --
>> >> >>> >>>> Dario Taraborelli
>> >> >>> >>>> Senior Research Scientist, Research and Data Lead
>> >> >>> >>>> Wikimedia Foundation
>> >> >>> >>>> http://wikimediafoundation.org
>> >> >>> >>>> http://nitens.org/taraborelli
>> >> >>> >>>
>> >> >>> >>>
>> >> >>> >>>
>> >> >>> >>>
>> >> >>> >>> --
>> >> >>> >>> Dario Taraborelli
>> >> >>> >>> Senior Research Scientist, Research and Data Lead
>> >> >>> >>> Wikimedia Foundation
>> >> >>> >>> http://wikimediafoundation.org
>> >> >>> >>> http://nitens.org/taraborelli
>> >> >>> >>
>> >> >>> >>
>> >> >>> >
>> >> >>>
>> >> >>>
>> >> >>>
>> >> >>> --
>> >> >>> Oliver Keyes
>> >> >>> Research Analyst
>> >> >>> Wikimedia Foundation
>> >> >>>
>> >> >>> _______________________________________________
>> >> >>> Analytics mailing list
>> >> >>> [email protected]
>> >> >>> https://lists.wikimedia.org/mailman/listinfo/analytics
>> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >> >> --
>> >> >> Dario Taraborelli
>> >> >> Senior Research Scientist, Research and Data Lead
>> >> >> Wikimedia Foundation
>> >> >> http://wikimediafoundation.org
>> >> >> http://nitens.org/taraborelli
>> >> >
>> >> >
>> >> >
>> >> > _______________________________________________
>> >> > Analytics mailing list
>> >> > [email protected]
>> >> > https://lists.wikimedia.org/mailman/listinfo/analytics
>> >> >
>> >>
>> >>
>> >>
>> >> --
>> >> Oliver Keyes
>> >> Research Analyst
>> >> Wikimedia Foundation
>> >>
>> >> _______________________________________________
>> >> Analytics mailing list
>> >> [email protected]
>> >> https://lists.wikimedia.org/mailman/listinfo/analytics
>> >
>> >
>> >
>> > _______________________________________________
>> > Analytics mailing list
>> > [email protected]
>> > https://lists.wikimedia.org/mailman/listinfo/analytics
>> >
>>
>>
>>
>> --
>> Oliver Keyes
>> Research Analyst
>> Wikimedia Foundation
>>
>
>
>
> --
> Bharath Sitaraman
> [email protected]
>



-- 
Bharath Sitaraman
[email protected]
_______________________________________________
Analytics mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/analytics
