Dario: no spider filtering, and no split between mobile and desktop; the two are grouped together.
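A rough sketch of the kind of check described in this thread, not the actual query that was run: count requests to one project per one-second bucket, with no spider filtering and mobile and desktop lumped together, then report the least-populated second. The function names and the (timestamp, project) record layout are assumptions made up for illustration; they are not the real log schema.

```python
from collections import Counter
from datetime import datetime


def pageviews_per_second(records, project="enwiki"):
    """Count pageviews per one-second bucket for a single project.

    `records` is an iterable of (iso_timestamp, project) pairs.
    This layout is hypothetical, not the real pageview log schema.
    """
    counts = Counter()
    for timestamp, record_project in records:
        if record_project != project:
            continue
        # Truncate to whole seconds so sub-second hits share a bucket.
        second = datetime.fromisoformat(timestamp).replace(microsecond=0)
        counts[second] += 1
    return counts


def smallest_bucket(counts):
    """Return (second, count) for the least-populated bucket."""
    return min(counts.items(), key=lambda item: item[1])


# Made-up sample data, only to show the call pattern.
sample = [
    ("2015-04-12T06:00:00", "enwiki"),
    ("2015-04-12T06:00:00.500000", "enwiki"),
    ("2015-04-12T06:00:01", "enwiki"),
    ("2015-04-12T06:00:01", "dewiki"),
]
counts = pageviews_per_second(sample)
second, views = smallest_bucket(counts)
```

Seconds with no requests at all never show up in the counter, so a real check would also need to look for gaps in the timeline before treating the minimum as a floor.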
On 15 April 2015 at 12:46, Hirav Gandhi <[email protected]> wrote:
> e.g. German*
>
> I need more coffee.
>
> On Wed, Apr 15, 2015 at 9:35 AM, Hirav Gandhi <[email protected]>
> wrote:
>>
>> Dario - we just want a representative sample of traffic for a popular
>> site like Wikipedia. We thought limiting to the English Wikipedia would
>> be easier.
>>
>> If we get aggregated data across all language Wikipedia sites, we would
>> need some way to tease out which language is being queried when. Some
>> languages (for e.g. German) we would hypothesize would have more daily
>> seasonality than languages like English.
>>
>> On Wed, Apr 15, 2015 at 9:32 AM, Dario Taraborelli
>> <[email protected]> wrote:
>>>
>>> Hirav, Bharath – I also want to hear from you if there's a specific
>>> reason to ask for English Wikipedia only or if a dataset encompassing
>>> aggregate pageviews across all Wikimedia properties would do the job.
>>>
>>> Dario
>>>
>>> On Wed, Apr 15, 2015 at 9:09 AM, Dario Taraborelli
>>> <[email protected]> wrote:
>>>>
>>>> Oliver -- thanks for running a preliminary check, I'm fine releasing
>>>> this data in aggregate under CC0, I believe it would be valuable for
>>>> this and other research projects (copying Michelle from Legal).
>>>>
>>>> Before we do so, though, I want to confirm the specs: aggregate
>>>> pageviews per second to English Wikipedia, excluding bot traffic,
>>>> broken down by access method (mobile web vs desktop site, not apps)
>>>> for a 60-day period. Oliver – are these the filters you used to
>>>> identify the data point with the smallest number of observations?
>>>>
>>>> Obviously, we will need to take into account this release when we
>>>> start working on projects such as
>>>> https://meta.wikimedia.org/wiki/Research:Geo-aggregation_of_Wikipedia_edits
>>>> and
>>>> https://meta.wikimedia.org/wiki/Research:Geo-aggregation_of_Wikipedia_pageviews
>>>>
>>>> Dario
>>>>
>>>> On Mon, Apr 13, 2015 at 9:37 PM, Oliver Keyes <[email protected]>
>>>> wrote:
>>>>>
>>>>> Bumping for Dario, per Pine's excellent example :)
>>>>>
>>>>> On 13 April 2015 at 22:18, Hirav Gandhi <[email protected]> wrote:
>>>>> > Oliver: Two months is fine. Thank you so much for your help!
>>>>> >
>>>>> >> On Apr 13, 2015, at 4:40 PM, [email protected]
>>>>> >> wrote:
>>>>> >>
>>>>> >> Send Analytics mailing list submissions to
>>>>> >> [email protected]
>>>>> >>
>>>>> >> To subscribe or unsubscribe via the World Wide Web, visit
>>>>> >> https://lists.wikimedia.org/mailman/listinfo/analytics
>>>>> >> or, via email, send a message with subject or body 'help' to
>>>>> >> [email protected]
>>>>> >>
>>>>> >> You can reach the person managing the list at
>>>>> >> [email protected]
>>>>> >>
>>>>> >> When replying, please edit your Subject line so it is more specific
>>>>> >> than "Re: Contents of Analytics digest..."
>>>>> >>
>>>>> >> Today's Topics:
>>>>> >>
>>>>> >> 1. Re: Page views on a more frequent than hourly basis (Pine W)
>>>>> >> 2. Re: Page views on a more frequent than hourly basis (Hirav
>>>>> >> Gandhi)
>>>>> >> 3. Re: Page views on a more frequent than hourly basis (Oliver
>>>>> >> Keyes)
>>>>> >>
>>>>> >> ----------------------------------------------------------------------
>>>>> >>
>>>>> >> Message: 1
>>>>> >> Date: Mon, 13 Apr 2015 13:34:23 -0700
>>>>> >> From: Pine W <[email protected]>
>>>>> >> To: "A mailing list for the Analytics Team at WMF and everybody who
>>>>> >> has an interest in Wikipedia and analytics."
>>>>> >> <[email protected]>
>>>>> >> Subject: Re: [Analytics] Page views on a more frequent than hourly
>>>>> >> basis
>>>>> >> Message-ID:
>>>>> >> <CAF=dyjjzmdfthz+0+lwnhb9m8xuod4wetgcfuxyb9qyf7cy...@mail.gmail.com>
>>>>> >> Content-Type: text/plain; charset="utf-8"
>>>>> >>
>>>>> >> Hi Oliver, re ccing people who are on list, this is the protocol we
>>>>> >> followed in IEGCom to ping people who are subscribed and mentioned
>>>>> >> in certain emails but, like many of us, may automatically move
>>>>> >> emails from lists directly to folders where they may be unread for
>>>>> >> days. So there is a reason to do this.
>>>>> >>
>>>>> >> Thanks,
>>>>> >>
>>>>> >> Pine
>>>>> >>
>>>>> >> ------------------------------
>>>>> >>
>>>>> >> Message: 2
>>>>> >> Date: Mon, 13 Apr 2015 16:30:43 -0700
>>>>> >> From: Hirav Gandhi <[email protected]>
>>>>> >> To: [email protected]
>>>>> >> Subject: Re: [Analytics] Page views on a more frequent than hourly
>>>>> >> basis
>>>>> >> Message-ID:
>>>>> >> <CANzC_EOvi4MP7G_SsxvW=uojpt2vxbnfmhcipqn1pumace-...@mail.gmail.com>
>>>>> >> Content-Type: text/plain; charset="utf-8"
>>>>> >>
>>>>> >> Thanks Oliver!
>>>>> >>
>>>>> >> We would like this data for as broad of a time period as you can
>>>>> >> muster. The more days, months and years represented in the dataset,
>>>>> >> the better.
>>>>> >>
>>>>> >>> Okay, so:
>>>>> >>>
>>>>> >>> I took an hour from the pageviews logs,[0] and aggregated pageviews
>>>>> >>> to enwiki (mobile and desktop both) by timestamp, down to
>>>>> >>> one-second resolution levels. The lowest number of pageviews to
>>>>> >>> enwiki per second was 2,981
>>>>> >>>
>>>>> >>> So, I don't personally have a problem with generating a release of:
>>>>> >>>
>>>>> >>> 1. Pageviews per second;
>>>>> >>> 2. To enwiki;
>>>>> >>> 3. Over $TIME_PERIOD;
>>>>> >>> 4. grouping the mobile and desktop site
>>>>> >>>
>>>>> >>> But Dario or someone should chip in before I touch anything ;p
>>>>> >>>
>>>>> >>> [0] 6am yesterday. 6am because it should be low-traffic, right? At
>>>>> >>> least given our biases towards North America and Europe
>>>>> >>>
>>>>> >>> On 13 April 2015 at 11:54, Oliver Keyes <[email protected]>
>>>>> >>> wrote:
>>>>> >>>> Then that sounds much more viable. I'll run a quick test now to
>>>>> >>>> see how much clustering we'd see at, say, the one-second
>>>>> >>>> resolution level, and throw it out here so we can make more
>>>>> >>>> informed decisions about a data release on this.
>>>>> >>>>
>>>>> >>>> On 13 April 2015 at 08:08, Hirav Gandhi <[email protected]>
>>>>> >>>> wrote:
>>>>> >>>>> Hi Oliver,
>>>>> >>>>>
>>>>> >>>>> Re: Hirav: would you be looking for temporally /and/ contextually
>>>>> >>>>> granular pageviews, i.e. "a view to X page at Y time", or just
>>>>> >>>>> temporally granular, so "a view to a page on enwiki at X time"?
>>>>> >>>>> If the latter you've got more of a shot, I suspect.
>>>>> >>>>>
>>>>> >>>>> I only want the latter - I am not concerned with the context so
>>>>> >>>>> much as just “a view to a page on enwiki at X time.”
>>>>> >>>>>
>>>>> >>>>> Hirav
>>>>> >>>>>
>>>>> >>>>> On Apr 13, 2015, at 5:00 AM,
>>>>> >>>>> [email protected] wrote:
>>>>> >>>>>
>>>>> >>>>> Today's Topics:
>>>>> >>>>>
>>>>> >>>>> 1. Re: Page views on a more frequent than hourly basis (Pine W)
>>>>> >>>>> 2. Re: Page views on a more frequent than hourly basis (Oliver
>>>>> >>>>> Keyes)
>>>>> >>>>>
>>>>> >>>>> ----------------------------------------------------------------------
>>>>> >>>>>
>>>>> >>>>> Message: 1
>>>>> >>>>> Date: Mon, 13 Apr 2015 00:47:31 -0700
>>>>> >>>>> From: Pine W <[email protected]>
>>>>> >>>>> To: "A mailing list for the Analytics Team at WMF and everybody
>>>>> >>>>> who has an interest in Wikipedia and analytics."
>>>>> >>>>> <[email protected]>
>>>>> >>>>> Cc: Bharath Sitaraman <[email protected]>
>>>>> >>>>> Subject: Re: [Analytics] Page views on a more frequent than
>>>>> >>>>> hourly basis
>>>>> >>>>> Message-ID:
>>>>> >>>>> <CAF=dyjgnut+t6n6mujq16duyiwp7et6ruht3_-tzdnsep+2...@mail.gmail.com>
>>>>> >>>>> Content-Type: text/plain; charset="utf-8"
>>>>> >>>>>
>>>>> >>>>> Hi,
>>>>> >>>>>
>>>>> >>>>> This issue of pageview data granularity has been discussed
>>>>> >>>>> before, and the answer has been that hourly is the smallest
>>>>> >>>>> increment allowed to be revealed publicly, for privacy reasons.
>>>>> >>>>>
>>>>> >>>>> I believe that the person you will want to discuss your request
>>>>> >>>>> with is Toby, who I have cc'd here.
>>>>> >>>>>
>>>>> >>>>> Pine
>>>>> >>>>>
>>>>> >>>>> On Apr 13, 2015 12:11 AM, "Hirav Gandhi" <[email protected]>
>>>>> >>>>> wrote:
>>>>> >>>>>
>>>>> >>>>> Hi Wikimedia Analytics Team,
>>>>> >>>>>
>>>>> >>>>> My colleague Bharath and I are doing research on dynamic server
>>>>> >>>>> allocation algorithms and we were looking for a suitable dataset
>>>>> >>>>> to test our predictive algorithm on. We noticed that Wikimedia
>>>>> >>>>> has an amazing data set of hourly page views, but we were looking
>>>>> >>>>> for something a bit more granular, such as aggregated page
>>>>> >>>>> requests to English Wikipedia on a minute by minute basis or
>>>>> >>>>> second by second basis if possible.
>>>>> >>>>>
>>>>> >>>>> We are more than happy to pore through any raw data you might
>>>>> >>>>> have that would help us calculate page requests at this granular
>>>>> >>>>> level. Please let us know if it would be possible to get such
>>>>> >>>>> data and if so how. Thank you in advance for your help.
>>>>> >>>>>
>>>>> >>>>> Best,
>>>>> >>>>>
>>>>> >>>>> Hirav Gandhi
>>>>> >>>>>
>>>>> >>>>> ------------------------------
>>>>> >>>>>
>>>>> >>>>> Message: 2
>>>>> >>>>> Date: Mon, 13 Apr 2015 06:39:45 -0400
>>>>> >>>>> From: Oliver Keyes <[email protected]>
>>>>> >>>>> To: "A mailing list for the Analytics Team at WMF and everybody
>>>>> >>>>> who has an interest in Wikipedia and analytics."
>>>>> >>>>> <[email protected]>
>>>>> >>>>> Cc: Bharath Sitaraman <[email protected]>
>>>>> >>>>> Subject: Re: [Analytics] Page views on a more frequent than
>>>>> >>>>> hourly basis
>>>>> >>>>> Message-ID:
>>>>> >>>>> <CAAUQgdDsnHd8s+ACL-XBtXBz6OO-T04CcJfnGfqwrYAV-=h...@mail.gmail.com>
>>>>> >>>>> Content-Type: text/plain; charset=UTF-8
>>>>> >>>>>
>>>>> >>>>> Preeetty sure that Toby is on the analytics list, Pine. He's the
>>>>> >>>>> director of analytics.
>>>>> >>>>>
>>>>> >>>>> Hirav: would you be looking for temporally /and/ contextually
>>>>> >>>>> granular pageviews, i.e. "a view to X page at Y time", or just
>>>>> >>>>> temporally granular, so "a view to a page on enwiki at X time"?
>>>>> >>>>> If the latter you've got more of a shot, I suspect.
>>>>> >>>>>
>>>>> >>>>> --
>>>>> >>>>> Oliver Keyes
>>>>> >>>>> Research Analyst
>>>>> >>>>> Wikimedia Foundation
>>>>> >>>>>
>>>>> >>>>> ------------------------------
>>>>> >>>>>
>>>>> >>>>> End of Analytics Digest, Vol 38, Issue 21
>>>>> >>>>> *****************************************
>>>>> >>
>>>>> >> ------------------------------
>>>>> >>
>>>>> >> Message: 3
>>>>> >> Date: Mon, 13 Apr 2015 19:40:04 -0400
>>>>> >> From: Oliver Keyes <[email protected]>
>>>>> >> To: "A mailing list for the Analytics Team at WMF and everybody who
>>>>> >> has an interest in Wikipedia and analytics."
>>>>> >> <[email protected]>
>>>>> >> Subject: Re: [Analytics] Page views on a more frequent than hourly
>>>>> >> basis
>>>>> >> Message-ID:
>>>>> >> <caauqgdd6z5ussu11vw49fdmbsrhyejxku9yopyserib79j-...@mail.gmail.com>
>>>>> >> Content-Type: text/plain; charset=UTF-8
>>>>> >>
>>>>> >> ....
>>>>> >>
>>>>> >> ...years?
>>>>> >>
>>>>> >> We have unsampled logs for, ah. 2 months.
>>>>> >>
>>>>> >> ------------------------------
>>>>> >>
>>>>> >> End of Analytics Digest, Vol 38, Issue 24
>>>>> >> *****************************************
>>>>
>>>> --
>>>> Dario Taraborelli
>>>> Senior Research Scientist, Research and Data Lead
>>>> Wikimedia Foundation
>>>> http://wikimediafoundation.org
>>>> http://nitens.org/taraborelli

--
Oliver Keyes
Research Analyst
Wikimedia Foundation

_______________________________________________
Analytics mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/analytics
