Cool. In that case, I will generate a dump for all the data we have, report back when done, and if there are no issues with releasing it, tarball it up and put it on figshare :)
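For anyone following along, here is a minimal sketch of the kind of per-second aggregation and smallest-bucket check discussed in the thread below. It is not the actual pipeline: the input format (ISO 8601 timestamp strings, one per pageview) and the function names `pageviews_per_second` and `smallest_bucket` are assumptions made purely for illustration.

```python
from collections import Counter
from datetime import datetime


def pageviews_per_second(timestamps):
    """Count pageviews per one-second bucket.

    `timestamps` is assumed to be an iterable of ISO 8601 strings,
    one per pageview (a hypothetical input format).
    """
    counts = Counter()
    for ts in timestamps:
        # Truncate to whole seconds so all hits in the same second share a bucket.
        bucket = datetime.fromisoformat(ts).replace(microsecond=0)
        counts[bucket] += 1
    return counts


def smallest_bucket(counts):
    """Return (second, count) for the least-populated one-second bucket."""
    return min(counts.items(), key=lambda item: item[1])


if __name__ == "__main__":
    sample = [
        "2015-04-12T06:00:00",
        "2015-04-12T06:00:00",
        "2015-04-12T06:00:01",
    ]
    counts = pageviews_per_second(sample)
    second, views = smallest_bucket(counts)
    print(second.isoformat(), views)
```

Note that seconds with no pageviews at all never appear in the counter, so a check like this only reports the smallest non-empty bucket.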

On 15 April 2015 at 13:27, Dario Taraborelli <[email protected]> wrote:
> thanks, both. Let's go ahead with English only and no spiders filtered or
> mobile/desktop breakdown, per Oliver.
>
> Michelle – given the aggregation level I am fine moving forward with this
> release, but let me know off-thread if you have any questions.
>
> Dario
>
> On Wed, Apr 15, 2015 at 9:53 AM, Oliver Keyes <[email protected]> wrote:
>>
>> Dario,
>>
>> No spider filtering, and no split between mobile and desktop; mobile
>> and desktop are grouped.
>>
>> On 15 April 2015 at 12:46, Hirav Gandhi <[email protected]> wrote:
>> > e.g. German*
>> >
>> > I need more coffee.
>> >
>> > On Wed, Apr 15, 2015 at 9:35 AM, Hirav Gandhi <[email protected]>
>> > wrote:
>> >>
>> >> Dario - we just want a representative samples of traffic for a popular
>> >> site like Wikipedia. We thought limiting to the English Wikipedia
>> >> would be easier.
>> >>
>> >> If we get aggregated data across all language Wikipedia sites, we would
>> >> need someway to tease out which language is being queried when. Some
>> >> languages (for e.g. German) we would hypothesize would have more daily
>> >> seasonality than languages like English.
>> >>
>> >> On Wed, Apr 15, 2015 at 9:32 AM, Dario Taraborelli
>> >> <[email protected]> wrote:
>> >>>
>> >>> Hirav, Bharath – I also want to hear from you if there's a specific
>> >>> reason to ask for English Wikipedia only or if a dataset encompassing
>> >>> aggregate pageviews across all Wikimedia properties would do the job.
>> >>>
>> >>> Dario
>> >>>
>> >>> On Wed, Apr 15, 2015 at 9:09 AM, Dario Taraborelli
>> >>> <[email protected]> wrote:
>> >>>>
>> >>>> Oliver -- thanks for running a preliminary check, I'm fine releasing
>> >>>> this data in aggregate under CC0, I believe it would be valuable for
>> >>>> this and other research projects (copying Michelle from Legal).
>> >>>>
>> >>>> Before we do so, though, I want to confirm the specs: aggregate
>> >>>> pageviews per second to English Wikipedia, excluding bot traffic,
>> >>>> broken down by access method (mobile web vs desktop site, not apps)
>> >>>> for a 60-day period. Oliver – are these the filters you used to
>> >>>> identify the data point with the smallest number of observations?
>> >>>>
>> >>>> Obviously, we will need to take into account this release when we
>> >>>> start working on projects such as
>> >>>> https://meta.wikimedia.org/wiki/Research:Geo-aggregation_of_Wikipedia_edits
>> >>>> and
>> >>>> https://meta.wikimedia.org/wiki/Research:Geo-aggregation_of_Wikipedia_pageviews
>> >>>>
>> >>>> Dario
>> >>>>
>> >>>> On Mon, Apr 13, 2015 at 9:37 PM, Oliver Keyes <[email protected]>
>> >>>> wrote:
>> >>>>>
>> >>>>> Bumping for Dario, per Pine's excellent example :)
>> >>>>>
>> >>>>> On 13 April 2015 at 22:18, Hirav Gandhi <[email protected]>
>> >>>>> wrote:
>> >>>>> > Oliver: Two months is fine. Thank you so much for your help!
>> >>>>> >
>> >>>>> >> On Apr 13, 2015, at 4:40 PM, [email protected]
>> >>>>> >> wrote:
>> >>>>> >>
>> >>>>> >> Send Analytics mailing list submissions to
>> >>>>> >> [email protected]
>> >>>>> >>
>> >>>>> >> To subscribe or unsubscribe via the World Wide Web, visit
>> >>>>> >> https://lists.wikimedia.org/mailman/listinfo/analytics
>> >>>>> >> or, via email, send a message with subject or body 'help' to
>> >>>>> >> [email protected]
>> >>>>> >>
>> >>>>> >> You can reach the person managing the list at
>> >>>>> >> [email protected]
>> >>>>> >>
>> >>>>> >> When replying, please edit your Subject line so it is more
>> >>>>> >> specific than "Re: Contents of Analytics digest..."
>> >>>>> >>
>> >>>>> >> Today's Topics:
>> >>>>> >>
>> >>>>> >> 1. Re: Page views on a more frequent than hourly basis (Pine W)
>> >>>>> >> 2. Re: Page views on a more frequent than hourly basis (Hirav Gandhi)
>> >>>>> >> 3. Re: Page views on a more frequent than hourly basis (Oliver Keyes)
>> >>>>> >>
>> >>>>> >> ----------------------------------------------------------------------
>> >>>>> >>
>> >>>>> >> Message: 1
>> >>>>> >> Date: Mon, 13 Apr 2015 13:34:23 -0700
>> >>>>> >> From: Pine W <[email protected]>
>> >>>>> >> To: "A mailing list for the Analytics Team at WMF and everybody
>> >>>>> >> who has an interest in Wikipedia and analytics."
>> >>>>> >> <[email protected]>
>> >>>>> >> Subject: Re: [Analytics] Page views on a more frequent than
>> >>>>> >> hourly basis
>> >>>>> >> Message-ID:
>> >>>>> >> <CAF=dyjjzmdfthz+0+lwnhb9m8xuod4wetgcfuxyb9qyf7cy...@mail.gmail.com>
>> >>>>> >> Content-Type: text/plain; charset="utf-8"
>> >>>>> >>
>> >>>>> >> Hi Oliver, re ccing people who are on list, this is the protocol
>> >>>>> >> we followed in IEGCom to ping people who are subscribed and
>> >>>>> >> mentioned in certain emails but, like many of us, may
>> >>>>> >> automatically move emails from lists directly to folders where
>> >>>>> >> they may be unread for days. So there is a reason to do this.
>> >>>>> >>
>> >>>>> >> Thanks,
>> >>>>> >>
>> >>>>> >> Pine
>> >>>>> >>
>> >>>>> >> ------------------------------
>> >>>>> >>
>> >>>>> >> Message: 2
>> >>>>> >> Date: Mon, 13 Apr 2015 16:30:43 -0700
>> >>>>> >> From: Hirav Gandhi <[email protected]>
>> >>>>> >> To: [email protected]
>> >>>>> >> Subject: Re: [Analytics] Page views on a more frequent than
>> >>>>> >> hourly basis
>> >>>>> >> Message-ID:
>> >>>>> >> <CANzC_EOvi4MP7G_SsxvW=uojpt2vxbnfmhcipqn1pumace-...@mail.gmail.com>
>> >>>>> >> Content-Type: text/plain; charset="utf-8"
>> >>>>> >>
>> >>>>> >> Thanks Oliver!
>> >>>>> >>
>> >>>>> >> We would like this data for as broad of a time period as you can
>> >>>>> >> muster. The more days, months and year represented in the
>> >>>>> >> dataset, the better.
>> >>>>> >>
>> >>>>> >>> Okay, so:
>> >>>>> >>>
>> >>>>> >>> I took an hour from the pageviews logs,[0] and aggregated
>> >>>>> >>> pageviews to enwiki (mobile and desktop both) by timestamp,
>> >>>>> >>> down to one-second resolution levels. The lowest number of
>> >>>>> >>> pageviews to enwiki per second was 2,981
>> >>>>> >>>
>> >>>>> >>> So, I don't personally have a problem with generating a release
>> >>>>> >>> of:
>> >>>>> >>>
>> >>>>> >>> 1. Pageviews per second;
>> >>>>> >>> 2. To enwiki;
>> >>>>> >>> 3. Over $TIME_PERIOD;
>> >>>>> >>> 4. grouping the mobile and desktop site
>> >>>>> >>>
>> >>>>> >>> But Dario or someone should chip in before I touch anything ;p
>> >>>>> >>>
>> >>>>> >>> 6am yesterday. 6am because it should be low-traffic, right? At
>> >>>>> >>> least given our biases towards north america and europe
>> >>>>> >>>
>> >>>>> >>> On 13 April 2015 at 11:54, Oliver Keyes <[email protected]>
>> >>>>> >>> wrote:
>> >>>>> >>>> Then that sounds much more viable. I'll run a quick test now
>> >>>>> >>>> to see how much clustering we'd see at, say, the one-second
>> >>>>> >>>> resolution level, and throw it out here so we can make more
>> >>>>> >>>> informed decisions about a data release on this.
>> >>>>> >>>>
>> >>>>> >>>> On 13 April 2015 at 08:08, Hirav Gandhi
>> >>>>> >>>> <[email protected]> wrote:
>> >>>>> >>>>> Hi Oliver,
>> >>>>> >>>>>
>> >>>>> >>>>> Re: Hirav: would you be looking for temporally /and/
>> >>>>> >>>>> contextually granular pageviews, i.e. "a view to X page at Y
>> >>>>> >>>>> time", or just temporally granular, so "a view to a page on
>> >>>>> >>>>> enwiki at X time"? If the latter you've got more of a shot,
>> >>>>> >>>>> I suspect.
>> >>>>> >>>>>
>> >>>>> >>>>> I only want the latter - I am not concerned with the context
>> >>>>> >>>>> so much as just “a view to a page on enwiki at X time.”
>> >>>>> >>>>>
>> >>>>> >>>>> Hirav
>> >>>>> >>>>>
>> >>>>> >>>>> On Apr 13, 2015, at 5:00 AM,
>> >>>>> >>>>> [email protected] wrote:
>> >>>>> >>>>>
>> >>>>> >>>>> Today's Topics:
>> >>>>> >>>>>
>> >>>>> >>>>> 1. Re: Page views on a more frequent than hourly basis (Pine W)
>> >>>>> >>>>> 2. Re: Page views on a more frequent than hourly basis (Oliver Keyes)
>> >>>>> >>>>>
>> >>>>> >>>>> ----------------------------------------------------------------------
>> >>>>> >>>>>
>> >>>>> >>>>> Message: 1
>> >>>>> >>>>> Date: Mon, 13 Apr 2015 00:47:31 -0700
>> >>>>> >>>>> From: Pine W <[email protected]>
>> >>>>> >>>>> To: "A mailing list for the Analytics Team at WMF and
>> >>>>> >>>>> everybody who has an interest in Wikipedia and analytics."
>> >>>>> >>>>> <[email protected]>
>> >>>>> >>>>> Cc: Bharath Sitaraman <[email protected]>
>> >>>>> >>>>> Subject: Re: [Analytics] Page views on a more frequent than
>> >>>>> >>>>> hourly basis
>> >>>>> >>>>> Message-ID:
>> >>>>> >>>>> <CAF=dyjgnut+t6n6mujq16duyiwp7et6ruht3_-tzdnsep+2...@mail.gmail.com>
>> >>>>> >>>>> Content-Type: text/plain; charset="utf-8"
>> >>>>> >>>>>
>> >>>>> >>>>> Hi,
>> >>>>> >>>>>
>> >>>>> >>>>> This issue of pageview data granularity has been discussed
>> >>>>> >>>>> before, and the answer has been that hourly is the smallest
>> >>>>> >>>>> increment allowed to be revealed publicly, for privacy
>> >>>>> >>>>> reasons.
>> >>>>> >>>>>
>> >>>>> >>>>> I believe that the person you will want to discuss your
>> >>>>> >>>>> request with is Toby, who I have cc'd here.
>> >>>>> >>>>>
>> >>>>> >>>>> Pine
>> >>>>> >>>>>
>> >>>>> >>>>> On Apr 13, 2015 12:11 AM, "Hirav Gandhi"
>> >>>>> >>>>> <[email protected]> wrote:
>> >>>>> >>>>>
>> >>>>> >>>>> Hi Wikimedia Analytics Team,
>> >>>>> >>>>>
>> >>>>> >>>>> My colleague Bharath and I are doing research on dynamic
>> >>>>> >>>>> server allocation algorithms and we were looking for a
>> >>>>> >>>>> suitable datasets to test our predictive algorithm on. We
>> >>>>> >>>>> noticed that Wikimedia has an amazing data set of hourly
>> >>>>> >>>>> page views, but we were looking for something a bit more
>> >>>>> >>>>> granular, such as aggregated page requests to English
>> >>>>> >>>>> Wikipedia on a minute by minute basis or second by second
>> >>>>> >>>>> basis if possible.
>> >>>>> >>>>>
>> >>>>> >>>>> We are more than happy to pour through any raw data you might
>> >>>>> >>>>> have that would help us calculate page requests at this
>> >>>>> >>>>> granular level. Please let us know if it would be possible
>> >>>>> >>>>> to get such data and if so how. Thank you in advance for
>> >>>>> >>>>> your help.
>> >>>>> >>>>>
>> >>>>> >>>>> Best,
>> >>>>> >>>>>
>> >>>>> >>>>> Hirav Gandhi
>> >>>>> >>>>> _______________________________________________
>> >>>>> >>>>> Analytics mailing list
>> >>>>> >>>>> [email protected]
>> >>>>> >>>>> https://lists.wikimedia.org/mailman/listinfo/analytics
>> >>>>> >>>>>
>> >>>>> >>>>> ------------------------------
>> >>>>> >>>>>
>> >>>>> >>>>> Message: 2
>> >>>>> >>>>> Date: Mon, 13 Apr 2015 06:39:45 -0400
>> >>>>> >>>>> From: Oliver Keyes <[email protected]>
>> >>>>> >>>>> To: "A mailing list for the Analytics Team at WMF and
>> >>>>> >>>>> everybody who has an interest in Wikipedia and analytics."
>> >>>>> >>>>> <[email protected]>
>> >>>>> >>>>> Cc: Bharath Sitaraman <[email protected]>
>> >>>>> >>>>> Subject: Re: [Analytics] Page views on a more frequent than
>> >>>>> >>>>> hourly basis
>> >>>>> >>>>> Message-ID:
>> >>>>> >>>>> <CAAUQgdDsnHd8s+ACL-XBtXBz6OO-T04CcJfnGfqwrYAV-=h...@mail.gmail.com>
>> >>>>> >>>>> Content-Type: text/plain; charset=UTF-8
>> >>>>> >>>>>
>> >>>>> >>>>> Preeetty sure that Toby is on the analytics list, Pine. He's
>> >>>>> >>>>> the director of analytics.
>> >>>>> >>>>>
>> >>>>> >>>>> Hirav: would you be looking for temporally /and/ contextually
>> >>>>> >>>>> granular pageviews, i.e. "a view to X page at Y time", or
>> >>>>> >>>>> just temporally granular, so "a view to a page on enwiki at
>> >>>>> >>>>> X time"? If the latter you've got more of a shot, I suspect.
>> >>>>> >>>>>
>> >>>>> >>>>> --
>> >>>>> >>>>> Oliver Keyes
>> >>>>> >>>>> Research Analyst
>> >>>>> >>>>> Wikimedia Foundation
>> >>>>> >>>>>
>> >>>>> >>>>> ------------------------------
>> >>>>> >>>>>
>> >>>>> >>>>> End of Analytics Digest, Vol 38, Issue 21
>> >>>>> >>>>> *****************************************
>> >>>>> >>
>> >>>>> >> ------------------------------
>> >>>>> >>
>> >>>>> >> Message: 3
>> >>>>> >> Date: Mon, 13 Apr 2015 19:40:04 -0400
>> >>>>> >> From: Oliver Keyes <[email protected]>
>> >>>>> >> To: "A mailing list for the Analytics Team at WMF and everybody
>> >>>>> >> who has an interest in Wikipedia and analytics."
>> >>>>> >> <[email protected]>
>> >>>>> >> Subject: Re: [Analytics] Page views on a more frequent than
>> >>>>> >> hourly basis
>> >>>>> >> Message-ID:
>> >>>>> >> <caauqgdd6z5ussu11vw49fdmbsrhyejxku9yopyserib79j-...@mail.gmail.com>
>> >>>>> >> Content-Type: text/plain; charset=UTF-8
>> >>>>> >>
>> >>>>> >> ....
>> >>>>> >>
>> >>>>> >> ...years?
>> >>>>> >>
>> >>>>> >> We have unsampled logs for, ah. 2 months.
>> >>>>> >>
>> >>>>> >> ------------------------------
>> >>>>> >>
>> >>>>> >> End of Analytics Digest, Vol 38, Issue 24
>> >>>>> >> *****************************************
>> >>>>
>> >>>> --
>> >>>> Dario Taraborelli
>> >>>> Senior Research Scientist, Research and Data Lead
>> >>>> Wikimedia Foundation
>> >>>> http://wikimediafoundation.org
>> >>>> http://nitens.org/taraborelli

--
Oliver Keyes
Research Analyst
Wikimedia Foundation

_______________________________________________
Analytics mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/analytics
