/This/ you say 2.5 seconds after I've launched the query ;p. Yes, it is possible, but I'll have to recalculate the likely minimum and check that it's still okay.
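For anyone following along, the recalculation amounts to something like the sketch below: count pageviews per (second, access method) bucket and take the smallest bucket for each access method, since splitting mobile from desktop can only shrink the per-second counts. The record layout, the sample values, and the `threshold` parameter are assumptions made up for illustration; this is not the actual query run against the logs.

```python
from collections import Counter

# Hypothetical records: (timestamp truncated to the second, access method).
# Neither this layout nor these values come from the real pageview logs.
SAMPLE_VIEWS = [
    ("2015-04-12T06:00:00", "desktop"),
    ("2015-04-12T06:00:00", "desktop"),
    ("2015-04-12T06:00:00", "desktop"),
    ("2015-04-12T06:00:00", "mobile web"),
    ("2015-04-12T06:00:01", "desktop"),
    ("2015-04-12T06:00:01", "desktop"),
    ("2015-04-12T06:00:01", "mobile web"),
    ("2015-04-12T06:00:01", "mobile web"),
]


def min_views_per_second(views):
    """Return {access_method: smallest per-second pageview count}.

    Caveat: a second with zero views for a method never shows up as a
    bucket here, so an empty second would have to be detected separately.
    """
    per_bucket = Counter(views)  # keys are (second, access_method) pairs
    minima = {}
    for (_second, method), count in per_bucket.items():
        minima[method] = min(count, minima.get(method, count))
    return minima


def safe_to_release(views, threshold):
    """True if every observed bucket has at least `threshold` views."""
    return all(m >= threshold for m in min_views_per_second(views).values())
```

With the sample above, `min_views_per_second(SAMPLE_VIEWS)` gives one minimum per access method, and `safe_to_release` compares those minima against whatever floor is agreed on for the release.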
On 15 April 2015 at 13:32, Hirav Gandhi <[email protected]> wrote:
> Hi Dario,
>
> One last question - would it be possible to break it out into mobile vs
> desktop? We are also concerned there might be seasonality effects in there
> as well. Please let us know.
>
> Best,
>
> Hirav
>
> On Wed, Apr 15, 2015 at 10:27 AM, Dario Taraborelli
> <[email protected]> wrote:
>>
>> thanks, both. Let's go ahead with English only and no spiders filtered or
>> mobile/desktop breakdown, per Oliver.
>>
>> Michelle – given the aggregation level I am fine moving forward with this
>> release, but let me know off-thread if you have any questions.
>>
>> Dario
>>
>> On Wed, Apr 15, 2015 at 9:53 AM, Oliver Keyes <[email protected]>
>> wrote:
>>>
>>> Dario,
>>>
>>> No spider filtering, and no split between mobile and desktop; mobile
>>> and desktop are grouped.
>>>
>>> On 15 April 2015 at 12:46, Hirav Gandhi <[email protected]> wrote:
>>> > e.g. German*
>>> >
>>> > I need more coffee.
>>> >
>>> > On Wed, Apr 15, 2015 at 9:35 AM, Hirav Gandhi <[email protected]>
>>> > wrote:
>>> >>
>>> >> Dario - we just want a representative samples of traffic for a popular
>>> >> site like Wikipedia. We thought limiting to the English Wikipedia
>>> >> would be easier.
>>> >>
>>> >> If we get aggregated data across all language Wikipedia sites, we
>>> >> would need someway to tease out which language is being queried when.
>>> >> Some languages (for e.g. German) we would hypothesize would have more
>>> >> daily seasonality than languages like English.
>>> >>
>>> >> On Wed, Apr 15, 2015 at 9:32 AM, Dario Taraborelli
>>> >> <[email protected]> wrote:
>>> >>>
>>> >>> Hirav, Bharath – I also want to hear from you if there's a specific
>>> >>> reason to ask for English Wikipedia only or if a dataset encompassing
>>> >>> aggregate pageviews across all Wikimedia properties would do the job.
>>> >>>
>>> >>> Dario
>>> >>>
>>> >>> On Wed, Apr 15, 2015 at 9:09 AM, Dario Taraborelli
>>> >>> <[email protected]> wrote:
>>> >>>>
>>> >>>> Oliver -- thanks for running a preliminary check, I'm fine releasing
>>> >>>> this data in aggregate under CC0, I believe it would be valuable for
>>> >>>> this and other research projects (copying Michelle from Legal).
>>> >>>>
>>> >>>> Before we do so, though, I want to confirm the specs: aggregate
>>> >>>> pageviews per second to English Wikipedia, excluding bot traffic,
>>> >>>> broken down by access method (mobile web vs desktop site, not apps)
>>> >>>> for a 60-day period. Oliver – are these the filters you used to
>>> >>>> identify the data point with the smallest number of observations?
>>> >>>>
>>> >>>> Obviously, we will need to take into account this release when we
>>> >>>> start working on projects such as
>>> >>>> https://meta.wikimedia.org/wiki/Research:Geo-aggregation_of_Wikipedia_edits
>>> >>>> and
>>> >>>> https://meta.wikimedia.org/wiki/Research:Geo-aggregation_of_Wikipedia_pageviews
>>> >>>>
>>> >>>> Dario
>>> >>>>
>>> >>>> On Mon, Apr 13, 2015 at 9:37 PM, Oliver Keyes <[email protected]>
>>> >>>> wrote:
>>> >>>>>
>>> >>>>> Bumping for Dario, per Pine's excellent example :)
>>> >>>>>
>>> >>>>> On 13 April 2015 at 22:18, Hirav Gandhi <[email protected]>
>>> >>>>> wrote:
>>> >>>>> > Oliver: Two months is fine. Thank you so much for your help!
>>> >>>>> >
>>> >>>>> >> On Apr 13, 2015, at 4:40 PM, [email protected] wrote:
>>> >>>>> >>
>>> >>>>> >> Send Analytics mailing list submissions to
>>> >>>>> >> [email protected]
>>> >>>>> >>
>>> >>>>> >> To subscribe or unsubscribe via the World Wide Web, visit
>>> >>>>> >> https://lists.wikimedia.org/mailman/listinfo/analytics
>>> >>>>> >> or, via email, send a message with subject or body 'help' to
>>> >>>>> >> [email protected]
>>> >>>>> >>
>>> >>>>> >> You can reach the person managing the list at
>>> >>>>> >> [email protected]
>>> >>>>> >>
>>> >>>>> >> When replying, please edit your Subject line so it is more
>>> >>>>> >> specific than "Re: Contents of Analytics digest..."
>>> >>>>> >>
>>> >>>>> >> Today's Topics:
>>> >>>>> >>
>>> >>>>> >> 1. Re: Page views on a more frequent than hourly basis (Pine W)
>>> >>>>> >> 2. Re: Page views on a more frequent than hourly basis (Hirav Gandhi)
>>> >>>>> >> 3. Re: Page views on a more frequent than hourly basis (Oliver Keyes)
>>> >>>>> >>
>>> >>>>> >> ----------------------------------------------------------------------
>>> >>>>> >>
>>> >>>>> >> Message: 1
>>> >>>>> >> Date: Mon, 13 Apr 2015 13:34:23 -0700
>>> >>>>> >> From: Pine W <[email protected]>
>>> >>>>> >> To: "A mailing list for the Analytics Team at WMF and everybody
>>> >>>>> >>     who has an interest in Wikipedia and analytics."
>>> >>>>> >>     <[email protected]>
>>> >>>>> >> Subject: Re: [Analytics] Page views on a more frequent than hourly basis
>>> >>>>> >> Message-ID:
>>> >>>>> >>     <CAF=dyjjzmdfthz+0+lwnhb9m8xuod4wetgcfuxyb9qyf7cy...@mail.gmail.com>
>>> >>>>> >> Content-Type: text/plain; charset="utf-8"
>>> >>>>> >>
>>> >>>>> >> Hi Oliver, re ccing people who are on list, this is the protocol
>>> >>>>> >> we followed in IEGCom to ping people who are subscribed and
>>> >>>>> >> mentioned in certain emails but, like many of us, may
>>> >>>>> >> automatically move emails from lists directly to folders where
>>> >>>>> >> they may be unread for days. So there is a reason to do this.
>>> >>>>> >>
>>> >>>>> >> Thanks,
>>> >>>>> >>
>>> >>>>> >> Pine
>>> >>>>> >>
>>> >>>>> >> ------------------------------
>>> >>>>> >>
>>> >>>>> >> Message: 2
>>> >>>>> >> Date: Mon, 13 Apr 2015 16:30:43 -0700
>>> >>>>> >> From: Hirav Gandhi <[email protected]>
>>> >>>>> >> To: [email protected]
>>> >>>>> >> Subject: Re: [Analytics] Page views on a more frequent than hourly basis
>>> >>>>> >> Message-ID:
>>> >>>>> >>     <CANzC_EOvi4MP7G_SsxvW=uojpt2vxbnfmhcipqn1pumace-...@mail.gmail.com>
>>> >>>>> >> Content-Type: text/plain; charset="utf-8"
>>> >>>>> >>
>>> >>>>> >> Thanks Oliver!
>>> >>>>> >>
>>> >>>>> >> We would like this data for as broad of a time period as you can
>>> >>>>> >> muster. The more days, months and year represented in the
>>> >>>>> >> dataset, the better.
>>> >>>>> >>
>>> >>>>> >>> Okay, so:
>>> >>>>> >>>
>>> >>>>> >>> I took an hour from the pageviews logs,[0] and aggregated
>>> >>>>> >>> pageviews to enwiki (mobile and desktop both) by timestamp,
>>> >>>>> >>> down to one-second resolution levels. The lowest number of
>>> >>>>> >>> pageviews to enwiki per second was 2,981
>>> >>>>> >>>
>>> >>>>> >>> So, I don't personally have a problem with generating a release
>>> >>>>> >>> of:
>>> >>>>> >>>
>>> >>>>> >>> 1. Pageviews per second;
>>> >>>>> >>> 2. To enwiki;
>>> >>>>> >>> 3. Over $TIME_PERIOD;
>>> >>>>> >>> 4. grouping the mobile and desktop site
>>> >>>>> >>>
>>> >>>>> >>> But Dario or someone should chip in before I touch anything ;p
>>> >>>>> >>>
>>> >>>>> >>> 6am yesterday. 6am because it should be low-traffic, right? At
>>> >>>>> >>> least given our biases towards north america and europe
>>> >>>>> >>>
>>> >>>>> >>> On 13 April 2015 at 11:54, Oliver Keyes <[email protected]>
>>> >>>>> >>> wrote:
>>> >>>>> >>>> Then that sounds much more viable. I'll run a quick test now
>>> >>>>> >>>> to see how much clustering we'd see at, say, the one-second
>>> >>>>> >>>> resolution level, and throw it out here so we can make more
>>> >>>>> >>>> informed decisions about a data release on this.
>>> >>>>> >>>>
>>> >>>>> >>>> On 13 April 2015 at 08:08, Hirav Gandhi
>>> >>>>> >>>> <[email protected]> wrote:
>>> >>>>> >>>>> Hi Oliver,
>>> >>>>> >>>>>
>>> >>>>> >>>>> Re: Hirav: would you be looking for temporally /and/
>>> >>>>> >>>>> contextually granular pageviews, i.e. "a view to X page at Y
>>> >>>>> >>>>> time", or just temporally granular, so "a view to a page on
>>> >>>>> >>>>> enwiki at X time"? If the latter you've got more of a shot,
>>> >>>>> >>>>> I suspect.
>>> >>>>> >>>>>
>>> >>>>> >>>>> I only want the latter - I am not concerned with the context
>>> >>>>> >>>>> so much as just “a view to a page on enwiki at X time.”
>>> >>>>> >>>>>
>>> >>>>> >>>>> Hirav
>>> >>>>> >>>>>
>>> >>>>> >>>>> On Apr 13, 2015, at 5:00 AM, [email protected] wrote:
>>> >>>>> >>>>>
>>> >>>>> >>>>> Today's Topics:
>>> >>>>> >>>>>
>>> >>>>> >>>>> 1. Re: Page views on a more frequent than hourly basis (Pine W)
>>> >>>>> >>>>> 2. Re: Page views on a more frequent than hourly basis (Oliver Keyes)
>>> >>>>> >>>>>
>>> >>>>> >>>>> ----------------------------------------------------------------------
>>> >>>>> >>>>>
>>> >>>>> >>>>> Message: 1
>>> >>>>> >>>>> Date: Mon, 13 Apr 2015 00:47:31 -0700
>>> >>>>> >>>>> From: Pine W <[email protected]>
>>> >>>>> >>>>> To: "A mailing list for the Analytics Team at WMF and
>>> >>>>> >>>>>     everybody who has an interest in Wikipedia and analytics."
>>> >>>>> >>>>>     <[email protected]>
>>> >>>>> >>>>> Cc: Bharath Sitaraman <[email protected]>
>>> >>>>> >>>>> Subject: Re: [Analytics] Page views on a more frequent than hourly basis
>>> >>>>> >>>>> Message-ID:
>>> >>>>> >>>>>     <CAF=dyjgnut+t6n6mujq16duyiwp7et6ruht3_-tzdnsep+2...@mail.gmail.com>
>>> >>>>> >>>>> Content-Type: text/plain; charset="utf-8"
>>> >>>>> >>>>>
>>> >>>>> >>>>> Hi,
>>> >>>>> >>>>>
>>> >>>>> >>>>> This issue of pageview data granularity has been discussed
>>> >>>>> >>>>> before, and the answer has been that hourly is the smallest
>>> >>>>> >>>>> increment allowed to be revealed publicly, for privacy
>>> >>>>> >>>>> reasons.
>>> >>>>> >>>>>
>>> >>>>> >>>>> I believe that the person you will want to discuss your
>>> >>>>> >>>>> request with is Toby, who I have cc'd here.
>>> >>>>> >>>>>
>>> >>>>> >>>>> Pine
>>> >>>>> >>>>>
>>> >>>>> >>>>> On Apr 13, 2015 12:11 AM, "Hirav Gandhi"
>>> >>>>> >>>>> <[email protected]> wrote:
>>> >>>>> >>>>>
>>> >>>>> >>>>> Hi Wikimedia Analytics Team,
>>> >>>>> >>>>>
>>> >>>>> >>>>> My colleague Bharath and I are doing research on dynamic
>>> >>>>> >>>>> server allocation algorithms and we were looking for a
>>> >>>>> >>>>> suitable datasets to test our predictive algorithm on. We
>>> >>>>> >>>>> noticed that Wikimedia has an amazing data set of hourly page
>>> >>>>> >>>>> views, but we were looking for something a bit more granular,
>>> >>>>> >>>>> such as aggregated page requests to English Wikipedia on a
>>> >>>>> >>>>> minute by minute basis or second by second basis if possible.
>>> >>>>> >>>>>
>>> >>>>> >>>>> We are more than happy to pour through any raw data you might
>>> >>>>> >>>>> have that would help us calculate page requests at this
>>> >>>>> >>>>> granular level. Please let us know if it would be possible to
>>> >>>>> >>>>> get such data and if so how. Thank you in advance for your
>>> >>>>> >>>>> help.
>>> >>>>> >>>>>
>>> >>>>> >>>>> Best,
>>> >>>>> >>>>>
>>> >>>>> >>>>> Hirav Gandhi
>>> >>>>> >>>>> _______________________________________________
>>> >>>>> >>>>> Analytics mailing list
>>> >>>>> >>>>> [email protected]
>>> >>>>> >>>>> https://lists.wikimedia.org/mailman/listinfo/analytics
>>> >>>>> >>>>>
>>> >>>>> >>>>> ------------------------------
>>> >>>>> >>>>>
>>> >>>>> >>>>> Message: 2
>>> >>>>> >>>>> Date: Mon, 13 Apr 2015 06:39:45 -0400
>>> >>>>> >>>>> From: Oliver Keyes <[email protected]>
>>> >>>>> >>>>> To: "A mailing list for the Analytics Team at WMF and
>>> >>>>> >>>>>     everybody who has an interest in Wikipedia and analytics."
>>> >>>>> >>>>>     <[email protected]>
>>> >>>>> >>>>> Cc: Bharath Sitaraman <[email protected]>
>>> >>>>> >>>>> Subject: Re: [Analytics] Page views on a more frequent than hourly basis
>>> >>>>> >>>>> Message-ID:
>>> >>>>> >>>>>     <CAAUQgdDsnHd8s+ACL-XBtXBz6OO-T04CcJfnGfqwrYAV-=h...@mail.gmail.com>
>>> >>>>> >>>>> Content-Type: text/plain; charset=UTF-8
>>> >>>>> >>>>>
>>> >>>>> >>>>> Preeetty sure that Toby is on the analytics list, Pine. He's
>>> >>>>> >>>>> the director of analytics.
>>> >>>>> >>>>>
>>> >>>>> >>>>> Hirav: would you be looking for temporally /and/ contextually
>>> >>>>> >>>>> granular pageviews, i.e. "a view to X page at Y time", or
>>> >>>>> >>>>> just temporally granular, so "a view to a page on enwiki at X
>>> >>>>> >>>>> time"? If the latter you've got more of a shot, I suspect.
>>> >>>>> >>>>>
>>> >>>>> >>>>> --
>>> >>>>> >>>>> Oliver Keyes
>>> >>>>> >>>>> Research Analyst
>>> >>>>> >>>>> Wikimedia Foundation
>>> >>>>> >>>>>
>>> >>>>> >>>>> ------------------------------
>>> >>>>> >>>>>
>>> >>>>> >>>>> End of Analytics Digest, Vol 38, Issue 21
>>> >>>>> >>>>> *****************************************
>>> >>>>> >>
>>> >>>>> >> ------------------------------
>>> >>>>> >>
>>> >>>>> >> Message: 3
>>> >>>>> >> Date: Mon, 13 Apr 2015 19:40:04 -0400
>>> >>>>> >> From: Oliver Keyes <[email protected]>
>>> >>>>> >> To: "A mailing list for the Analytics Team at WMF and everybody
>>> >>>>> >>     who has an interest in Wikipedia and analytics."
>>> >>>>> >>     <[email protected]>
>>> >>>>> >> Subject: Re: [Analytics] Page views on a more frequent than hourly basis
>>> >>>>> >> Message-ID:
>>> >>>>> >>     <caauqgdd6z5ussu11vw49fdmbsrhyejxku9yopyserib79j-...@mail.gmail.com>
>>> >>>>> >> Content-Type: text/plain; charset=UTF-8
>>> >>>>> >>
>>> >>>>> >> ....
>>> >>>>> >>
>>> >>>>> >> ...years?
>>> >>>>> >>
>>> >>>>> >> We have unsampled logs for, ah. 2 months.
>>> >>>>> >>
>>> >>>>> >> ------------------------------
>>> >>>>> >>
>>> >>>>> >> End of Analytics Digest, Vol 38, Issue 24
>>> >>>>> >> *****************************************
>>> >>>>
>>> >>>> --
>>> >>>> Dario Taraborelli
>>> >>>> Senior Research Scientist, Research and Data Lead
>>> >>>> Wikimedia Foundation
>>> >>>> http://wikimediafoundation.org
>>> >>>> http://nitens.org/taraborelli

--
Oliver Keyes
Research Analyst
Wikimedia Foundation
_______________________________________________
Analytics mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/analytics
