A big +1 to Erik. As he says, clients can switch over; let's deprecate
the old format and files. We can keep them around (and there is plenty
of code in various languages for _reading_ that format) but there's no
need to be restricted by it.

On 24 December 2015 at 09:41, Erik Zachte <[email protected]> wrote:
> Happy Holidays indeed, everyone!
>
> Let's celebrate an eventful year with lots of progress on the Analytics
> front. But also open issues waiting to be addressed asap in the next year.
>
> My personal priority is to get the geographical reports back up and running,
> now that Dan implemented a new geo data feed using Hive data earlier this
> month. Thanks again, Dan!
>
>
>
> From: Analytics [mailto:[email protected]] On Behalf Of
> Dan Andreescu
> Sent: Thursday, December 24, 2015 15:25
> To: A mailing list for the Analytics Team at WMF and everybody who has an
> interest in Wikipedia and analytics.
> Subject: Re: [Analytics] [Pageviews] [Technical] Simplifying the available
> static dumps of pageview data
>
>
>
> Apologies!  I realized it was Christmas Eve but I by no means meant to rush
> this conversation.  Take as long as you like to answer to the thread and
> enjoy your holidays everyone :)  I'll poke the thread again after the New
> Year.  Happy Holidays!
>
>
>
> On Thu, Dec 24, 2015 at 9:21 AM, Erik Zachte <[email protected]> wrote:
>
> Dan, thanks for raising the issue (a bit less for raising it on X-mas eve
> ;-) (just kidding, mostly)
>
>
>
> Frankly I don't see much use for the earlier releases at all. The newest
> version has been kept very much backward compatible, so migration of clients
> should be a no-brainer (mostly switching the download URL). Upgrading those
> same clients to also use the new additional counts is a bit more work, as the
> coding scheme is tedious (as a result of that backward compatibility). But
> that upgrading could be done later.
>
>
>
> I propose to deprecate both earlier sets, and set an end date for updating
> those, e.g. July 1, and publish that widely, and offer support with
> migration. If people feel otherwise please chime in. Keeping the existing
> files is another matter, we should do so of course.
>
>
>
> About my aggregation datasets, it's just that: an aggregation of hourly
> files into daily and monthly aggregates, with extreme compression while
> retaining hourly precision, and adjusting for missing data (by
> extrapolation). These files are ideal for batch processes and lean
> downloads, and archiving for the longer haul.
>
>
>
> Reworking the datasets, in whatever way, with categories as part of the
> scheme sounds like a major overhaul, not like cleaning up old stuff.
> Exciting, but best to be done under a separate flag.
>
>
>
> Cheers,
>
> Erik
>
>
>
>
>
>
>
> From: Analytics [mailto:[email protected]] On Behalf Of
> Maurice Vergeer
> Sent: Thursday, December 24, 2015 15:12
> To: A mailing list for the Analytics Team at WMF and everybody who has an
> interest in Wikipedia and analytics.
> Subject: Re: [Analytics] [Pageviews] [Technical] Simplifying the available
> static dumps of pageview data
>
>
>
> Dear all,
>
> As I just mentioned to Dan in a private email conversation, keeping datasets
> even with imperfect measurements is important. Particularly for longitudinal
> analysis.
>
> Also, what I understand - me being a newbie here - is that the data are
> stored in separate files. Dan suggested reordering the page into categories.
> Maybe another option is to create more extensive datasets with more
> different measurements in a single data file. On the other hand, the files
> would become even bigger in size. Not an issue for me, but for users in the
> field accessibility (download bandwidth) could become an issue.
>
> my two cents
>
> Maurice
>
>
>
>
>
> On Thu, Dec 24, 2015 at 2:58 PM, Alex Druk <[email protected]> wrote:
>
> Nothing against this approach!
>
>
>
> On Thu, Dec 24, 2015 at 2:55 PM, Dan Andreescu <[email protected]>
> wrote:
>
>
>
>
>
> On Thu, Dec 24, 2015 at 8:48 AM, Alex Druk <[email protected]> wrote:
>
> Hi Dan,
>
> Happy holidays!
>
> Good idea to combine these datasets! However we have one more dataset by
> Erik Zachte : http://dumps.wikimedia.org/other/pagecounts-ez/
>
>
>
> And that's an important one!  But I was thinking we could re-organize the
> page into categories.  Erik's dataset could go into a "processed data"
> category or something like that.  The three I wanted to talk about on this
> thread are just the raw data.
>
>
>
> _______________________________________________
> Analytics mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/analytics
>
>
>
>
>
> --
>
> Thank you.
>
> Alex Druk
> [email protected]
> (775) 237-8550 Google voice
>
>
>
>
>
>
> --
>
> ________________________________________________
> Maurice Vergeer
> To contact me, see http://mauricevergeer.nl/node/5
> To see my publications, see http://mauricevergeer.nl/node/1
> ________________________________________________
>
>
>
>
>
>
>



-- 
Oliver Keyes
Count Logula
Wikimedia Foundation

