A big +1 to Erik. As he says, clients can switch over; let's deprecate the old format and files. We can keep them around (and there is plenty of code in various languages for _reading_ that format) but there's no need to be restricted by it.

On 24 December 2015 at 09:41, Erik Zachte <[email protected]> wrote:
> Happy Holidays indeed, everyone!
>
> Let's celebrate an eventful year with lots of progress on the Analytics
> front. But also open issues waiting to be addressed asap in the next year.
>
> My personal priority is to get the geographical reports back up and
> running, now that Dan implemented a new geo data feed using hive data,
> earlier this month. Thanks again, Dan!
>
> From: Analytics [mailto:[email protected]] On Behalf Of
> Dan Andreescu
> Sent: Thursday, December 24, 2015 15:25
> To: A mailing list for the Analytics Team at WMF and everybody who has an
> interest in Wikipedia and analytics.
> Subject: Re: [Analytics] [Pageviews] [Technical] Simplifying the available
> static dumps of pageview data
>
> Apologies! I realized it was Christmas Eve but I by no means meant to rush
> this conversation. Take as long as you like to answer to the thread and
> enjoy your holidays everyone :) I'll poke the thread again after the New
> Year. Happy Holidays!
>
> On Thu, Dec 24, 2015 at 9:21 AM, Erik Zachte <[email protected]> wrote:
>
> Dan, thanks for raising the issue (a bit less for raising it on X-mas eve
> ;-) (just kidding, mostly)
>
> Frankly I don't see much use for the earlier releases at all. The newest
> version had been kept very much downward compatible, migration of clients
> should be a no-brainer (mostly switching download url). Upgrading those same
> clients to also use the new additional counts is a bit more work as the
> coding scheme is tedious (as a result of that downward compatibility). But
> that upgrading could be done later.
>
> I propose to deprecate both earlier sets, and set an end date for updating
> those, e.g. July 1, and publish that widely, and offer support with
> migration. If people feel otherwise please chime in. Keeping the existing
> files is another matter, we should do so of course.
>
> About my aggregation datasets, it's just that: an aggregation of hourly
> files into daily and monthly aggregates, with extreme compression while
> retaining hourly precision, and adjusting for missing data (by
> extrapolation). These files are ideal for batch processes and lean
> downloads, and archiving for the longer haul.
>
> Reworking the datasets, in whatever way, with categories as part of the
> scheme sounds like a major overhaul, not like cleaning up old stuff.
> Exciting, but best to be done under a separate flag.
>
> Cheers,
>
> Erik
>
> From: Analytics [mailto:[email protected]] On Behalf Of
> Maurice Vergeer
> Sent: Thursday, December 24, 2015 15:12
> To: A mailing list for the Analytics Team at WMF and everybody who has an
> interest in Wikipedia and analytics.
> Subject: Re: [Analytics] [Pageviews] [Technical] Simplifying the available
> static dumps of pageview data
>
> Dear all,
>
> As I just mentioned to Dan in a private email conversation, keeping datasets
> even with imperfect measurements is important. Particularly for longitudinal
> analysis.
>
> Also, from what I understand - me being a newbie here - the data are
> stored in separate files. Dan suggested reordering the page into categories.
> Maybe, another option is to create more extensive datasets with more
> different measurements in a single datafile. On the other hand, the files
> would become even bigger in size. Not an issue for me, but for users in the
> field accessibility (download bandwidth) could become an issue.
>
> my two cents
>
> Maurice
>
> On Thu, Dec 24, 2015 at 2:58 PM, Alex Druk <[email protected]> wrote:
>
> Nothing against this approach!
>
> On Thu, Dec 24, 2015 at 2:55 PM, Dan Andreescu <[email protected]>
> wrote:
>
> On Thu, Dec 24, 2015 at 8:48 AM, Alex Druk <[email protected]> wrote:
>
> Hi Dan,
>
> Happy holidays!
>
> Good idea to combine these datasets!
> However we have one more dataset by Erik Zachte:
> http://dumps.wikimedia.org/other/pagecounts-ez/
>
> And that's an important one! But I was thinking we could re-organize the
> page into categories. Erik's dataset could go into a "processed data"
> category or something like that. The three I wanted to talk about on this
> thread are just the raw data.
>
> _______________________________________________
> Analytics mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/analytics
>
> --
> Thank you.
>
> Alex Druk
> [email protected]
> (775) 237-8550 Google voice
>
> --
> ________________________________________________
> Maurice Vergeer
> To contact me, see http://mauricevergeer.nl/node/5
> To see my publications, see http://mauricevergeer.nl/node/1
> ________________________________________________

--
Oliver Keyes
Count Logula
Wikimedia Foundation
