+1 For example, the last time I sent a similar email to the list, it was for the wiki_info table. One of the tasks I have is to break the code for generating that table out of the analysis project it lives in and make it a separate repo so that Oliver can send pull requests to fix issues and/or maintain his own managed table.
It would be great to work towards an architecture that allows us to keep these tables up-to-date without user-based cron jobs. -Aaron On Thu, Jun 12, 2014 at 4:24 PM, Dan Andreescu <[email protected]> wrote: > This is great. I'd like to go on record saying that this is leaning > towards a data warehouse kind of approach - basically pre-aggregating > useful datasets. So we might want to do this in a more organized way down > the line. > > > On Thu, Jun 12, 2014 at 2:57 PM, Oliver Keyes <[email protected]> > wrote: > >> This is fricking awesome! >> >> >> On 12 June 2014 10:58, Aaron Halfaker <[email protected]> wrote: >> >>> I created a new table on analytics-store.eqiad.wmnet. It contains the >>> monthly edit counts for all wikis. See a brief overview below. >>> >>> Note that the "revisions" column contains a count of all revisions -- >>> archived or not. The "archived" column contains a count of archived >>> revisions. So revisions - archived == non-archived revisions. >>> >>> analytics-store.eqiad.wmnet [staging]> explain editor_month; >>> +-------------------+----------------+------+-----+---------+-------+ >>> | Field | Type | Null | Key | Default | Extra | >>> +-------------------+----------------+------+-----+---------+-------+ >>> | wiki | varbinary(50) | NO | PRI | | | >>> | month | varbinary(7) | NO | PRI | | | >>> | user_id | int(11) | NO | PRI | 0 | | >>> | user_name | varbinary(191) | YES | | NULL | | >>> | user_registration | varbinary(14) | YES | | NULL | | >>> | archived | int(11) | YES | | NULL | | >>> | revisions | int(11) | YES | | NULL | | >>> +-------------------+----------------+------+-----+---------+-------+ >>> 7 rows in set (0.01 sec) >>> >>> analytics-store.eqiad.wmnet [staging]> select * from editor_month limit >>> 3; >>> >>> +--------+---------+---------+------------+-------------------+----------+-----------+ >>> | wiki | month | user_id | user_name | user_registration | archived >>> | revisions | >>> >>> +--------+---------+---------+------------+-------------------+----------+-----------+ >>> | enwiki | 2001-01 | 34 | WojPob | 20010129110725 | 0 >>> | 13 | >>> | enwiki | 2001-01 | 99 | RoseParks | 20010121021221 | 0 >>> | 7 | >>> | enwiki | 2001-01 | 479 | JimboWales | 20010123223416 | 0 >>> | 13 | >>> >>> +--------+---------+---------+------------+-------------------+----------+-----------+ >>> 3 rows in set (0.03 sec) >>> >>> Feedback is welcome. One of the next things, I'd like to do is remove >>> the "-" from the month column as it ruins comparison with MW timestamps. >>> >>> -Aaron >>> >>> _______________________________________________ >>> wmfresearch mailing list >>> [email protected] >>> https://lists.wikimedia.org/mailman/listinfo/wmfresearch >>> >>> >> >> >> -- >> Oliver Keyes >> Research Analyst >> Wikimedia Foundation >> >> _______________________________________________ >> Analytics mailing list >> [email protected] >> https://lists.wikimedia.org/mailman/listinfo/analytics >> >> >
_______________________________________________ Analytics mailing list [email protected] https://lists.wikimedia.org/mailman/listinfo/analytics
