Interesting! Is there a way that I can use this with metrics.wikimedia.org to perform cross-wiki cohort analysis, or do I need access to analytics-store.eqiad.wmnet?
Pine

On Thu, Jun 12, 2014 at 5:50 PM, Aaron Halfaker <[email protected]> wrote:

> The only reason I didn't break this down by namespace was because the
> queries would have taken an order of magnitude longer to join the revision
> and page tables. The query I used didn't even need to read the revision or
> archive tables. It only read an index on those tables. That made it go
> pretty fast. :) I'd be interested in taking another pass if you guys
> don't mind dealing with a heavier server load.
>
> On Thu, Jun 12, 2014 at 7:05 PM, Dario Taraborelli <
> [email protected]> wrote:
>
>> Aaron – this is fantastic.
>> Two quick questions:
>>
>> - was the decision not to break down the data by namespace (matching Erik
>> Zachte’s master editor data dump) intentional?
>> - are we expecting to refresh the archived revision count field every
>> month?
>>
>> Dario
>>
>>
>> On Jun 12, 2014, at 2:33 PM, Aaron Halfaker <[email protected]>
>> wrote:
>>
>> +1
>>
>> For example, the last time I sent a similar email to the list, it was for
>> the wiki_info table. One of the tasks I have is to break the code for
>> generating that table out of the analysis project it lives in and make it a
>> separate repo so that Oliver can send pull requests to fix issues and/or
>> maintain his own managed table.
>>
>> It would be great to work towards an architecture that allows us to keep
>> these tables up-to-date without user-based cron jobs.
>>
>> -Aaron
>>
>> On Thu, Jun 12, 2014 at 4:24 PM, Dan Andreescu <[email protected]>
>> wrote:
>>
>>> This is great. I'd like to go on record saying that this is leaning
>>> towards a data warehouse kind of approach - basically pre-aggregating
>>> useful datasets. So we might want to do this in a more organized way down
>>> the line.
>>>
>>>
>>> On Thu, Jun 12, 2014 at 2:57 PM, Oliver Keyes <[email protected]>
>>> wrote:
>>>
>>>> This is fricking awesome!
>>>>
>>>>
>>>> On 12 June 2014 10:58, Aaron Halfaker <[email protected]> wrote:
>>>>
>>>>> I created a new table on analytics-store.eqiad.wmnet. It contains the
>>>>> monthly edit counts for all wikis. See a brief overview below.
>>>>>
>>>>> Note that the "revisions" column contains a count of all revisions --
>>>>> archived or not. The "archived" column contains a count of archived
>>>>> revisions. So revisions - archived == non-archived revisions.
>>>>>
>>>>> analytics-store.eqiad.wmnet [staging]> explain editor_month;
>>>>> +-------------------+----------------+------+-----+---------+-------+
>>>>> | Field             | Type           | Null | Key | Default | Extra |
>>>>> +-------------------+----------------+------+-----+---------+-------+
>>>>> | wiki              | varbinary(50)  | NO   | PRI |         |       |
>>>>> | month             | varbinary(7)   | NO   | PRI |         |       |
>>>>> | user_id           | int(11)        | NO   | PRI | 0       |       |
>>>>> | user_name         | varbinary(191) | YES  |     | NULL    |       |
>>>>> | user_registration | varbinary(14)  | YES  |     | NULL    |       |
>>>>> | archived          | int(11)        | YES  |     | NULL    |       |
>>>>> | revisions         | int(11)        | YES  |     | NULL    |       |
>>>>> +-------------------+----------------+------+-----+---------+-------+
>>>>> 7 rows in set (0.01 sec)
>>>>>
>>>>> analytics-store.eqiad.wmnet [staging]> select * from editor_month limit 3;
>>>>> +--------+---------+---------+------------+-------------------+----------+-----------+
>>>>> | wiki   | month   | user_id | user_name  | user_registration | archived | revisions |
>>>>> +--------+---------+---------+------------+-------------------+----------+-----------+
>>>>> | enwiki | 2001-01 |      34 | WojPob     | 20010129110725    |        0 |        13 |
>>>>> | enwiki | 2001-01 |      99 | RoseParks  | 20010121021221    |        0 |         7 |
>>>>> | enwiki | 2001-01 |     479 | JimboWales | 20010123223416    |        0 |        13 |
>>>>> +--------+---------+---------+------------+-------------------+----------+-----------+
>>>>> 3 rows in set (0.03 sec)
>>>>>
>>>>> Feedback is welcome. One of the next things I'd like to do is
>>>>> remove the "-" from the month column as it ruins comparison with MW
>>>>> timestamps.
>>>>>
>>>>> -Aaron
>>>>>
>>>>> _______________________________________________
>>>>> wmfresearch mailing list
>>>>> [email protected]
>>>>> https://lists.wikimedia.org/mailman/listinfo/wmfresearch
>>>>>
>>>>
>>>> --
>>>> Oliver Keyes
>>>> Research Analyst
>>>> Wikimedia Foundation
>>>>
>>>> _______________________________________________
>>>> Analytics mailing list
>>>> [email protected]
>>>> https://lists.wikimedia.org/mailman/listinfo/analytics
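
For anyone who wants to try the arithmetic from the quoted thread without access to the real server, here is a rough sketch. It assumes an in-memory SQLite database as a stand-in for the MySQL staging database, simplifies the column types, and loads only the three sample rows quoted in the thread. The helper names `non_archived_revisions` and `edited_in_registration_month` are hypothetical, not part of any existing tool. It shows the `revisions - archived` subtraction and one possible way to line up the hyphenated `month` value with an MW-style timestamp prefix.

```python
import sqlite3

# In-memory SQLite stands in for the MySQL staging database (an assumption
# made so the sketch runs anywhere); column types are simplified.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE editor_month (
        wiki TEXT NOT NULL,
        month TEXT NOT NULL,
        user_id INTEGER NOT NULL DEFAULT 0,
        user_name TEXT,
        user_registration TEXT,
        archived INTEGER,
        revisions INTEGER,
        PRIMARY KEY (wiki, month, user_id)
    )
    """
)

# The three sample rows shown in the quoted message.
rows = [
    ("enwiki", "2001-01", 34, "WojPob", "20010129110725", 0, 13),
    ("enwiki", "2001-01", 99, "RoseParks", "20010121021221", 0, 7),
    ("enwiki", "2001-01", 479, "JimboWales", "20010123223416", 0, 13),
]
conn.executemany("INSERT INTO editor_month VALUES (?, ?, ?, ?, ?, ?, ?)", rows)


def non_archived_revisions(connection):
    """Return (wiki, month, user_name, revisions - archived) for each row."""
    query = """
        SELECT wiki, month, user_name, revisions - archived AS non_archived
        FROM editor_month
        ORDER BY wiki, month, user_id
    """
    return connection.execute(query).fetchall()


def edited_in_registration_month(connection):
    """Return names of users whose row falls in the month they registered.

    Strips the "-" from month so it matches the YYYYMM prefix of the
    MW-style registration timestamp.
    """
    query = """
        SELECT user_name
        FROM editor_month
        WHERE REPLACE(month, '-', '') = SUBSTR(user_registration, 1, 6)
        ORDER BY user_id
    """
    return [name for (name,) in connection.execute(query)]


print(non_archived_revisions(conn))
print(edited_in_registration_month(conn))
```

On the real table the same two SELECT statements should carry over to MySQL largely unchanged, though that is untested here, and a cross-wiki breakdown would presumably add a GROUP BY on the user and wiki columns.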
