Aaron – this is fantastic.
Two quick questions:

- was the decision not to break down the data by namespace (matching Erik 
Zachte’s master editor data dump) intentional?
- are we expecting to refresh the archived revision count field every month?

Dario

On Jun 12, 2014, at 2:33 PM, Aaron Halfaker <[email protected]> wrote:

> +1
> 
> For example, the last time I sent a similar email to the list, it was for the 
> wiki_info table.  One of the tasks I have is to break the code for generating 
> that table out of the analysis project it lives in and make it a separate 
> repo so that Oliver can send pull requests to fix issues and/or maintain his 
> own managed table.
> 
> It would be great to work towards an architecture that allows us to keep 
> these tables up-to-date without user-based cron jobs.  
> 
> -Aaron
> 
> On Thu, Jun 12, 2014 at 4:24 PM, Dan Andreescu <[email protected]> 
> wrote:
> This is great.  I'd like to go on record saying that this is leaning towards 
> a data warehouse kind of approach - basically pre-aggregating useful 
> datasets.  So we might want to do this in a more organized way down the line.
> 
> 
> On Thu, Jun 12, 2014 at 2:57 PM, Oliver Keyes <[email protected]> wrote:
> This is fricking awesome!
> 
> 
> On 12 June 2014 10:58, Aaron Halfaker <[email protected]> wrote:
> I created a new table on analytics-store.eqiad.wmnet.  It contains the 
> monthly edit counts for all wikis.  See a brief overview below.    
> 
> Note that the "revisions" column contains a count of all revisions -- 
> archived or not.  The "archived" column contains a count of archived 
> revisions.   So revisions - archived == non-archived revisions. 
> 
> analytics-store.eqiad.wmnet [staging]> explain editor_month;
> +-------------------+----------------+------+-----+---------+-------+
> | Field             | Type           | Null | Key | Default | Extra |
> +-------------------+----------------+------+-----+---------+-------+
> | wiki              | varbinary(50)  | NO   | PRI |         |       |
> | month             | varbinary(7)   | NO   | PRI |         |       |
> | user_id           | int(11)        | NO   | PRI | 0       |       |
> | user_name         | varbinary(191) | YES  |     | NULL    |       |
> | user_registration | varbinary(14)  | YES  |     | NULL    |       |
> | archived          | int(11)        | YES  |     | NULL    |       |
> | revisions         | int(11)        | YES  |     | NULL    |       |
> +-------------------+----------------+------+-----+---------+-------+
> 7 rows in set (0.01 sec)
> 
> analytics-store.eqiad.wmnet [staging]> select * from editor_month limit 3;
> +--------+---------+---------+------------+-------------------+----------+-----------+
> | wiki   | month   | user_id | user_name  | user_registration | archived | 
> revisions |
> +--------+---------+---------+------------+-------------------+----------+-----------+
> | enwiki | 2001-01 |      34 | WojPob     | 20010129110725    |        0 |    
>     13 |
> | enwiki | 2001-01 |      99 | RoseParks  | 20010121021221    |        0 |    
>      7 |
> | enwiki | 2001-01 |     479 | JimboWales | 20010123223416    |        0 |    
>     13 |
> +--------+---------+---------+------------+-------------------+----------+-----------+
> 3 rows in set (0.03 sec)
> 
> Feedback is welcome.   One of the next things, I'd like to do is remove the 
> "-" from the month column as it ruins comparison with MW timestamps. 
> 
> -Aaron
> 
> _______________________________________________
> wmfresearch mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/wmfresearch
> 
> 
> 
> 
> -- 
> Oliver Keyes
> Research Analyst
> Wikimedia Foundation
> 
> _______________________________________________
> Analytics mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/analytics
> 
> 
> 
> _______________________________________________
> Analytics mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/analytics

_______________________________________________
Analytics mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/analytics

Reply via email to