Interesting! Is there a way that I can use this with metrics.wikimedia.org
to perform cross-wiki cohort analysis, or do I need access to
analytics-store.eqiad.wmnet?

Pine


On Thu, Jun 12, 2014 at 5:50 PM, Aaron Halfaker <[email protected]>
wrote:

> The only reason I didn't break this down by namespace was because they
> queries would have taken an order of magnitude longer to join the revision
> and page tables.  The query I used didn't even need to read the revision or
> archive tables.  It only read an index on those tables.  That made it go
> pretty fast.  :)  I'd be interested in taking another pass if you guys
> don't mind dealing with a heavier server load.
>
> On Thu, Jun 12, 2014 at 7:05 PM, Dario Taraborelli <
> [email protected]> wrote:
>
>> Aaron – this is fantastic.
>> Two quick questions:
>>
>> - was the decision not to break down the data by namespace (matching Erik
>> Zachte’s master editor data dump) intentional?
>> - are we expecting to refresh the archived revision count field every
>> month?
>>
>> Dario
>>
>>
>> On Jun 12, 2014, at 2:33 PM, Aaron Halfaker <[email protected]>
>> wrote:
>>
>> +1
>>
>> For example, the last time I sent a similar email to the list, it was for
>> the wiki_info table.  One of the tasks I have is to break the code for
>> generating that table out of the analysis project it lives in and make it a
>> separate repo so that Oliver can send pull requests to fix issues and/or
>> maintain his own managed table.
>>
>> It would be great to work towards an architecture that allows us to keep
>> these tables up-to-date without user-based cron jobs.
>>
>> -Aaron
>>
>> On Thu, Jun 12, 2014 at 4:24 PM, Dan Andreescu <[email protected]>
>> wrote:
>>
>>> This is great.  I'd like to go on record saying that this is leaning
>>> towards a data warehouse kind of approach - basically pre-aggregating
>>> useful datasets.  So we might want to do this in a more organized way down
>>> the line.
>>>
>>>
>>> On Thu, Jun 12, 2014 at 2:57 PM, Oliver Keyes <[email protected]>
>>> wrote:
>>>
>>>> This is fricking awesome!
>>>>
>>>>
>>>>  On 12 June 2014 10:58, Aaron Halfaker <[email protected]> wrote:
>>>>
>>>>> I created a new table on analytics-store.eqiad.wmnet.  It contains the
>>>>> monthly edit counts for all wikis.  See a brief overview below.
>>>>>
>>>>> Note that the "revisions" column contains a count of all revisions --
>>>>> archived or not.  The "archived" column contains a count of archived
>>>>> revisions.   So revisions - archived == non-archived revisions.
>>>>>
>>>>> analytics-store.eqiad.wmnet [staging]> explain editor_month;
>>>>> +-------------------+----------------+------+-----+---------+-------+
>>>>> | Field             | Type           | Null | Key | Default | Extra |
>>>>> +-------------------+----------------+------+-----+---------+-------+
>>>>> | wiki              | varbinary(50)  | NO   | PRI |         |       |
>>>>> | month             | varbinary(7)   | NO   | PRI |         |       |
>>>>> | user_id           | int(11)        | NO   | PRI | 0       |       |
>>>>> | user_name         | varbinary(191) | YES  |     | NULL    |       |
>>>>> | user_registration | varbinary(14)  | YES  |     | NULL    |       |
>>>>> | archived          | int(11)        | YES  |     | NULL    |       |
>>>>> | revisions         | int(11)        | YES  |     | NULL    |       |
>>>>> +-------------------+----------------+------+-----+---------+-------+
>>>>> 7 rows in set (0.01 sec)
>>>>>
>>>>> analytics-store.eqiad.wmnet [staging]> select * from editor_month
>>>>> limit 3;
>>>>>
>>>>> +--------+---------+---------+------------+-------------------+----------+-----------+
>>>>> | wiki   | month   | user_id | user_name  | user_registration |
>>>>> archived | revisions |
>>>>>
>>>>> +--------+---------+---------+------------+-------------------+----------+-----------+
>>>>> | enwiki | 2001-01 |      34 | WojPob     | 20010129110725    |
>>>>>  0 |        13 |
>>>>> | enwiki | 2001-01 |      99 | RoseParks  | 20010121021221    |
>>>>>  0 |         7 |
>>>>> | enwiki | 2001-01 |     479 | JimboWales | 20010123223416    |
>>>>>  0 |        13 |
>>>>>
>>>>> +--------+---------+---------+------------+-------------------+----------+-----------+
>>>>> 3 rows in set (0.03 sec)
>>>>>
>>>>> Feedback is welcome.   One of the next things, I'd like to do is
>>>>> remove the "-" from the month column as it ruins comparison with MW
>>>>> timestamps.
>>>>>
>>>>> -Aaron
>>>>>
>>>>> _______________________________________________
>>>>> wmfresearch mailing list
>>>>> [email protected]
>>>>> https://lists.wikimedia.org/mailman/listinfo/wmfresearch
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Oliver Keyes
>>>> Research Analyst
>>>> Wikimedia Foundation
>>>>
>>>> _______________________________________________
>>>> Analytics mailing list
>>>> [email protected]
>>>> https://lists.wikimedia.org/mailman/listinfo/analytics
>>>>
>>>>
>>>
>> _______________________________________________
>> Analytics mailing list
>> [email protected]
>> https://lists.wikimedia.org/mailman/listinfo/analytics
>>
>>
>>
>> _______________________________________________
>> Analytics mailing list
>> [email protected]
>> https://lists.wikimedia.org/mailman/listinfo/analytics
>>
>>
>
> _______________________________________________
> Analytics mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/analytics
>
>
_______________________________________________
Analytics mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/analytics

Reply via email to