It turns out that I did do some pre-computing here.  See
db1047.eqiad.wmnet:staging.editor_month_by_namespace

[staging]> explain editor_month_by_namespace;
+----------------+--------------+------+-----+---------+-------+
| Field          | Type         | Null | Key | Default | Extra |
+----------------+--------------+------+-----+---------+-------+
| wiki           | varchar(50)  | NO   | PRI |         |       |
| month          | varbinary(7) | NO   | PRI |         |       |
| user_id        | varchar(255) | NO   | PRI |         |       |
| page_namespace | int(11)      | NO   | PRI | 0       |       |
| archived       | int(11)      | YES  |     | NULL    |       |
| revisions      | int(11)      | YES  |     | NULL    |       |
| mmonth         | date         | YES  |     | NULL    |       |
| reverted       | int(11)      | YES  |     | NULL    |       |
+----------------+--------------+------+-----+---------+-------+
8 rows in set (0.01 sec)

As you'll notice, the table has a column for Wiki -- which means you can
use it to do cross-wiki analysis.

mmonth and reverted were added by Leila, so she'll need to comment on that.


Otherwise:

   - wiki - wikidb name (e.g. "enwiki")
   - month - YYYYMM
   - user_id - corresponds to user table
   - page_namespace - namespace ID number
   - archived - # of revisions to deleted pages
   - revisions - # of all revisions (archived or not)

-Aaron

On Thu, Jan 8, 2015 at 9:00 AM, Dan Andreescu <[email protected]>
wrote:

>
>
> On Thu, Jan 8, 2015 at 2:33 AM, Oliver Keyes <[email protected]> wrote:
>
>> On 8 January 2015 at 02:31, Gergo Tisza <[email protected]> wrote:
>> > On Wed, Jan 7, 2015 at 6:26 PM, Oliver Keyes <[email protected]>
>> wrote:
>> >>
>> >> places to get edits? Well....the revision table? I'm sort of confused
>> >> as to what you're looking for, I guess, that the db wouldn't have.
>> >
>> >
>> > There are a thousand or so wikis; it would be nice if there was a single
>> > table with all the edits. I guess I can generate a query with a thousand
>> > unions...
>
>
> We agree.  And that's why we're building a data warehouse.  We are
> currently going back and forth with Sean vetting a load process that
> creates exactly the "edit" table as you describe it.  The nice thing about
> the schema we are putting together is that not only would you be able to
> see the namespace of the page at the time of query but also throughout the
> page's history (as it moves from draft to main, etc.)
>
> >
>> > The harder problem is that it would be nice to group by editor activity
>> > levels. One of the concerns about MediaViewer was that it makes harder
>> for
>> > new editors to understand file pages and start editing them; so it
>> would be
>> > a plausible hypothesis that the number of file edits by new editors
>> would
>> > drop sharply after making MV default, but the total file edit count
>> wouldn't
>> > be visibly affected because it would be dominated by power users who
>> already
>> > know how to curate image metadata.
>> >
>> > So I would like to look at something like the number of first edits per
>> > month, or the number of edits by editors who at the time had less than
>> 10
>> > edits... recovering that kind of data from the revision table seems
>> > extremely difficult.
>>
>> Yeah, that is difficult. Aaron has, I believe, precomputed some things;
>> Aaron?
>>
>
> IANAA (I am not an Aaron) but I'm happy to help with the query.  I know of
> most of the stuff Aaron pre-computed as of a couple of months ago and this
> specific thing wasn't done.  Gergo, if you could precisely spell out a few
> queries you'd like to do, I can translate to SQL and use the experience to
> inform our data warehouse work.
>
> _______________________________________________
> Analytics mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/analytics
>
>
_______________________________________________
Analytics mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/analytics

Reply via email to