It turns out that I did do some pre-computing here. See db1047.eqiad.wmnet:staging.editor_month_by_namespace

[staging]> explain editor_month_by_namespace;
+----------------+--------------+------+-----+---------+-------+
| Field          | Type         | Null | Key | Default | Extra |
+----------------+--------------+------+-----+---------+-------+
| wiki           | varchar(50)  | NO   | PRI |         |       |
| month          | varbinary(7) | NO   | PRI |         |       |
| user_id        | varchar(255) | NO   | PRI |         |       |
| page_namespace | int(11)      | NO   | PRI | 0       |       |
| archived       | int(11)      | YES  |     | NULL    |       |
| revisions      | int(11)      | YES  |     | NULL    |       |
| mmonth         | date         | YES  |     | NULL    |       |
| reverted       | int(11)      | YES  |     | NULL    |       |
+----------------+--------------+------+-----+---------+-------+
8 rows in set (0.01 sec)

As you'll notice, the table has a column for wiki -- which means you can
use it to do cross-wiki analysis. mmonth and reverted were added by Leila,
so she'll need to comment on that. Otherwise:

- wiki - wikidb name (e.g. "enwiki")
- month - YYYYMM
- user_id - corresponds to user table
- page_namespace - namespace ID number
- archived - # of revisions to deleted pages
- revisions - # of all revisions (archived or not)

-Aaron

On Thu, Jan 8, 2015 at 9:00 AM, Dan Andreescu <[email protected]> wrote:

> On Thu, Jan 8, 2015 at 2:33 AM, Oliver Keyes <[email protected]> wrote:
>
>> On 8 January 2015 at 02:31, Gergo Tisza <[email protected]> wrote:
>> > On Wed, Jan 7, 2015 at 6:26 PM, Oliver Keyes <[email protected]> wrote:
>> >>
>> >> places to get edits? Well....the revision table? I'm sort of confused
>> >> as to what you're looking for, I guess, that the db wouldn't have.
>> >
>> > There are a thousand or so wikis; it would be nice if there was a
>> > single table with all the edits. I guess I can generate a query with
>> > a thousand unions...
>
> We agree. And that's why we're building a data warehouse. We are
> currently going back and forth with Sean vetting a load process that
> creates exactly the "edit" table as you describe it. The nice thing about
> the schema we are putting together is that not only would you be able to
> see the namespace of the page at the time of query but also throughout
> the page's history (as it moves from draft to main, etc.)
>
>> > The harder problem is that it would be nice to group by editor
>> > activity levels. One of the concerns about MediaViewer was that it
>> > makes harder for new editors to understand file pages and start
>> > editing them; so it would be a plausible hypothesis that the number
>> > of file edits by new editors would drop sharply after making MV
>> > default, but the total file edit count wouldn't be visibly affected
>> > because it would be dominated by power users who already know how to
>> > curate image metadata.
>> >
>> > So I would like to look at something like the number of first edits
>> > per month, or the number of edits by editors who at the time had less
>> > than 10 edits... recovering that kind of data from the revision table
>> > seems extremely difficult.
>>
>> Yeah, that is difficult. Aaron has, I believe, precomputed some things;
>> Aaron?
>
> IANAA (I am not an Aaron) but I'm happy to help with the query. I know of
> most of the stuff Aaron pre-computed as of a couple of months ago and
> this specific thing wasn't done. Gergo, if you could precisely spell out
> a few queries you'd like to do, I can translate to SQL and use the
> experience to inform our data warehouse work.
>
> _______________________________________________
> Analytics mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/analytics
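
For the cross-wiki file-edit counts discussed in this thread, a query against editor_month_by_namespace might look roughly like the sketch below. It is only an illustration under several assumptions: an in-memory SQLite database stands in for the MySQL staging host, the sample rows are made up, month is taken to be a YYYYMM string as described, and the helper names (build_demo_db, file_edits_per_month) are hypothetical. Namespace 6 is MediaWiki's File: namespace.

```python
import sqlite3

# Schema modelled on the `explain` output; SQLite stands in for MySQL here,
# so varbinary(7) is approximated with VARCHAR(7).
SCHEMA = """
CREATE TABLE editor_month_by_namespace (
    wiki           VARCHAR(50)  NOT NULL DEFAULT '',
    month          VARCHAR(7)   NOT NULL DEFAULT '',
    user_id        VARCHAR(255) NOT NULL DEFAULT '',
    page_namespace INTEGER      NOT NULL DEFAULT 0,
    archived       INTEGER,
    revisions      INTEGER,
    mmonth         DATE,
    reverted       INTEGER,
    PRIMARY KEY (wiki, month, user_id, page_namespace)
)
"""

# Made-up rows for illustration only; these are not real data.
# Columns: wiki, month, user_id, page_namespace, archived, revisions
SAMPLE_ROWS = [
    ("enwiki", "201405", "1", 6, 0, 12),
    ("enwiki", "201405", "2", 6, 1, 3),
    ("enwiki", "201405", "2", 0, 0, 40),
    ("dewiki", "201405", "7", 6, 0, 5),
    ("enwiki", "201406", "1", 6, 0, 9),
]

FILE_NAMESPACE = 6  # MediaWiki's File: namespace ID

# Per wiki and month: how many distinct editors touched the File namespace,
# and how many revisions they made there (archived or not).
QUERY = """
SELECT wiki,
       month,
       COUNT(DISTINCT user_id) AS editors,
       SUM(revisions)          AS revisions
FROM editor_month_by_namespace
WHERE page_namespace = ?
GROUP BY wiki, month
ORDER BY wiki, month
"""


def build_demo_db():
    """Create an in-memory database holding the made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO editor_month_by_namespace "
        "(wiki, month, user_id, page_namespace, archived, revisions) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        SAMPLE_ROWS,
    )
    return conn


def file_edits_per_month(conn):
    """Return (wiki, month, editors, revisions) rows for the File namespace."""
    return conn.execute(QUERY, (FILE_NAMESPACE,)).fetchall()


if __name__ == "__main__":
    for row in file_edits_per_month(build_demo_db()):
        print(row)
```

Because the table is already aggregated per editor per month, this gives totals per wiki without a union over every wiki's revision table. It does not by itself separate new editors from experienced ones; that would still need each editor's edit count at the time of the edit.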
