http://www.mediawiki.org/wiki/Special:Code/MediaWiki/95685
Revision: 95685
Author: giovanni
Date: 2011-08-29 18:57:11 +0000 (Mon, 29 Aug 2011)
Log Message:
-----------
added documentation to editor_lifecycle
Modified Paths:
--------------
trunk/tools/wsor/editor_lifecycle/README.rst
Added Paths:
-----------
trunk/tools/wsor/editor_lifecycle/TODO.rst
Modified: trunk/tools/wsor/editor_lifecycle/README.rst
===================================================================
--- trunk/tools/wsor/editor_lifecycle/README.rst	2011-08-29 18:55:22 UTC (rev 95684)
+++ trunk/tools/wsor/editor_lifecycle/README.rst	2011-08-29 18:57:11 UTC (rev 95685)
@@ -1,7 +1,11 @@
-============
-README
-============
+Editor lifecycle
+================
+Author: Giovanni Luca Ciampaglia
+
+License
+-------
+
Copyright (C) 2011 GIOVANNI LUCA CIAMPAGLIA, [email protected]
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
@@ -18,33 +22,54 @@
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
http://www.gnu.org/copyleft/gpl.html
----------
-workflow
----------
+Installation
+------------
-This package is a collection of python and shell scripts that can assist
-creating and analyzing data on user life cycle.
+To install this package you can use the normal distutils command::
-Sample selection
-----------------
+ python setup.py install
-TBD
+See http://docs.python.org/install/index.html#install-index for more options.
+You might need root access (sudo) to perform a system-wide installation.
-Edit activity data collection
------------------------------
+Usage
+-----
+See http://meta.wikimedia.org/wiki/Research:Editor_lifecycle. All scripts
+accept arguments from the command line and understand the common -h/--help
+option.
-First use `fetchrates` to download the rate data from the MySQL database. This
-script takes a user_id in input (and stores the rate data in a file called
-<user_id>.npy). This script can be parallelized. At the end you will end up
-with a bunch of NPY files.
+Workflow
+--------
-Cohort selection
-----------------
+1. Fetch user rates using `ratesnobots.sql`::
-See the docstring in `mkcohort`.
+    mysql -BN < ratesnobots.sql > rates.tsv
-Cohort analysis
----------------
+Note: To run this query you need access to an internal resource of the
+Wikimedia Foundation; see here for more information:
+http://collab.wikimedia.org/wiki/WSoR_datasets/bot. If you can't access this
+page, you can recreate this information from a public dump of the
+`user_groups` and `user` tables in the following way:
-See `graphlife`, `fitting`, `fitting_batch.sh`, and `relax`.
+ a. Gather usernames from bot status
+ (http://en.wikipedia.org/w/index.php?title=Wikipedia:Bots/Status) and list
+ of bots by number of edits
+ (http://en.wikipedia.org/wiki/Wikipedia:List_of_bots_by_number_of_edits)
+ b. Select the user IDs of the gathered user names from `user`
+
+ c. Take the union of the above data with `user_groups`::
+
+     SELECT DISTINCT ug_user FROM user_groups WHERE ug_group = 'bot'
+
+2. Use `mkcohort` to make cohorts. This will create a file where each line is a
+ cohort, specified by the first two columns. Columns after the second are the
+ IDs of users.
+
+3. Use `fetchrates` to fetch daily edit counts using the cohort data. See
+ `sge/rates.sh` if you want to run this query from within the toolserver.
+
+4. At this point you can use the other utilities to analyze the rate data. To
+ compute and plot activity peaks, use `comppeak` and `plotpeak`.
+
+5. Happy hacking/researching!
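The bot-reconstruction note in step 1 can be sketched in Python. This is only an illustration of the union step; the input file names and the assumption of one numeric user ID per line are hypothetical and not part of the package:

```python
def load_ids(path):
    """Read one numeric user ID per line, ignoring blank lines."""
    with open(path) as f:
        return {int(line) for line in f if line.strip()}

def bot_ids(*paths):
    """Union of the user IDs found in each input file, mirroring step 1c
    above (IDs scraped from the bot pages + the user_groups query)."""
    ids = set()
    for path in paths:
        ids |= load_ids(path)
    return ids
```

For example, `bot_ids('bots_status_ids.txt', 'user_groups_ids.txt')` (hypothetical file names) would give the combined set of bot user IDs to exclude.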
Added: trunk/tools/wsor/editor_lifecycle/TODO.rst
===================================================================
--- trunk/tools/wsor/editor_lifecycle/TODO.rst (rev 0)
+++ trunk/tools/wsor/editor_lifecycle/TODO.rst	2011-08-29 18:57:11 UTC (rev 95685)
@@ -0,0 +1,2 @@
+* Use `oursql.Cursor.executemany` in `fetchrates`. Presently this is not
+  possible because of a bug in `oursql`. See
+  https://answers.launchpad.net/oursql/+question/166877
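Until that bug is fixed, a common workaround is to fall back to a loop of single `execute` calls. A minimal sketch of the pattern, using the stdlib `sqlite3` module in place of `oursql` and a made-up `rates` table (schema and query are assumptions, not the actual `fetchrates` code):

```python
import sqlite3

def insert_rows(cur, sql, rows, use_executemany=True):
    """Insert rows via executemany when it works; otherwise fall back
    to one execute() per row (workaround for the oursql bug above)."""
    if use_executemany:
        cur.executemany(sql, rows)
    else:
        for row in rows:
            cur.execute(sql, row)

# Demo with an in-memory database and a hypothetical rates table.
conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute('CREATE TABLE rates (user_id INTEGER, day TEXT, edits INTEGER)')
rows = [(1, '2011-08-01', 5), (1, '2011-08-02', 3)]
insert_rows(cur, 'INSERT INTO rates VALUES (?, ?, ?)', rows,
            use_executemany=False)
```

The per-row loop is slower but semantically equivalent, so `fetchrates` could switch on a flag until `oursql` is fixed.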
_______________________________________________
MediaWiki-CVS mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/mediawiki-cvs