http://www.mediawiki.org/wiki/Special:Code/MediaWiki/95685

Revision: 95685
Author:   giovanni
Date:     2011-08-29 18:57:11 +0000 (Mon, 29 Aug 2011)
Log Message:
-----------
added documentation to editor_lifecycle

Modified Paths:
--------------
    trunk/tools/wsor/editor_lifecycle/README.rst

Added Paths:
-----------
    trunk/tools/wsor/editor_lifecycle/TODO.rst

Modified: trunk/tools/wsor/editor_lifecycle/README.rst
===================================================================
--- trunk/tools/wsor/editor_lifecycle/README.rst        2011-08-29 18:55:22 UTC (rev 95684)
+++ trunk/tools/wsor/editor_lifecycle/README.rst        2011-08-29 18:57:11 UTC (rev 95685)
@@ -1,7 +1,11 @@
-============
-README 
-============
+Editor lifecycle
+================
 
+Author: Giovanni Luca Ciampaglia
+
+License
+-------
+
 Copyright (C) 2011 GIOVANNI LUCA CIAMPAGLIA, [email protected]
 This program is free software; you can redistribute it and/or modify
 it under the terms of the GNU General Public License as published by
@@ -18,33 +22,54 @@
 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
 http://www.gnu.org/copyleft/gpl.html
 
----------
-workflow
----------
+Installation
+------------
 
-This package is a collection of python and shell scripts that can assist
-creating and analyzing data on user life cycle. 
+To install this package you can use the normal distutils command::
 
-Sample selection
-----------------
+    python setup.py install    
 
-TBD
+See http://docs.python.org/install/index.html#install-index for more options.
+You might require root access (sudo) to perform a system-wide installation.
 
-Edit activity data collection
------------------------------
+Usage
+-----
+See http://meta.wikimedia.org/wiki/Research:Editor_lifecycle. All scripts
+accept arguments from the command line and understand the common -h/--help
+option.
 
-First use `fetchrates` to download the rate data from the MySQL database. This
-script takes a user_id in input (and stores the rate data in a file called
-<user_id>.npy). This script can be parallelized. At the end you will end up with
-a bunch of NPY files.
+Workflow
+--------
 
-Cohort selection
-----------------
+1. Fetch user rates using `ratesnobots.sql`::
 
-See the docstring in `mkcohort`.
+   mysql -BN < ratesnobots.sql > rates.tsv
 
-Cohort analysis
----------------
+Note: To be able to run this query, you must be able to access an internal
+resource of the Wikimedia Foundation, see here for more information:
+http://collab.wikimedia.org/wiki/WSoR_datasets/bot. If you can't access this
+page, you can recreate this information from a public dump of the
+`user_groups` and `user` tables in the following way:
 
-See `graphlife`, `fitting`, `fitting_batch.sh`, and `relax`.
+   a. Gather usernames from bot status
+   (http://en.wikipedia.org/w/index.php?title=Wikipedia:Bots/Status) and list of
+   bots by number of edits
+   (http://en.wikipedia.org/wiki/Wikipedia:List_of_bots_by_number_of_edits)
 
+   b. Select the user IDs of the gathered user names from `user` 
+
+   c. Take the union of the above data with `user_groups`::
+
+   SELECT DISTINCT ug_user FROM user_groups where ug_group = "bot"
+
+2. Use `mkcohort` to make cohorts. This will create a file where each line is a
+   cohort, specified by the first two columns. Columns after the second are the
+   IDs of users.
+
+3. Use `fetchrates` to fetch daily edit counts using the cohort data. See
+   `sge/rates.sh` if you want to run this query from within the toolserver. 
+
+4. At this point you can use the other utilities to analyze the rate data. To
+   compute and plot activity peaks, use `comppeak` and `plotpeak`.
+
+5. Happy hacking/researching!

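The fallback in steps a to c of the README note (look up the gathered bot names in `user`, then take the union with the bot-flagged IDs in `user_groups`) can be sketched as below. This is only an illustration, not part of the commit: it uses an in-memory SQLite database as a stand-in for a loaded public dump, the function name `bot_user_ids` and the sample names are hypothetical, and only the columns `user_id`, `user_name`, `ug_user` and `ug_group` are assumed from the MediaWiki schema.

```python
import sqlite3


def bot_user_ids(conn, bot_names):
    """Return the sorted union of (a) the IDs of the given user names
    and (b) the IDs flagged with the 'bot' group in user_groups.

    `conn` is any DB-API connection using '?' placeholders (SQLite here,
    which is an assumption made for the sake of a runnable sketch).
    """
    ids = set()
    # Step b: resolve each gathered user name to its user ID, if present.
    for name in bot_names:
        row = conn.execute(
            "SELECT user_id FROM user WHERE user_name = ?", (name,)
        ).fetchone()
        if row is not None:
            ids.add(row[0])
    # Step c: add every user that carries the 'bot' group flag.
    rows = conn.execute(
        "SELECT DISTINCT ug_user FROM user_groups WHERE ug_group = 'bot'"
    )
    ids.update(row[0] for row in rows)
    return sorted(ids)


def make_demo_connection():
    """Build a tiny in-memory database with made-up rows for illustration."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE user (user_id INTEGER PRIMARY KEY, user_name TEXT)")
    conn.execute("CREATE TABLE user_groups (ug_user INTEGER, ug_group TEXT)")
    conn.executemany(
        "INSERT INTO user VALUES (?, ?)",
        [(1, "Alice"), (2, "ExampleBot"), (3, "OtherBot"), (4, "Carol")],
    )
    conn.executemany(
        "INSERT INTO user_groups VALUES (?, ?)",
        [(3, "bot"), (4, "sysop")],
    )
    return conn


if __name__ == "__main__":
    demo = make_demo_connection()
    # 'MissingBot' is not in the user table and is silently skipped.
    print(bot_user_ids(demo, ["ExampleBot", "MissingBot"]))
```

Doing the union in a Python set keeps the two sources separate, so a name that no longer exists in `user` does not abort the run.
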
Added: trunk/tools/wsor/editor_lifecycle/TODO.rst
===================================================================
--- trunk/tools/wsor/editor_lifecycle/TODO.rst                          (rev 0)
+++ trunk/tools/wsor/editor_lifecycle/TODO.rst  2011-08-29 18:57:11 UTC (rev 95685)
@@ -0,0 +1,2 @@
+* Use `oursql.Cursor.executemany` in `fetchrates`. Presently this is not possible,
+  because of a bug in `oursql`. See https://answers.launchpad.net/oursql/+question/166877

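Step 2 of the README workflow describes the `mkcohort` output as one cohort per line, identified by the first two columns and followed by user IDs. A minimal reader for such a file might look like the sketch below. The tab separator, the treatment of the two key columns as opaque strings, and the function names are assumptions; the commit does not specify them.

```python
def parse_cohort_line(line, sep="\t"):
    """Split one cohort line into ((col1, col2), [user_id, ...]).

    The first two columns are kept as strings because their meaning is
    not specified here; the remaining columns are parsed as integer IDs.
    """
    fields = line.rstrip("\n").split(sep)
    if len(fields) < 2:
        raise ValueError("expected at least two key columns: %r" % line)
    key = (fields[0], fields[1])
    user_ids = [int(field) for field in fields[2:] if field]
    return key, user_ids


def read_cohorts(lines, sep="\t"):
    """Map each cohort key to its list of user IDs, skipping blank lines."""
    cohorts = {}
    for line in lines:
        if not line.strip():
            continue
        key, user_ids = parse_cohort_line(line, sep)
        cohorts[key] = user_ids
    return cohorts


if __name__ == "__main__":
    # Made-up sample lines; the key values are placeholders.
    sample = ["A\tB\t11\t12\t13\n", "\n", "C\tD\t21\n"]
    for key, user_ids in sorted(read_cohorts(sample).items()):
        print(key, user_ids)
```

`read_cohorts` accepts any iterable of lines, so an open file object can be passed directly.
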

