http://www.mediawiki.org/wiki/Special:Code/MediaWiki/88451

Revision: 88451
Author:   diederik
Date:     2011-05-20 15:18:07 +0000 (Fri, 20 May 2011)
Log Message:
-----------
Create datacompetition datasets

Added Paths:
-----------
    trunk/tools/editor_trends/kaggle/
    trunk/tools/editor_trends/kaggle/training.py

Added: trunk/tools/editor_trends/kaggle/training.py
===================================================================
--- trunk/tools/editor_trends/kaggle/training.py        (rev 0)
+++ trunk/tools/editor_trends/kaggle/training.py        2011-05-20 15:18:07 UTC (rev 88451)
@@ -0,0 +1,25 @@
+import codecs
+import os
+
+
+
+location = '/home/diederik/wikimedia/wikilytics/en/wiki/txt'
+files = os.listdir(location)
+
+output = codecs.open('training.txt', 'w', 'utf-8')
+
+for filename in files:
+    fh = codecs.open(os.path.join(location, filename), 'r', 'utf-8')
+    for line in fh:
+        line = line.strip()
+        line = line.split('\t')
+        if len(line) != 13:
+            continue
+        username = line[12].lower()
+        if username.endswith('bot'):
+            line[5] = '1'
+        line = '\t'.join(line)
+        output.write(line + '\n')
+    
+    
+output.close()
\ No newline at end of file


Property changes on: trunk/tools/editor_trends/kaggle/training.py
___________________________________________________________________
Added: svn:eol-style
   + native


_______________________________________________
MediaWiki-CVS mailing list
https://lists.wikimedia.org/mailman/listinfo/mediawiki-cvs