http://www.mediawiki.org/wiki/Special:Code/MediaWiki/88451
Revision: 88451
Author: diederik
Date: 2011-05-20 15:18:07 +0000 (Fri, 20 May 2011)
Log Message:
-----------
Create datacompetition datasets
Added Paths:
-----------
trunk/tools/editor_trends/kaggle/
trunk/tools/editor_trends/kaggle/training.py
Added: trunk/tools/editor_trends/kaggle/training.py
===================================================================
--- trunk/tools/editor_trends/kaggle/training.py (rev 0)
+++ trunk/tools/editor_trends/kaggle/training.py 2011-05-20 15:18:07 UTC (rev 88451)
@@ -0,0 +1,25 @@
+import codecs
+import os
+
+
+
+location = '/home/diederik/wikimedia/wikilytics/en/wiki/txt'
+files = os.listdir(location)
+
+output = codecs.open('training.txt', 'w', 'utf-8')
+
+for filename in files:
+    # Decode input as UTF-8 so it round-trips through the UTF-8 writer.
+    fh = codecs.open(os.path.join(location, filename), 'r', 'utf-8')
+    for line in fh:
+        line = line.strip().split('\t')
+        if len(line) != 13:
+            continue
+        username = line[12].lower()
+        if username.endswith('bot'):
+            line[5] = '1'  # must be a string, or join() raises TypeError
+        output.write('\t'.join(line) + '\n')
+    fh.close()
+
+
+output.close()
\ No newline at end of file
Property changes on: trunk/tools/editor_trends/kaggle/training.py
___________________________________________________________________
Added: svn:eol-style
+ native
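
For readers following the script above, here is a minimal Python 3 sketch of the same row-filtering step, pulled into a function so it can be exercised without the files under `location`. The function name `flag_bot_rows` and the three constants are hypothetical and not part of the commit. The column positions (13 fields per row, username at index 12, flag at index 5) are taken from the committed script. What column 5 means is not stated in the commit, so the sketch only mirrors the overwrite.

```python
# Hypothetical names; the numeric values mirror the committed script.
EXPECTED_FIELDS = 13   # rows with any other field count are skipped
BOT_FLAG_INDEX = 5     # column the script overwrites for bot accounts
USERNAME_INDEX = 12    # last column holds the username


def flag_bot_rows(lines):
    """Yield tab-separated rows, setting the flag column to '1'
    when the username ends in 'bot' (case-insensitive)."""
    for raw in lines:
        fields = raw.strip().split('\t')
        if len(fields) != EXPECTED_FIELDS:
            continue
        if fields[USERNAME_INDEX].lower().endswith('bot'):
            # Keep the value a string so '\t'.join() accepts it.
            fields[BOT_FLAG_INDEX] = '1'
        yield '\t'.join(fields) + '\n'
```

Passing an open text file (or any iterable of lines) to `flag_bot_rows` and writing each yielded row to the output file would reproduce the loop in `training.py`, one row per line.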
_______________________________________________
MediaWiki-CVS mailing list
https://lists.wikimedia.org/mailman/listinfo/mediawiki-cvs