Hello.

A few hours ago, a new public repository was created to host 
WikiXRay database dumps, containing information extracted from the public 
Wikipedia database dumps. The repository is hosted by RedIRIS (in short, the 
Spanish equivalent of Kennisnet in the Netherlands).

http://sunsite.rediris.es/mirror/WKP_research

ftp://ftp.rediris.es/mirror/WKP_research

These new dumps are intended to save other researchers time and effort, since 
they won't need to parse the complete XML dumps to extract all relevant 
activity metadata. We used mysqldump to create the dumps from our databases.

As of today, only some of the biggest Wikipedias are available. However, in 
the coming days the full set of available languages will be ready for 
download. The files will be updated regularly.

The procedure is as follows:

1. Find the research dump you are interested in. Download and decompress it on 
your local system.

2. Create a local DB to import the information into.

3. Load the dump file, using a MySQL user with insert privileges (you will be 
prompted for the password):

$> mysql -u user -p myDB < dumpfile.sql

And you're done.
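
For reference, the three steps above could be scripted roughly as follows. 
This is only a sketch: the function name load_research_dump is made up for 
illustration, the file, database and user names are placeholders, and it 
assumes the downloaded file is gzip-compressed (adjust the decompression step 
if the archive uses another format).

```shell
# Sketch only: names are placeholders and gzip compression is an assumption.
load_research_dump() {
    dump_gz="$1"    # downloaded file, e.g. dumpfile.sql.gz (hypothetical name)
    db="$2"         # local database to create, e.g. myDB
    user="$3"       # MySQL user allowed to create databases and insert rows

    # Step 1: decompress, keeping the original archive (-c writes to stdout).
    dump_sql="${dump_gz%.gz}"
    gunzip -c "$dump_gz" > "$dump_sql" || return 1

    # Step 2: create the local database (mysql prompts for the password).
    mysql -u "$user" -p -e "CREATE DATABASE IF NOT EXISTS \`$db\`" || return 1

    # Step 3: load the dump into it (mysql prompts for the password again).
    mysql -u "$user" -p "$db" < "$dump_sql"
}
```

It would be called as, for example, load_research_dump dumpfile.sql.gz myDB user. 
Note that -p is given without a value, so the password is never typed on the 
command line.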

One final warning: three fields in the revision table are not reliable yet:

rev_num_inlinks
rev_num_outlinks
rev_num_trans

All remaining fields/values are reliable (in particular rev_len, 
rev_num_words, and so forth).
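
As a quick sanity check after importing, a query along these lines sticks to 
the reliable fields only. It is a sketch under assumptions: the function name 
summarize_reliable_fields is invented for illustration, the database and user 
names are placeholders, and it assumes the table is literally named revision.

```shell
# Sketch only: database and user names are placeholders, and the table name
# "revision" is an assumption based on the description above.
summarize_reliable_fields() {
    db="$1"
    user="$2"
    # Aggregate only rev_len and rev_num_words; the three fields flagged as
    # unreliable are deliberately left out of the query.
    mysql -u "$user" -p "$db" -e \
        "SELECT COUNT(*) AS revisions, AVG(rev_len) AS avg_len, AVG(rev_num_words) AS avg_words FROM revision"
}
```

For example: summarize_reliable_fields myDB user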

Regards,

Felipe.

_______________________________________________
foundation-l mailing list
foundation-l@lists.wikimedia.org
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/foundation-l
