[Dbpedia-developers] Processing Wikipedia Categories

kasun perera Wed, 26 Jun 2013 20:25:09 -0700

As discussed with Marco, these are the next tasks that I will be working on.

1. Identification of leaf categories
2. Prominent leaves discovery
3. Pages clustering based on prominent leaves


For task 1 above, I'm planning to use the Wikipedia category and categorylinks
SQL tables available here: http://dumps.wikimedia.org/enwiki/20130604/

These dump files are fairly large, about 20 MB and 1.2 GB respectively.
I'm thinking of loading this data into a MySQL database and doing the
processing there rather than processing the files in memory. Also, the number
of leaf categories and prominent nodes will be large, so they would need to be
pushed to MySQL tables as well.
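
To make task 1 concrete, here is a rough sketch of how the leaf categories
could be selected once the two tables are loaded. It assumes that a "leaf"
means a category with no subcategories, and that the tables keep the MediaWiki
column names (cat_title, cl_to, cl_type). It uses an in-memory SQLite database
with made-up rows as a stand-in for MySQL, so the helper names and the toy data
are hypothetical; the same query should carry over to MySQL largely unchanged.

```python
import sqlite3


def find_leaf_categories(conn):
    """Return titles of categories that never appear as the parent of a subcategory.

    Assumption: a leaf category is one with no 'subcat' rows pointing at it.
    """
    rows = conn.execute(
        """
        SELECT c.cat_title
        FROM category AS c
        WHERE NOT EXISTS (
            SELECT 1
            FROM categorylinks AS cl
            WHERE cl.cl_to = c.cat_title
              AND cl.cl_type = 'subcat'
        )
        ORDER BY c.cat_title
        """
    ).fetchall()
    return [row[0] for row in rows]


def build_toy_db():
    """Build a tiny in-memory database with hypothetical rows for illustration."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE category (cat_id INTEGER PRIMARY KEY, cat_title TEXT)")
    conn.execute("CREATE TABLE categorylinks (cl_from INTEGER, cl_to TEXT, cl_type TEXT)")
    conn.executemany(
        "INSERT INTO category VALUES (?, ?)",
        [(1, "Science"), (2, "Physics"), (3, "Optics")],
    )
    conn.executemany(
        "INSERT INTO categorylinks VALUES (?, ?, ?)",
        [
            (102, "Science", "subcat"),  # the Physics category page sits under Science
            (103, "Physics", "subcat"),  # the Optics category page sits under Physics
            (500, "Optics", "page"),     # an ordinary article in Optics
            (501, "Physics", "page"),    # an ordinary article in Physics
        ],
    )
    return conn


leaves = find_leaf_categories(build_toy_db())
print(leaves)  # ['Optics']
```

On the full dump, an index on (cl_to, cl_type) would probably be needed for the
NOT EXISTS subquery to finish in reasonable time.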

I want to know whether this code should be written under the
extraction-framework code; if so, where should I plug it in?
Or would it be a good idea to write it separately and push it to a new repo?
If I write it separately, can I use a language other than Scala?


-- 
Regards

Kasun Perera
