Hi Alessio

On Thu, Jun 27, 2013 at 1:42 PM, Alessio Palmero Aprosio <[email protected]> wrote:

>  Dear Kasun,
> I had to deal with the same problem some months ago,
>

I'm curious how you stored the edge and vertex relationships when
processing the categories.
In-memory processing would be difficult given the huge number of edges
and vertices, so I think it would be better to store them in a database.
I have heard about graph databases [1], but haven't worked with them. Did
you use something like that, or a simple MySQL database?

[1] http://en.wikipedia.org/wiki/Graph_database
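
To make the database idea concrete, here is a minimal sketch of how the
(child, parent) category edges could sit in a relational table, with leaf
categories found as those that never appear as a parent. It uses Python's
built-in sqlite3 in place of MySQL only so that it runs on its own. The table
name category_edge, its columns, and both function names are my own
assumptions and do not come from the dump schema.

```python
import sqlite3


def build_edge_store(edges, db_path=":memory:"):
    """Store (child, parent) category edges in a SQLite table.

    Table and column names are hypothetical, chosen for this sketch only.
    """
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS category_edge ("
        " child TEXT NOT NULL,"
        " parent TEXT NOT NULL,"
        " PRIMARY KEY (child, parent))"
    )
    # Index on parent so that the leaf lookup does not scan the table per row.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_edge_parent ON category_edge (parent)"
    )
    conn.executemany(
        "INSERT OR IGNORE INTO category_edge (child, parent) VALUES (?, ?)",
        edges,
    )
    conn.commit()
    return conn


def leaf_categories(conn):
    """Return categories that never appear as a parent (no subcategories)."""
    rows = conn.execute(
        "SELECT DISTINCT child FROM category_edge "
        "WHERE child NOT IN (SELECT parent FROM category_edge) "
        "ORDER BY child"
    )
    return [row[0] for row in rows]
```

Passing a file path as db_path instead of ":memory:" keeps the edges on disk,
which is the point when the graph does not fit in memory.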


> and I managed to use the XML article file: you can intercept categories
> using the "Category:" prefix, and you can infer the parent-child relation
> using the <title> tag (if the <title> starts with "Category:", all the
> categories for this page are possible ancestors).
> The Wikipedia category taxonomy is quite a mess, so good luck!
>
> Alessio
>
>
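
If I read the XML approach correctly, it could be sketched roughly as below.
This is only my interpretation, not Alessio's actual code: it streams the
article XML, keeps pages whose <title> starts with "Category:", and treats
every [[Category:...]] link in the page text as a possible parent. The function
names and the regular expression are assumptions of mine, and categories added
through templates would be missed.

```python
import re
import xml.etree.ElementTree as ET

# Matches [[Category:Name]] and [[Category:Name|sort key]]; captures "Name".
CATEGORY_LINK = re.compile(r"\[\[Category:([^\]|#]+)")


def local_name(tag):
    """Strip an XML namespace such as '{http://...}page' down to 'page'."""
    return tag.rsplit("}", 1)[-1]


def category_edges(xml_source):
    """Yield (child, parent) pairs from category pages in a dump.

    xml_source is a file name or a binary file object.
    """
    title, text = None, ""
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text or ""
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            if title and title.startswith("Category:"):
                child = title[len("Category:"):]
                for match in CATEGORY_LINK.finditer(text):
                    yield child, match.group(1).strip()
            # Reset per-page state and free the finished page's children.
            title, text = None, ""
            elem.clear()
```

The pairs it yields have the (child, parent) shape, so they could be written
straight into an edge table in whatever database ends up being used.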
> On 27/06/13 05:24, kasun perera wrote:
>
> As discussed with Marco, these are the next tasks I will be working on.
>
>  1. Identification of leaf categories
> 2. Prominent leaves discovery
> 3. Pages clustering based on prominent leaves
>
> For task 1 above, I'm planning to use the Wikipedia category and
> category_links SQL tables available here:
> http://dumps.wikimedia.org/enwiki/20130604/
>
> The dump files are fairly large: 20 MB and 1.2 GB in size respectively.
> I'm thinking of loading this data into a MySQL database and doing the
> processing there rather than processing the files in memory. The number
> of leaf categories and prominent nodes will also be large, so they would
> need to be pushed to MySQL tables as well.
>
> I want to know whether this code should be written within the
> extraction-framework code and, if so, where I should plug it in.
> Or would it be a good idea to write it separately and push it to a new repo?
> If I write it separately, can I use a language other than Scala?
>
>
>  --
> Regards
>
> Kasun Perera
>
>
>
> ------------------------------------------------------------------------------
> This SF.net email is sponsored by Windows:
>
> Build for Windows Store.
> http://p.sf.net/sfu/windows-dev2dev
>
>
>
> _______________________________________________
> Dbpedia-developers mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/dbpedia-developers
>
>
>


-- 
Regards

Kasun Perera
