Hi,

I think the right tool [1] already exists in the DBpedia
extraction-framework (maybe I am mistaken here).
Are you aware of any flaws in it that are preventing you from using it?
(Maybe I am unaware and should not be using it or its output :P)

Regards
Andrea

[1] org.dbpedia.extraction.mappings.SkosCategoriesExtractor
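
For comparison, here is a minimal sketch of the approach Alessio describes in the quoted message below: scan the pages-articles XML, keep the pages whose <title> starts with "Category:", and treat every category link on such a page as a possible parent. It is written in Python only for illustration and is not how SkosCategoriesExtractor works. The function names, the regular expression, and the tiny sample dump are all made up for this example. It also assumes a leaf is a category that no other category page lists as a parent, and it will miss categories that are added through templates, since those do not appear in the page text.

```python
import re
import xml.etree.ElementTree as ET
from io import BytesIO

PREFIX = "Category:"
# Matches [[Category:Name]] and [[Category:Name|sort key]] in wikitext.
CATEGORY_LINK = re.compile(r"\[\[\s*Category\s*:\s*([^\]|]+?)\s*(?:\|[^\]]*)?\]\]")


def local(tag):
    """Drop the XML namespace, e.g. '{http://...}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def category_parents(source):
    """Yield (category, [parent categories]) for each category page."""
    for _, elem in ET.iterparse(source, events=("end",)):
        if local(elem.tag) != "page":
            continue
        title, text = "", ""
        for node in elem.iter():
            if local(node.tag) == "title":
                title = node.text or ""
            elif local(node.tag) == "text":
                text = node.text or ""
        if title.startswith(PREFIX):
            parents = [m.group(1) for m in CATEGORY_LINK.finditer(text)]
            yield title[len(PREFIX):], parents
        elem.clear()  # drop the page's content to limit memory use


def leaf_categories(pages):
    """Categories that no other category page lists as a parent."""
    seen, parents = set(), set()
    for category, page_parents in pages:
        seen.add(category)
        parents.update(page_parents)
    return seen - parents


# Hypothetical four-page dump, for illustration only.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.8/">
  <page><title>Category:Animals</title><revision><text>[[Category:Living things]]</text></revision></page>
  <page><title>Category:Dogs</title><revision><text>[[Category:Animals|Dogs]]</text></revision></page>
  <page><title>Category:Cats</title><revision><text>[[Category:Animals]]</text></revision></page>
  <page><title>Dog</title><revision><text>[[Category:Dogs]]</text></revision></page>
</mediawiki>"""

print(sorted(leaf_categories(category_parents(BytesIO(SAMPLE)))))
# prints ['Cats', 'Dogs']
```

For the real dump, a file object from bz2.open(path, "rb") could be passed in place of BytesIO(SAMPLE), so the file does not have to be decompressed first.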


2013/6/27 kasun perera <[email protected]>

>
> On Thu, Jun 27, 2013 at 1:42 PM, Alessio Palmero Aprosio
> <[email protected]> wrote:
>
>> Dear Kasun,
>> I had to deal with the same problem some months ago, and I managed it using
>> the XML article file: you can pick out categories by the "Category:"
>> prefix, and you can infer the parent-child relation from the <title> tag (if
>> the <title> starts with "Category:", all the categories listed on that page
>> are possible ancestors).
>> The Wikipedia category taxonomy is quite a mess, so good luck!
>>
>> Alessio
>>
>
>
> Hi Alessio
>
> Yes, I will try this; it seems like a good option. I hope this is the
> file you are referring to:
> enwiki-20130604-pages-articles.xml.bz2 (9.2 GB)
> http://dumps.wikimedia.org/enwiki/20130604/enwiki-20130604-pages-articles.xml.bz2
>
> @Marco
> Is it a good idea to try several options (1. what I described previously,
> 2. Alessio's suggestion, 3. any other option) and do some evaluation to
> find out which method is best for getting leaf nodes? Maybe they would all
> give the same output?
>
> Thanks
>
>
>>
>> Il 27/06/13 05:24, kasun perera ha scritto:
>>
>> As discussed with Marco, these are the next tasks I will be
>> working on.
>>
>>  1. Identification of leaf categories
>> 2. Prominent leaves discovery
>> 3. Pages clustering based on prominent leaves
>>
>> For task 1 above, I'm planning to use the Wikipedia category and
>> categorylinks SQL tables available here:
>> http://dumps.wikimedia.org/enwiki/20130604/
>>
>> The dump files above are fairly large, about 20 MB and 1.2 GB
>> respectively.
>> I'm thinking of loading this data into a MySQL database and doing the
>> processing there rather than processing the files in memory. The number of
>> leaf categories and prominent nodes will also be large, so they would need
>> to be pushed to MySQL tables as well.
>>
>> I want to know whether this code should be written within the
>> extraction-framework code base and, if so, where I should plug it in.
>> Or is it a better idea to write it separately and push it to a new
>> repo? If I write it separately, can I use a language other than Scala?
>>
>>
>>  --
>> Regards
>>
>> Kasun Perera
>>
>>
>>
>> ------------------------------------------------------------------------------
>> This SF.net email is sponsored by Windows:
>>
>> Build for Windows Store.
>> http://p.sf.net/sfu/windows-dev2dev
>>
>>
>>
>> _______________________________________________
>> Dbpedia-developers mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/dbpedia-developers
>>
>>
>>
>
>
> --
> Regards
>
> Kasun Perera
>
>
>