Re: [Dbpedia-developers] Processing Wikipedia Categories

Andrea Di Menna Thu, 27 Jun 2013 08:03:38 -0700

Hi Kasun,

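In this case, dbp:A skos:broader dbp:B means that A is a sub-category of B.
You can use these relations to build a directed graph (e.g. an edge B -> A
for each such triple) and then find the leaves: categories with no outgoing
edges, i.e. categories that never appear as the object of a skos:broader triple.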
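A minimal Python sketch of that approach (the thread does not prescribe a language). It assumes each relevant line of the skos categories dump is an N-Triples line with IRIs in all three positions, like the World_War_II triple quoted below. The regular expression, the helper names build_graph/find_leaves and the second sample triple are hypothetical, for illustration only.

```python
import re
from collections import defaultdict

# Assumption: relevant lines look like "<subject> <predicate> <object> ." (IRIs only).
TRIPLE_RE = re.compile(r"^<([^>]*)>\s+<([^>]*)>\s+<([^>]*)>\s*\.?\s*$")
BROADER = "http://www.w3.org/2004/02/skos/core#broader"


def build_graph(lines):
    """Build the B -> A graph from "A skos:broader B" triples.

    Returns (children, nodes): children[B] is the set of sub-categories of B,
    nodes is every category seen in a skos:broader triple.
    """
    children = defaultdict(set)
    nodes = set()
    for line in lines:
        match = TRIPLE_RE.match(line.strip())
        if match is None or match.group(2) != BROADER:
            continue  # skip comments, blank lines and other predicates
        sub, _, parent = match.groups()
        children[parent].add(sub)  # edge B -> A: A is a sub-category of B
        nodes.update((sub, parent))
    return children, nodes


def find_leaves(lines):
    """Leaf categories: nodes with no outgoing edge, i.e. no sub-categories."""
    children, nodes = build_graph(lines)
    return {node for node in nodes if not children.get(node)}


# The first triple is the one quoted in the thread; the second is hypothetical.
SAMPLE = [
    "<http://dbpedia.org/resource/Category:World_War_II> "
    "<http://www.w3.org/2004/02/skos/core#broader> "
    "<http://dbpedia.org/resource/Category:Conflicts_in_1940> .",
    "<http://dbpedia.org/resource/Category:Hypothetical_Subcategory> "
    "<http://www.w3.org/2004/02/skos/core#broader> "
    "<http://dbpedia.org/resource/Category:World_War_II> .",
]

if __name__ == "__main__":
    for leaf in sorted(find_leaves(SAMPLE)):
        print(leaf)  # prints http://dbpedia.org/resource/Category:Hypothetical_Subcategory
```

To run it on the real dump, the compressed file can be streamed with Python's bz2 module, e.g. `with bz2.open(path, "rt", encoding="utf-8") as f: leaves = find_leaves(f)`, where `path` is wherever the file from [3] was saved. Since a leaf is simply a node without outgoing edges, cycles in the category graph do not affect this result; note that a category appearing in no skos:broader triple at all is never seen.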


Is it a bit clearer now?

Cheers
Andrea


2013/6/27 kasun perera <[email protected]>

> Hi Andrea
>
> The triples in file [3] are in the following format:
>
> <http://dbpedia.org/resource/Category:World_War_II> <http://www.w3.org/2004/02/skos/core#broader> <http://dbpedia.org/resource/Category:Conflicts_in_1940>
>
> I'm not sure how I can use these triples to identify leaf categories. Could you
> please explain a bit more?
>
> Thanks
>
>
>
> On Thu, Jun 27, 2013 at 1:46 PM, Andrea Di Menna <[email protected]> wrote:
>
>> Hi Kasun,
>>
>> why don't you use the hierarchy specified in [1]?
>>
>> Wikipedia categories are already organized using the skos:broader property
>> [2] (see [3]).
>> I think that should be enough to build a graph of category links.
>>
>> Moreover, if I remember correctly, the category system is not a DAG,
>> since there are some cycles (at least in the Wikipedia version from which
>> DBpedia 3.8 was extracted).
>>
>> Regards
>> Andrea
>>
>> [1] http://wiki.dbpedia.org/Downloads38#categories-skos
>> [2] http://www.w3.org/2009/08/skos-reference/skos.html#broader
>> [3] http://downloads.dbpedia.org/preview.php?file=3.8_sl_en_sl_skos_categories_en.nt.bz2
>>
>>
>> 2013/6/27 kasun perera <[email protected]>
>>
>>> As discussed with Marco, these are the next tasks that I will be
>>> working on.
>>>
>>> 1. Identification of leaf categories
>>> 2. Prominent leaves discovery
>>> 3. Pages clustering based on prominent leaves
>>>
>>> For task 1 above, I'm planning to use the Wikipedia category and
>>> categorylinks SQL tables available here:
>>> http://dumps.wikimedia.org/enwiki/20130604/
>>>
>>> The above dump files are fairly large: 20 MB and 1.2 GB respectively.
>>> I'm thinking of loading this data into a MySQL database and doing the
>>> processing there, rather than processing these files in memory. Also, the
>>> number of leaf categories and prominent nodes will be large, so they need
>>> to be pushed to MySQL tables.
>>>
>>> I want to know whether this code should be written as part of the
>>> extraction-framework code, and if so, where should I plug it in?
>>> Or is it a good idea to write it separately and push it to a new
>>> repo? If I write it separately, can I use a language other than Scala?
>>>
>>>
>>> --
>>> Regards
>>>
>>> Kasun Perera
>>>
>>>
>>>
>>> ------------------------------------------------------------------------------
>>> This SF.net email is sponsored by Windows:
>>>
>>> Build for Windows Store.
>>>
>>> http://p.sf.net/sfu/windows-dev2dev
>>> _______________________________________________
>>> Dbpedia-developers mailing list
>>> [email protected]
>>> https://lists.sourceforge.net/lists/listinfo/dbpedia-developers
>>>
>>>
>>
>
>
> --
> Regards
>
> Kasun Perera
>
>
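On the point raised in the quoted thread that the category graph is not a DAG: finding leaves is unaffected by cycles, but any traversal of the hierarchy (for example, walking up from a leaf to its ancestors) has to cope with them. A minimal Python sketch, assuming the graph is held in memory as a mapping from each category to the set of its sub-categories (edges B -> A, as in the reply above). find_back_edges is a hypothetical helper, not part of the DBpedia extraction framework; it runs an iterative depth-first search and reports the edges that point back to a category still on the current path.

```python
from collections import defaultdict


def find_back_edges(children):
    """Return the edges (u, v) that close a cycle, found by an iterative DFS.

    children maps each category to the set of its sub-categories (edge u -> v).
    An empty list means the graph is a DAG; removing every returned edge
    leaves a DAG.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited / on the current path / finished
    color = defaultdict(int)       # missing keys read as WHITE
    nodes = set(children)
    for subs in children.values():
        nodes.update(subs)
    back_edges = []
    for root in sorted(nodes):
        if color[root] != WHITE:
            continue
        color[root] = GREY
        # An explicit stack avoids Python's recursion limit on deep hierarchies.
        stack = [(root, iter(sorted(children.get(root, ()))))]
        while stack:
            node, pending = stack[-1]
            for child in pending:
                if color[child] == GREY:
                    back_edges.append((node, child))  # child is on the current path
                elif color[child] == WHITE:
                    color[child] = GREY
                    stack.append((child, iter(sorted(children.get(child, ())))))
                    break  # descend into child; resume this iterator later
            else:
                color[node] = BLACK  # all sub-categories explored
                stack.pop()
    return back_edges


if __name__ == "__main__":
    # Hypothetical toy hierarchy with the cycle A -> B -> C -> A.
    toy = {"A": {"B"}, "B": {"C"}, "C": {"A"}, "D": {"A"}}
    print(find_back_edges(toy))  # prints [('C', 'A')]
```

Which edges get reported depends on the visiting order (sorted here for determinism), so the result is one way of breaking the cycles rather than a canonical one.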