On top of this:
$ bzgrep "1612_establishments_in_Mexico" skos_categories_en.nt.bz2
<http://dbpedia.org/resource/Category:1612_establishments_in_Mexico> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2004/02/skos/core#Concept> .
So the category you are referring to in [1] is present in skos_categories.
It has no parent (skos:broader) category in this case, but I think this is
because its parent categories are added through a template, which might not
be supported by the extraction framework:
{{estcatCountry|161|2|Mexico}}
Am I wrong?
[1] https://github.com/dbpedia/dbpedia-links/issues/16
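If the template is the cause, supporting it would mean turning its parameters into category names. Below is a minimal Python sketch of that idea. It assumes, without checking the template's source on Wikipedia, that `{{estcatCountry|161|2|Mexico}}` encodes a decade prefix, a final year digit and a country, and that the parent it adds is the decade category `1610s_establishments_in_Mexico` mentioned in this thread. The real template may add further categories, and `estcat_country_parents` is a hypothetical name, not part of the extraction framework.

```python
import re

# Hypothetical parser for {{estcatCountry|<decade prefix>|<year digit>|<country>}}.
# ASSUMPTION: the parent is the decade category, e.g.
# "1610s establishments in Mexico" for {{estcatCountry|161|2|Mexico}}.
TEMPLATE_RE = re.compile(
    r"\{\{\s*estcatCountry\s*\|\s*(\d+)\s*\|\s*(\d)\s*\|\s*([^|}]+?)\s*\}\}",
    re.IGNORECASE,
)


def estcat_country_parents(wikitext):
    """Return the parent category titles implied by estcatCountry calls."""
    parents = []
    for decade_prefix, _year_digit, country in TEMPLATE_RE.findall(wikitext):
        # "161" + "0s" -> "1610s"; spaces become underscores as in DBpedia URIs.
        title = f"{decade_prefix}0s establishments in {country}"
        parents.append("Category:" + title.replace(" ", "_"))
    return parents


print(estcat_country_parents("{{estcatCountry|161|2|Mexico}}"))
# ['Category:1610s_establishments_in_Mexico']
```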
2013/7/5 Andrea Di Menna <[email protected]>
> Hi Marco, Kasun,
>
> I am not sure I understand now.
>
> I have not yet directly verified your statement about missing categories
> in the SKOS data, but after a rough check:
>
> $ zcat skos_categories_en.nt.gz | grep -v "^#" | cut -d">" -f1 | sort | uniq | wc -l
> 862826
> $ zcat category_labels_en.nt.gz | grep -v "^#" | sort | uniq | wc -l
> 862826
>
> That suggests that all the categories appear in skos_categories (I presume
> as skos:Concept).
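Equal line counts are suggestive, but they do not prove that both files list the same categories, nor do they show which concepts lack a parent. Below is a minimal Python sketch of a stricter check. It assumes well-formed N-Triples with one statement per line and the 2013 file names used above, and the function names are hypothetical:

```python
import gzip

# skos:broader links a category to its parent category.
SKOS_BROADER = "<http://www.w3.org/2004/02/skos/core#broader>"


def read_triples(lines):
    """Yield (subject, predicate, object) from N-Triples lines, skipping comments."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Subject and predicate are IRIs without spaces; the object is the rest,
        # minus the closing " ." of the statement.
        subject, predicate, rest = line.split(None, 2)
        yield subject, predicate, rest.rstrip(" .")


def compare_category_files(skos_lines, label_lines):
    """Compare category subjects in the two dumps and find concepts with no parent."""
    skos_subjects, with_broader = set(), set()
    for subject, predicate, _obj in read_triples(skos_lines):
        skos_subjects.add(subject)
        if predicate == SKOS_BROADER:
            with_broader.add(subject)
    label_subjects = {subject for subject, _p, _o in read_triples(label_lines)}
    return {
        "only_in_labels": label_subjects - skos_subjects,
        "only_in_skos": skos_subjects - label_subjects,
        "without_broader": skos_subjects - with_broader,
    }


def main(skos_path="skos_categories_en.nt.gz", labels_path="category_labels_en.nt.gz"):
    # Default paths are the file names from the zcat commands above.
    with gzip.open(skos_path, "rt", encoding="utf-8") as skos, \
            gzip.open(labels_path, "rt", encoding="utf-8") as labels:
        result = compare_category_files(skos, labels)
    for key, values in result.items():
        print(key, len(values))
```

Calling `main()` next to the two dumps prints the size of each difference. A non-empty `without_broader` set lists categories that are present as skos:Concept but have no parent triple, like `Category:1612_establishments_in_Mexico` in the bzgrep output above.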
>
> What confuses me about your statement is that:
> "categories that don't have a broader category are not included in dump
> 2" (i.e. skos_categories file) but then you say that
> http://en.wikipedia.org/wiki/Category:1612_establishments_in_Mexico which
> has a parent
> http://en.wikipedia.org/wiki/Category:1610s_establishments_in_Mexico does
> not appear in skos_categories.
> Does that mean that categories which do have broader categories are also
> missing from skos_categories?
> Could you please elaborate?
>
> Kasun, as regards data freshness (Reason 2), you can create a new version
> of the DBpedia dataset for your personal use whenever you like (using the
> extraction framework).
>
> Cheers
> Andrea
>
>
> 2013/7/5 Marco Fossati <[email protected]>
>
>> Hi Kasun,
>>
>> On 7/5/13 2:17 PM, kasun perera wrote:
>> > I was working with skos_categories, but there are some reasons why I
>> > would avoid using that dataset for parent-child relationship detection.
>> >
>> > Reason 1
>> > I need to get all leaf categories AND their child-parent relationships.
>> > Categories that don't have a broader category are not included in the
>> > skos_categories dump. This claim is discussed here:
>> > https://github.com/dbpedia/dbpedia-links/issues/16
>> This is a finding that should be documented, if it is not already.
>> Could you please check and, if needed, update the documentation?
>> Also, if this is not the intended behavior of the category extractor, it
>> should be reported as an issue in the extraction-framework repo.
>>
>> Thanks Kasun for these useful analytics.
>> Cheers!
>> >
>> > Reason 2
>> > We need to be concerned about data freshness when dealing with tasks
>> > related to knowledge representation. DBpedia's latest dumps (1.8) are
>> > nearly one year old. This work also needs to deal with other datasets,
>> > such as Wikipedia page_edit_history, interlanguage links, etc. So all
>> > the datasets need to be in sync with each other, i.e. they must have the
>> > same dates. If I use DBpedia dumps, it is hard to find synchronized
>> > datasets.
>> >
>> >
>> >
>> > On Fri, Jun 28, 2013 at 3:33 PM, Alessio Palmero Aprosio <[email protected]> wrote:
>> >
>> > Dear Kasun,
>> > I'm currently looking into graph DBs, but I haven't tried any yet.
>> > In my implementation, I'm using a Lucene index to store categories.
>> > I have two fields: category name and parent. The parent is null if
>> > there is no parent at all.
>> > Whenever I need a path, I start from the category and follow its
>> > parents. If I reach a category I have already visited, I stop the loop
>> > (otherwise it would go on forever).
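The parent walk described above can be sketched in Python. A dict stands in for the Lucene index (no Lucene API is used), and since a Wikipedia category can have several parents, the parent field is assumed to hold a list, so the walk becomes a breadth-first traversal. The `visited` set is what stops the loop on cycles. The graph below is toy data with an artificial cycle.

```python
from collections import deque


def ancestors(category, parents):
    """Return all ancestors of `category` in breadth-first order.

    `parents` maps a category name to a list of parent names (an empty
    list or a missing key means "no parent"). Categories already seen are
    skipped, so a cycle in the category graph cannot cause an endless loop.
    """
    visited = {category}
    order = []
    queue = deque([category])
    while queue:
        current = queue.popleft()
        for parent in parents.get(current, []):
            if parent in visited:
                continue  # already reached: this is where a cycle is cut off
            visited.add(parent)
            order.append(parent)
            queue.append(parent)
    return order


# Toy data; the last edge deliberately closes a cycle.
graph = {
    "1612 establishments in Mexico": ["1610s establishments in Mexico"],
    "1610s establishments in Mexico": ["Establishments in Mexico"],
    "Establishments in Mexico": ["1612 establishments in Mexico"],
}
print(ancestors("1612 establishments in Mexico", graph))
# ['1610s establishments in Mexico', 'Establishments in Mexico']
```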
>> >
>> > You can also use a simple MySQL database with two fields, but I
>> > think Lucene is faster.
>> >
>> > Alessio
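For the two-field database alternative, here is a minimal sketch that uses Python's built-in `sqlite3` as a stand-in for MySQL. This is an assumption made to keep the example self-contained; MySQL only supports `WITH RECURSIVE` from version 8.0. A category with several parents gets one row per parent, and a recursive query collects all ancestors. The table and function names are hypothetical:

```python
import sqlite3

# Two columns as described above: category name and parent (NULL for a root).
# ASSUMPTION: SQLite stands in for MySQL; several parents -> several rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE category (name TEXT NOT NULL, parent TEXT)")
conn.executemany(
    "INSERT INTO category (name, parent) VALUES (?, ?)",
    [
        ("1612_establishments_in_Mexico", "1610s_establishments_in_Mexico"),
        ("1610s_establishments_in_Mexico", "Establishments_in_Mexico"),
        ("Establishments_in_Mexico", None),  # toy root category
    ],
)

# UNION (not UNION ALL) discards rows that were already produced,
# so a cycle in the data cannot make the recursion run forever.
ANCESTORS_SQL = """
WITH RECURSIVE ancestor(name) AS (
    SELECT parent FROM category WHERE name = ? AND parent IS NOT NULL
    UNION
    SELECT c.parent FROM category AS c
    JOIN ancestor AS a ON c.name = a.name
    WHERE c.parent IS NOT NULL
)
SELECT name FROM ancestor
"""


def ancestors_sql(conn, category):
    """Return every ancestor of `category` stored in the two-column table."""
    return [row[0] for row in conn.execute(ANCESTORS_SQL, (category,))]


print(ancestors_sql(conn, "1612_establishments_in_Mexico"))
```

Using `UNION` rather than `UNION ALL` plays the role of the "stop when a category repeats" rule inside the database itself.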
>> >
>> >
>> > On 28/06/13 10:25, kasun perera wrote:
>> >> Hi Alessio
>> >>
>> >> On Thu, Jun 27, 2013 at 1:42 PM, Alessio Palmero Aprosio <[email protected]> wrote:
>> >>
>> >> Dear Kasun,
>> >> I had to deal with the same problem some months ago,
>> >>
>> >>
>> >> I'm curious about how you stored the edges and vertices when
>> >> processing the categories.
>> >> In-memory processing would be difficult, since the graph has a huge
>> >> number of edges and vertices, so I think it's better to store them in a
>> >> database.
>> >> I have heard about graph databases [1], but haven't worked with
>> >> them. Did you use something like that, or a simple MySQL database?
>> >>
>> >> [1] http://en.wikipedia.org/wiki/Graph_database
>> >>
>> >> and I managed to use the XML article file: you can intercept
>> >> categories using the "Category:" prefix, and you can infer
>> >> parent-child relations using the <title> tag (if the <title>
>> >> starts with "Category:", all the categories listed on that page are
>> >> its possible ancestors).
>> >> The Wikipedia category taxonomy is quite a mess, so good luck!
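Here is a minimal sketch of this approach using Python's streaming `xml.etree.ElementTree.iterparse`. It assumes the standard MediaWiki export layout (`<page>` elements containing `<title>` and `<revision>/<text>`). The regex only catches literal `[[Category:...]]` links, and the sample input is a toy document, not real dump content:

```python
import io
import re
import xml.etree.ElementTree as ET

# [[Category:Name]] or [[Category:Name|sort key]] links in wikitext.
CATEGORY_LINK_RE = re.compile(r"\[\[\s*Category\s*:\s*([^\]|]+?)\s*(?:\|[^\]]*)?\]\]")


def local_name(tag):
    """Drop the XML namespace: '{http://...}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def category_edges(xml_source):
    """Yield (child, parent) category pairs from a MediaWiki XML dump.

    Only pages whose <title> starts with "Category:" are used; every
    category link in their wikitext is taken as a parent.
    """
    title, text = None, None
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text or ""
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            if title and title.startswith("Category:"):
                child = title[len("Category:"):]
                for parent in CATEGORY_LINK_RE.findall(text or ""):
                    yield child, parent
            title, text = None, None
            elem.clear()  # release the page's content before reading the next one


# Toy sample: one category page and one ordinary article.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.8/">
<page><title>Category:1612 establishments in Mexico</title>
<revision><text>[[Category:1610s establishments in Mexico|1612]]</text></revision></page>
<page><title>Mexico City</title>
<revision><text>[[Category:Capitals in North America]]</text></revision></page>
</mediawiki>"""

print(list(category_edges(io.BytesIO(SAMPLE))))
# [('1612 establishments in Mexico', '1610s establishments in Mexico')]
```

Categories added through templates, such as `{{estcatCountry|161|2|Mexico}}` quoted above, do not appear as literal links in the wikitext, so this approach would miss them too.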
>> >>
>> >> Alessio
>> >>
>> >>
>> >> On 27/06/13 05:24, kasun perera wrote:
>> >>> As discussed with Marco, these are the next tasks I will be
>> >>> working on:
>> >>>
>> >>> 1. Identification of leaf categories
>> >>> 2. Prominent leaves discovery
>> >>> 3. Pages clustering based on prominent leaves
>> >>>
>> >>> For task 1 above, I'm planning to use the Wikipedia category and
>> >>> categorylinks SQL tables available here:
>> >>> http://dumps.wikimedia.org/enwiki/20130604/
>> >>>
>> >>> The above dump files are fairly large: 20 MB and 1.2 GB in size
>> >>> respectively.
>> >>> I'm thinking of loading this data into a MySQL database and doing
>> >>> the processing there, rather than processing these files in memory.
>> >>> Also, the number of leaf categories and prominent nodes will be
>> >>> large, and they will need to be pushed to MySQL tables.
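Task 1 can be sketched as a set difference. This assumes that a leaf category is one with no subcategories, and that (child, parent) category pairs have already been extracted, for example from `categorylinks` rows whose `cl_type` is `'subcat'`. The extraction step is not shown, and `leaf_categories` is a hypothetical name:

```python
def leaf_categories(categories, edges):
    """Return, sorted, the categories that never occur as a parent in `edges`.

    ASSUMPTION: "leaf" means a category with no subcategories.
    `categories` is an iterable of all category names; `edges` is an
    iterable of (child_category, parent_category) pairs.
    """
    parents = {parent for _child, parent in edges}
    return sorted(set(categories) - parents)


# Toy data reusing the category names from this thread.
cats = ["Establishments_in_Mexico", "1610s_establishments_in_Mexico",
        "1612_establishments_in_Mexico"]
edges = [("1612_establishments_in_Mexico", "1610s_establishments_in_Mexico"),
         ("1610s_establishments_in_Mexico", "Establishments_in_Mexico")]
print(leaf_categories(cats, edges))
# ['1612_establishments_in_Mexico']
```

In MySQL the same idea is an anti-join (`LEFT JOIN ... WHERE ... IS NULL`) between the category table and the edge table, which avoids loading the 1.2 GB of links into memory.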
>> >>>
>> >>> I'd like to know whether this code should be written as part of the
>> >>> extraction-framework code, and if so, where I should plug it in,
>> >>> or whether it is a better idea to write it separately and push it
>> >>> to a new repo. If I write it separately, can I use a language
>> >>> other than Scala?
>> >>>
>> >>>
>> >>> --
>> >>> Regards
>> >>>
>> >>> Kasun Perera
>> >>>
>> >>>
>> >>>
>> >>>
>> ------------------------------------------------------------------------------
>> >>> This SF.net email is sponsored by Windows:
>> >>>
>> >>> Build for Windows Store.
>> >>>
>> >>> http://p.sf.net/sfu/windows-dev2dev
>> >>>
>> >>>
>> >>> _______________________________________________
>> >>> Dbpedia-developers mailing list
>> >>> [email protected] <mailto:
>> [email protected]>
>> >>>
>> https://lists.sourceforge.net/lists/listinfo/dbpedia-developers
>> >>
>> >>
>> >>
>> >>
>> >> --
>> >> Regards
>> >>
>> >> Kasun Perera
>> >>
>> >
>> >
>> >
>> >
>> > --
>> > Regards
>> >
>> > Kasun Perera
>> >
>>
>> --
>> Marco Fossati
>> http://about.me/marco.fossati
>> Twitter: @hjfocs
>> Skype: hell_j
>>
>>
>>
>
>