Hi Andrea,

You are checking the wrong datasets. We are using [1] and [2].

Cheers,
[1] http://downloads.dbpedia.org/preview.php?file=3.8_sl_en_sl_article_categories_en.ttl.bz2
[2] http://downloads.dbpedia.org/preview.php?file=3.8_sl_en_sl_skos_categories_en.ttl.bz2

On 7/5/13 3:17 PM, Andrea Di Menna wrote:
> On top of this:
>
> $ bzgrep "1612_establishments_in_Mexico" skos_categories_en.nt.bz2
> <http://dbpedia.org/resource/Category:1612_establishments_in_Mexico>
> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
> <http://www.w3.org/2004/02/skos/core#Concept> .
>
> So the category you are referring to in [1] is present in skos_categories.
> It does not have a parent category in this case, but I think this is
> because parent categories are added using a template which might be
> unsupported by the extraction framework:
> {{estcatCountry|161|2|Mexico}}
>
> Am I wrong?
>
> [1] https://github.com/dbpedia/dbpedia-links/issues/16
>
> 2013/7/5 Andrea Di Menna <[email protected]>
>
> Hi Marco, Kasun,
>
> I am not sure I understand now.
>
> I have not yet directly verified your statement about missing categories
> in the skos data, but after a rough check:
>
> $ zcat skos_categories_en.nt.gz | grep -v "^#" | cut -d">" -f1 | sort | uniq | wc -l
> 862826
> $ zcat category_labels_en.nt.gz | grep -v "^#" | sort | uniq | wc -l
> 862826
>
> That suggests that all the categories appear in skos_categories (I
> presume as skos:Concept).
>
> What confuses me about your statement is this: you say that "categories
> that don't have a broader category are not included in dump 2" (i.e. the
> skos_categories file), but then you say that
> http://en.wikipedia.org/wiki/Category:1612_establishments_in_Mexico,
> which has a parent,
> http://en.wikipedia.org/wiki/Category:1610s_establishments_in_Mexico,
> does not appear in skos_categories.
> Does that mean that categories which have broader categories are also
> not included in skos_categories?
> Could you please elaborate?
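Andrea's rough shell check above can be reproduced in Python. This is only a sketch: it assumes the dumps have been decompressed to plain .nt files, with the file names taken from the shell commands in the thread.

```python
def distinct_subjects(lines):
    """Collect the distinct subject URIs from N-Triples lines,
    skipping comment and blank lines."""
    subjects = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # In N-Triples the subject is the first whitespace-delimited
        # token, e.g. <http://dbpedia.org/resource/Category:...>
        subjects.add(line.split()[0])
    return subjects

# Quick self-check on a triple taken from the thread:
sample = [
    "# a comment line",
    "<http://dbpedia.org/resource/Category:1612_establishments_in_Mexico> "
    "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> "
    "<http://www.w3.org/2004/02/skos/core#Concept> .",
]
print(len(distinct_subjects(sample)))  # 1

# Against the real (decompressed) dumps one could then compare the sets
# directly instead of only the counts:
# skos = distinct_subjects(open("skos_categories_en.nt"))
# labels = distinct_subjects(open("category_labels_en.nt"))
# print(len(labels - skos))  # categories with a label but no skos entry
```

Comparing the sets rather than the counts would show exactly which categories (if any) are missing from one dump.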
> Kasun, as regards data freshness (Reason 2), you can recreate a new
> version of the DBpedia dataset for your personal use whenever you prefer
> (using the extraction framework).
>
> Cheers
> Andrea
>
> 2013/7/5 Marco Fossati <[email protected]>
>
> Hi Kasun,
>
> On 7/5/13 2:17 PM, kasun perera wrote:
> > I was working with skos_categories, but these are some reasons why I
> > would avoid using that dataset for parent-child relationship detection.
> >
> > Reason 1
> > I need to get all leaf categories AND their child-parent relationships.
> > Categories that don't have a broader category are not included in the
> > skos_categories dump. This claim is discussed here:
> > https://github.com/dbpedia/dbpedia-links/issues/16
> This is a finding that should be documented if not already done.
> Could you please check and update the documentation if needed?
> Also, if this is not the intended behavior of the category extractor, it
> should be reported as an issue in the extraction-framework repo.
>
> Thanks Kasun for these useful analytics.
> Cheers!
>
> > Reason 2
> > We need to be concerned about data freshness when dealing with tasks
> > related to knowledge representation. The latest DBpedia dumps (3.8) are
> > nearly one year old. This work also needs to deal with other datasets,
> > such as the Wikipedia page edit history and interlanguage links, so all
> > the datasets need to be in sync with each other, i.e. have the same
> > dates. If I use DBpedia dumps there is the problem of finding
> > synchronized datasets.
> >
> > On Fri, Jun 28, 2013 at 3:33 PM, Alessio Palmero Aprosio
> > <[email protected]> wrote:
> >
> > Dear Kasun,
> > I'm investigating graph DBs in this period, but I haven't tried any yet.
> > In my implementation, I'm using a Lucene index to store categories.
> > I have two fields: category name and parent. The parent is null if
> > there is no parent at all.
> > Whenever I need a path, I start from the category and follow its
> > parents. If I encounter a category I have already seen before, I stop
> > the loop (otherwise it would go on forever).
> >
> > You can also use a simple MySQL database with two fields, but I think
> > Lucene is faster.
> >
> > Alessio
> >
> > On 28/06/13 10:25, kasun perera wrote:
> >> Hi Alessio
> >>
> >> On Thu, Jun 27, 2013 at 1:42 PM, Alessio Palmero Aprosio
> >> <[email protected]> wrote:
> >>
> >>     Dear Kasun,
> >>     I had to deal with the same problem some months ago,
> >>
> >> Just curious about how you stored the edge and vertex relationships
> >> when processing the categories.
> >> In-memory processing would be difficult since there is a huge number
> >> of edges and vertices, so I think it is better to store them in a
> >> database.
> >> I have heard about graph databases [1], but haven't worked with them.
> >> Did you use something like that, or a simple MySQL database?
> >>
> >> [1] http://en.wikipedia.org/wiki/Graph_database
> >>
> >>     and I managed to use the XML article file: you can intercept
> >>     categories using the "Category:" prefix, and you can infer the
> >>     father-son relation using the <title> tag (if the <title> starts
> >>     with "Category:", all the categories of that page are possible
> >>     ancestors).
> >>     The Wikipedia category taxonomy is quite a mess, so good luck!
> >>
> >>     Alessio
> >>
> >>     On 27/06/13 05:24, kasun perera wrote:
> >>>     As discussed with Marco, these are the next tasks I will be
> >>>     working on:
> >>>
> >>>     1. Identification of leaf categories
> >>>     2. Prominent leaves discovery
> >>>     3. Pages clustering based on prominent leaves
> >>>
> >>>     For task 1, I'm planning to use the Wikipedia category and
> >>>     categorylinks SQL tables available here:
> >>>     http://dumps.wikimedia.org/enwiki/20130604/
> >>>
> >>>     The dump files above are somewhat large, 20 MB and 1.2 GB in
> >>>     size respectively, so I'm thinking of putting the data into a
> >>>     MySQL database and doing the processing there, rather than
> >>>     processing the files in memory.
> >>>     Also, the number of leaf categories and prominent nodes will be
> >>>     large and will need to be pushed to MySQL tables.
> >>>
> >>>     I want to know whether this code should be written under the
> >>>     extraction-framework code, and if so, where should I plug it in?
> >>>     Or is it a good idea to write it separately and push it to a new
> >>>     repo? If I write it separately, can I use a language other than
> >>>     Scala?
> >>>
> >>>     --
> >>>     Regards
> >>>
> >>>     Kasun Perera
> >>>
> >>> ------------------------------------------------------------------------------
> >>> This SF.net email is sponsored by Windows:
> >>>
> >>> Build for Windows Store.
> >>>
> >>> http://p.sf.net/sfu/windows-dev2dev
> >>> _______________________________________________
> >>> Dbpedia-developers mailing list
> >>> [email protected]
> >>> https://lists.sourceforge.net/lists/listinfo/dbpedia-developers
> >>
> >> --
> >> Regards
> >>
> >> Kasun Perera
> >
> > --
> > Regards
> >
> > Kasun Perera
>
> --
> Marco Fossati
> http://about.me/marco.fossati
> Twitter: @hjfocs
> Skype: hell_j
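Alessio's parent-following loop can be sketched in Python, with a plain dict standing in for his Lucene index (category mapped to parent, None when there is no parent at all). The hierarchy below is illustrative only, not taken from the real category graph.

```python
def path_to_root(parents, category):
    """Follow parent links upward from a category, stopping at a missing
    parent or at the first category already seen -- the cycle guard
    Alessio describes, without which the loop could run forever."""
    path, seen = [], set()
    while category is not None and category not in seen:
        seen.add(category)
        path.append(category)
        category = parents.get(category)
    return path

# Illustrative hierarchy (the third category name is made up):
parents = {
    "1612_establishments_in_Mexico": "1610s_establishments_in_Mexico",
    "1610s_establishments_in_Mexico": "Establishments_in_Mexico",
    "Establishments_in_Mexico": None,
}
print(path_to_root(parents, "1612_establishments_in_Mexico"))

# The cycle guard in action:
print(path_to_root({"A": "B", "B": "A"}, "A"))  # ['A', 'B']
```

The same loop works whatever the backing store is (Lucene, MySQL, or an in-memory map); only the `parents` lookup changes.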
--
Marco Fossati
http://about.me/marco.fossati
Twitter: @hjfocs
Skype: hell_j
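A minimal sketch of Kasun's task 1 (leaf category identification): given (child, parent) category pairs, however they are loaded (from the categorylinks SQL table or from the dumps), the leaf categories are simply those that never appear as the parent of another category. The pairs below are illustrative only.

```python
def leaf_categories(edges):
    """Given (child, parent) category pairs, return the categories
    that are never a parent, i.e. the leaves of the hierarchy."""
    categories, parents = set(), set()
    for child, parent in edges:
        categories.add(child)
        categories.add(parent)
        parents.add(parent)
    return categories - parents

edges = [
    ("1612_establishments_in_Mexico", "1610s_establishments_in_Mexico"),
    ("1610s_establishments_in_Mexico", "Establishments_in_Mexico"),
]
print(sorted(leaf_categories(edges)))  # ['1612_establishments_in_Mexico']
```

At Wikipedia scale the same set difference can be pushed into MySQL (a NOT EXISTS or LEFT JOIN over the edge table) instead of being computed in memory.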
