I had much the same implementation in mind as you, Jona, with a few
differences. Here are my steps:
1) Parse the XML dump and retrieve the pages for these data templates. They
can be recognised by their "title" tag, for example:
Modèle:Données/Toulouse/évolution_population
2) Extract the latest "an" (year) and "pop" (population) values
3) Write the triples to a file, where pop and year are the values from step 2:
<http://fr.dbpedia.org/resource/Toulouse> <http://fr.dbpedia.org/property/population> "pop"^^xsd:integer .
<http://fr.dbpedia.org/resource/Toulouse> <http://fr.dbpedia.org/property/AnneePopulation> "year"^^xsd:date .
And so on for all these templates. What do you think?
I know it's not really generic, but it's a good starting point for thinking
about a generic solution later.
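
To make the three steps concrete, here is a minimal sketch in Python (not the framework's Scala, and not part of the extraction framework). All function names are hypothetical, and the sample values an=2010 and pop=400000 are invented for illustration; only an1=1793 and pop1=52612 come from the template shown in this thread. It assumes the dump title may use either a space or an underscore in "évolution_population", and it types the year as xsd:gYear, because a bare year is not a valid xsd:date value.

```python
import io
import re
import xml.etree.ElementTree as ET

# Title pattern from the thread: Modèle:Données/<article>/évolution_population
# (assumption: the dump may use a space instead of the underscore).
TITLE_RE = re.compile(r"^Modèle:Données/(.+)/évolution[ _]population$")
# Matches only the unnumbered "an"/"pop" keys, i.e. the latest entry;
# numbered keys such as an1/pop1 are skipped.
LATEST_RE = re.compile(r"\|\s*(an|pop)\s*=\s*(\d+)")

RES = "http://fr.dbpedia.org/resource/"
PROP = "http://fr.dbpedia.org/property/"
XSD = "http://www.w3.org/2001/XMLSchema#"


def iter_pages(source):
    """Step 1: yield (title, wikitext) pairs from a MediaWiki XML dump."""
    title, text = None, ""
    for _, elem in ET.iterparse(source, events=("end",)):
        tag = elem.tag.rsplit("}", 1)[-1]  # drop the XML namespace, if any
        if tag == "title":
            title = elem.text
        elif tag == "text":
            text = elem.text or ""
        elif tag == "page":
            yield title, text
            title, text = None, ""
            elem.clear()  # free the finished page on large dumps


def latest_population(wikitext):
    """Step 2: return (year, population) from the unnumbered keys, or None."""
    values = dict(LATEST_RE.findall(wikitext))
    if "an" in values and "pop" in values:
        return int(values["an"]), int(values["pop"])
    return None


def triples_for(title, wikitext):
    """Step 3: build the two N-Triples lines for one data template page."""
    match = TITLE_RE.match(title or "")
    latest = latest_population(wikitext)
    if not match or latest is None:
        return []
    # Simplification: only spaces are replaced; no further IRI encoding.
    subject = "<" + RES + match.group(1).replace(" ", "_") + ">"
    year, pop = latest
    return [
        f'{subject} <{PROP}population> "{pop}"^^<{XSD}integer> .',
        f'{subject} <{PROP}AnneePopulation> "{year}"^^<{XSD}gYear> .',
    ]


def extract(source):
    """Collect triples for every matching template page in the dump."""
    return [t for title, text in iter_pages(source)
            for t in triples_for(title, text)]


# Tiny stand-in for the dump; an=2010 and pop=400000 are invented values.
SAMPLE = """<mediawiki><page>
<title>Modèle:Données/Toulouse/évolution_population</title>
<revision><text>&lt;includeonly&gt;{{#switch: {{{1|}}}
|an1=1793|pop1=52612
|an=2010|pop=400000}}&lt;/includeonly&gt;</text></revision>
</page></mediawiki>"""

for line in extract(io.BytesIO(SAMPLE.encode("utf-8"))):
    print(line)
```

On a real dump, extract() could be given the path of the decompressed XML file instead of the in-memory sample. Pages whose title does not match the pattern, or that lack the unnumbered keys, are simply skipped.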
Best.
Julien.
2013/4/21 Jona Christopher Sahnwaldt <[email protected]>
> Good question. Short answer: No, DBpedia can't handle these templates,
> and it's hard to change that.
>
> It would be nice to do it in a generic way: design a system that
> allows users of the mappings wiki to add rules how such templates
> should be handled in a certain language. Write Scala code that executes
> these rules and parses the template definitions (e.g.
> Modèle:Données/Toulouse/évolution_population) to extract the data and
> store it in memory or in a temporary file. Then during the main
> extraction, when you find a template call like {{Dernière population
> commune de France}}, get the data from storage and generate the
> appropriate triples.
>
> A major effort. Related to
> http://wiki.dbpedia.org/gsoc2013/ideas/CrowdsourceTestsAndRules , but
> even bigger.
>
> Maybe it would be easier to extend DBpedia such that the framework can
> "execute" template definitions.
>
> Maybe all that is a waste of time because the data will soon move to
> Wikidata. We just don't know how soon: Three months? Three years?
> Never?
>
> JC
>
> On 21 April 2013 22:04, Julien Plu <[email protected]>
> wrote:
> > Thanks Jona for these clarifications :-)
> >
> > Another thing: I would like to know whether the extraction framework can
> > use the "data templates". I mean that some property values (in the French
> > Wikipedia, for French settlements) have now been replaced by templates,
> > for example:
> >
> > population = {{Dernière population commune de France}} <!-- {{Last
> > population french Settlement}} -->
> >
> > This data is stored in pages whose address follows this pattern:
> >
> > http://fr.wikipedia.org/wiki/Modèle:Données/Nom de
> > l'article/évolution_population
> >
> > In English:
> >
> > Template:Data/article name/evolution_population
> >
> > For example:
> >
> > http://fr.wikipedia.org/wiki/Modèle:Données/Toulouse/évolution_population
> >
> > The address pattern is always the same, and these templates look like
> > this:
> >
> > <includeonly>{{#switch: {{{1|}}}
> > |an1=1793|pop1=52612
> > |anX=year|popX=number
> > |an=last_year|pop=last_known_number}}</includeonly>
> >
> > These templates are in the XML dump.
> >
> > So has this been added to the extraction framework? If not, which files
> > do I have to modify to handle this kind of exception?
> >
> > Best.
> >
> > Julien.
> >
> >
> > 2013/4/21 Jona Christopher Sahnwaldt <[email protected]>
> >>
> >> On 21 April 2013 19:38, Julien Plu <[email protected]> wrote:
> >> > Hi,
> >> >
> >> > Any idea what I am doing wrong? (see my previous mail below)
> >> >
> >> > Best.
> >> >
> >> > Julien.
> >> >
> >> > From: Julien Plu <[email protected]>
> >> > Date: 2013/4/20
> >> > Subject: Problem with extracted data
> >> > To: "[email protected]"
> >> > <[email protected]>
> >> >
> >> >
> >> > Hi,
> >> >
> >> > After importing the extracted data into my Virtuoso server, I noticed
> >> > some strange data. For example, all my URIs start with
> >> > "http://dbpedia.org" instead of "http://fr.dbpedia.org", and I don't
> >> > have the "prop-fr" properties either, even though I put "fr"
> >> > throughout the extraction properties file.
> >> >
> >> > I also noticed that my data is not the same as the data from
> >> > http://fr.dbpedia.org. For example, compare these two SPARQL
> >> > results:
> >> >
> >> > mine :
> >> >
> >> >
> >> > http://data.lirmm.fr:8890/sparql?default-graph-uri=&query=select+distinct+*+where+%7B%3Chttp%3A%2F%2Fdbpedia.org%2Fresource%2FToulouse%3E+%3Fp+%3Fo%7D&should-sponge=&format=text%2Fhtml&timeout=0&debug=on
> >> >
> >> > fr.dbpedia.org :
> >> >
> >> >
> >> > http://fr.dbpedia.org/sparql?default-graph-uri=&query=select+distinct+*+where+%7B%3Chttp%3A%2F%2Ffr.dbpedia.org%2Fresource%2FToulouse%3E+%3Fp+%3Fo%7D&format=text%2Fhtml&timeout=0&debug=on
> >> >
> >> > In mine, I don't have the "http://www.w3.org/2002/07/owl#sameAs" or
> >>
> >> Do you mean the triples like http://www.w3.org/2002/07/owl#sameAs
> >> http://de.dbpedia.org/resource/Toulouse ? To get them, you would have
> >> to download Wikipedia dumps for several other languages, run
> >> InterLanguageLinksExtractor on them, and then run
> >>
> >>
> >> https://github.com/dbpedia/extraction-framework/blob/master/scripts/src/main/scala/org/dbpedia/extraction/scripts/ProcessInterLanguageLinks.scala
> >> on all the result files.
> >>
> >> Or you could use the links in
> >>
> >>
> >> http://downloads.dbpedia.org/3.8/fr/interlanguage_links_same_as_chapters_fr.ttl.bz2
> >> or a similar file.
> >>
> >> > "http://fr.dbpedia.org/property/population" properties among many
> >> > others.
> >> >
> >> > My extraction properties file is attached.
> >> >
> >> > What did I do wrong?
> >> >
> >> > Best.
> >> >
> >> > Julien.
> >> >
> >> >
> >> >
> >> >
> ------------------------------------------------------------------------------
> >> > Precog is a next-generation analytics platform capable of advanced
> >> > analytics on semi-structured data. The platform includes APIs for
> >> > building
> >> > apps and a phenomenal toolset for data science. Developers can use
> >> > our toolset for easy data analysis & visualization. Get a free
> account!
> >> > http://www2.precog.com/precogplatform/slashdotnewsletter
> >> > _______________________________________________
> >> > Dbpedia-discussion mailing list
> >> > [email protected]
> >> > https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
> >> >
> >
> >
>