Apparently your solution doesn't work, because the template
"Données/Toulouse/évolution_population<http://fr.wikipedia.org/wiki/Mod%C3%A8le:Donn%C3%A9es/Toulouse/%C3%A9volution_population>"
doesn't appear among the "dbo:wikiPageUsesTemplate" property values :-(

http://data.lirmm.fr/sparql/?default-graph-uri=&query=select+*+where+{%3Chttp%3A%2F%2Fdbpedia.org%2Fresource%2FToulouse%3E+%3Chttp%3A%2F%2Fdbpedia.org%2Fproperty%2FwikiPageUsesTemplate%3E+%3Ft}&should-sponge=&format=text%2Fhtml&timeout=0&debug=on
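
For reference, a minimal Python sketch of how the check above could be rebuilt as a fully URL-encoded request. The endpoint and the two IRIs are copied from the link (note that the link uses the /property/ namespace, so the sketch does too); the helper name build_query_url is hypothetical, and nothing here contacts the server.

```python
from urllib.parse import urlencode

# Endpoint and IRIs copied from the link above.
ENDPOINT = "http://data.lirmm.fr/sparql/"
RESOURCE = "http://dbpedia.org/resource/Toulouse"
PREDICATE = "http://dbpedia.org/property/wikiPageUsesTemplate"


def build_query_url(endpoint: str, resource: str, predicate: str) -> str:
    """Return a GET URL listing the templates used by `resource` (hypothetical helper)."""
    query = f"select * where {{<{resource}> <{predicate}> ?t}}"
    params = {"default-graph-uri": "", "query": query, "format": "text/html"}
    # urlencode percent-encodes the braces, angle brackets and '?' in the query.
    return endpoint + "?" + urlencode(params)


url = build_query_url(ENDPOINT, RESOURCE, PREDICATE)
print(url)
```

Swapping RESOURCE for another page makes it easy to check whether any page lists its data template at all.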

Best.

Julien.


2013/4/22 Julien Plu <[email protected]>

> Hi Julien,
>
>
> > Will you store data extracted from the template pages and then insert it
> > when you parse the article page?
>
> To answer your question: no, that's not what we have in mind. It's more
> like "HomepageExtractor", for example: create a gz file containing only
> the population data.
>
> But yes, I think your solution could work too; I need to test it :-)
>
> Best.
>
> Julien.
>
>
> 2013/4/22 Julien Cojan <[email protected]>
>
>> Hi Julien, Jonas,
>>
>> I just saw your discussion about externalised templates.
>> For information, the property prop-fr:population appears on
>> http://fr.dbpedia.org because the template
>> Données/Toulouse/évolution_population<http://fr.wikipedia.org/wiki/Mod%C3%A8le:Donn%C3%A9es/Toulouse/%C3%A9volution_population>
>> was not used when I did the last extraction.
>>
>>
>> About the extractor you want to add, I am not sure I understood how you
>> want to do it.
>> Will you store data extracted from the template pages and then insert
>> it when you parse the article page?
>> In that case you would need to run the extraction framework twice over
>> the Wikipedia dump, since the template page may appear after the article
>> page in the dump file.
>>
>> Wouldn't it be more generic to define some insert/delete SPARQL rules to
>> handle this once the extraction process is over?
>> Something like:
>>
>> insert {?s ?p ?v} where {?s dbo:wikiPageUsesTemplate ?t . ?t ?p ?v}
>>
>> then
>>
>> delete {?t ?p ?v} where {?s dbo:wikiPageUsesTemplate ?t . ?t ?p ?v}
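
A toy sketch of what these two rules do, written in Python over an in-memory set of triples rather than against a real triple store. The prefixed names are shorthand strings, the function names are hypothetical, and the population value is the pop1 figure quoted further down in this thread, used only as sample data. The rules only fire for pages that actually carry a dbo:wikiPageUsesTemplate link to the data template.

```python
USES_TEMPLATE = "dbo:wikiPageUsesTemplate"


def solutions(graph):
    """Bindings (s, t, p, v) for: ?s dbo:wikiPageUsesTemplate ?t . ?t ?p ?v"""
    return [
        (s, t, p, v)
        for (s, link, t) in graph
        if link == USES_TEMPLATE
        for (t2, p, v) in graph
        if t2 == t
    ]


def apply_rules(graph):
    """Run the insert rule, then the delete rule, each on the current graph."""
    graph = set(graph)
    # insert {?s ?p ?v}: copy the template's statements onto the page
    graph |= {(s, p, v) for (s, _t, p, v) in solutions(graph)}
    # delete {?t ?p ?v}: drop the statements from the template itself
    graph -= {(t, p, v) for (_s, t, p, v) in solutions(graph)}
    return graph


page = "dbpedia-fr:Toulouse"
template = "dbpedia-fr:Modèle:Données/Toulouse/évolution_population"
graph = {
    (page, USES_TEMPLATE, template),
    (template, "prop-fr:population", 52612),  # sample value only
}
result = apply_rules(graph)
```

After the call, the population statement sits on the page and no longer on the template, while the link between the two is kept.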
>>
>>
>> Cheers,
>> Julien C.
>>
>>
>> ------------------------------
>>
>> *From: *"Julien Plu" <[email protected]>
>> *To: *"Jona Christopher Sahnwaldt" <[email protected]>
>> *Cc: *[email protected]
>> *Sent: *Monday, 22 April 2013 09:54:59
>> *Subject: *Re: [Dbpedia-discussion] Problem with extracted data
>>
>>
>> OK, I will try to code this in a new package "fr" this week. I just have
>> to see how to write an extractor and learn Scala :-D
>>
>> Best.
>>
>> Julien.
>>
>>
>> 2013/4/22 Jona Christopher Sahnwaldt <[email protected]>
>>
>>> Good idea! It probably wouldn't be hard to write a specific extractor
>>> for this. Maybe just a few dozen lines.
>>>
>>> Only problem is, we may soon have dozens or hundreds of such
>>> specialized extractors. But we can deal with that. :-)
>>>
>>> If you want to write that extractor, we would be happy to include it
>>> in the extraction framework. Here are some instructions on how you can
>>> send a pull request on GitHub:
>>>
>>> https://github.com/dbpedia/extraction-framework/wiki/Contributing
>>>
>>> To keep things manageable and since this extractor is only applicable
>>> for the French Wikipedia edition, I would suggest you create a new
>>> package org.dbpedia.extraction.mappings.fr in
>>> extraction-framework/core/src/main/scala. Like many other extractors,
>>> this one doesn't really belong in the 'core' module, but the
>>> extraction framework is not yet very well modularized, so there's no
>>> better place.
>>>
>>> A minor addition: I guess we should change the syntax in the
>>> extraction config files: currently, all extractor class names that *do
>>> not contain a dot* are prefixed by "org.dbpedia.extraction.mappings.".
>>> Example: "AbstractExtractor" becomes
>>> "org.dbpedia.extraction.mappings.AbstractExtractor". If we change that
>>> rule and prefix all extractor class names that *start with a dot* by
>>> "org.dbpedia.extraction.mappings", then you could write
>>> ".fr.PopulationExtractor" in your extraction config file. With the
>>> current rule, you would have to write the whole class name
>>> "org.dbpedia.extraction.mappings.fr.PopulationExtractor". (Of course,
>>> with the new rule, we would have to add a dot to all extractor class
>>> names in all config files, but that's no big deal.)
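
The two naming rules can be compared side by side in a short sketch. It is written in Python purely for illustration (the framework itself is Scala), the function names are hypothetical, and it is not the framework's actual config-loading code.

```python
DEFAULT_PACKAGE = "org.dbpedia.extraction.mappings"


def resolve_current(name: str) -> str:
    """Current rule: names that contain no dot get the default package prefix."""
    return name if "." in name else f"{DEFAULT_PACKAGE}.{name}"


def resolve_proposed(name: str) -> str:
    """Proposed rule: names that start with a dot get the default package prefix."""
    return DEFAULT_PACKAGE + name if name.startswith(".") else name


print(resolve_current("AbstractExtractor"))
print(resolve_proposed(".fr.PopulationExtractor"))
```

Under the proposed rule a fully qualified name from any other package is passed through unchanged, which is what would let config files mix short and long names.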
>>>
>>> Cheers,
>>> JC
>>>
>>> On 21 April 2013 22:35, Julien Plu <[email protected]>
>>> wrote:
>>> > I was thinking of the same implementation as you, Jona, but slightly
>>> > different. Here are my steps:
>>> >
>>> > 1) Parse the XML file and retrieve all the data about these templates.
>>> > For example, we see a "title" tag containing this:
>>> >
>>> > Modèle:Données/Toulouse/évolution_population
>>> >
>>> > 2) Extract the last "an" (year) and "pop" (population) values
>>> > 3) Write these triples to a file:
>>> > <http://fr.dbpedia.org/resource/Toulouse>
>>> > <http://fr.dbpedia.org/property/population> number pop^^xsd:integer .
>>> > <http://fr.dbpedia.org/resource/Toulouse>
>>> > <http://fr.dbpedia.org/property/AnneePopulation> year^^xsd:date .
>>> >
>>> > And so on, for all these templates. What do you think?
>>> >
>>> > I know it's not really generic, but it's a good starting point for
>>> > thinking about a generic solution later.
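
The three steps above can be sketched roughly as follows, in Python for brevity rather than Scala. Everything here is an assumption for illustration: the function name, the regular expressions, and the sample body, in which only an1/pop1 come from this thread and the final an/pop pair is invented. The sketch also types the year as xsd:gYear instead of xsd:date, because a bare year is not a valid xsd:date value; that substitution is not part of the original proposal.

```python
import re

SAMPLE_TITLE = "Modèle:Données/Toulouse/évolution_population"
# Sample body: an1/pop1 are quoted in this thread, the last an/pop pair is made up.
SAMPLE_TEXT = """<includeonly>{{#switch: {{{1|}}}
|an1=1793|pop1=52612
|an=2010|pop=100000}}</includeonly>"""

TITLE = re.compile(r"^Modèle:Données/(.+)/évolution_population$")
PAIR = re.compile(r"\|\s*(\w+)\s*=\s*([^|}\n]*)")
XSD = "http://www.w3.org/2001/XMLSchema#"


def latest_population_triples(title, text):
    """Return N-Triples lines for the latest year/population, or [] if not applicable."""
    match = TITLE.match(title)
    if match is None:
        return []  # not a population data template
    values = {key: value.strip() for key, value in PAIR.findall(text)}
    if "an" not in values or "pop" not in values:
        return []
    article = match.group(1).replace(" ", "_")
    subject = f"<http://fr.dbpedia.org/resource/{article}>"
    return [
        f'{subject} <http://fr.dbpedia.org/property/population> "{values["pop"]}"^^<{XSD}integer> .',
        f'{subject} <http://fr.dbpedia.org/property/AnneePopulation> "{values["an"]}"^^<{XSD}gYear> .',
    ]


for line in latest_population_triples(SAMPLE_TITLE, SAMPLE_TEXT):
    print(line)
```

Because the article name is taken from the template title, this works in a single pass over the dump and does not need to see the article page itself.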
>>> >
>>> > Best.
>>> >
>>> > Julien.
>>> >
>>> >
>>> > 2013/4/21 Jona Christopher Sahnwaldt <[email protected]>
>>> >>
>>> >> Good question. Short answer: No, DBpedia can't handle these templates,
>>> >> and it's hard to change that.
>>> >>
>>> >> It would be nice to do it in a generic way: design a system that
>>> >> allows users of the mappings wiki to add rules how such templates
>>> >> should be handled in a certain language. Write Scala code that executes
>>> >> these rules and parses the template definitions (e.g.
>>> >> Modèle:Données/Toulouse/évolution_population) to extract the data and
>>> >> store it in memory or in a temporary file. Then during the main
>>> >> extraction, when you find a template call like {{Dernière population
>>> >> commune de France}}, get the data from storage and generate the
>>> >> appropriate triples.
>>> >>
>>> >> A major effort. Related to
>>> >> http://wiki.dbpedia.org/gsoc2013/ideas/CrowdsourceTestsAndRules , but
>>> >> even bigger.
>>> >>
>>> >> Maybe it would be easier to extend DBpedia such that the framework can
>>> >> "execute" template definitions.
>>> >>
>>> >> Maybe all that is a waste of time because the data will soon move to
>>> >> Wikidata. We just don't know how soon: Three months? Three years?
>>> >> Never?
>>> >>
>>> >> JC
>>> >>
>>> >> On 21 April 2013 22:04, Julien Plu <[email protected]>
>>> >> wrote:
>>> >> > Thanks, Jona, for these clarifications :-)
>>> >> >
>>> >> > Another thing: I would like to know whether the extraction framework
>>> >> > can use the "data templates". I mean that some property values (in
>>> >> > the French Wikipedia, for French settlements) are now replaced by
>>> >> > templates, for example:
>>> >> >
>>> >> > population = {{Dernière population commune de France}} <!-- {{Last
>>> >> > population french Settlement}} -->
>>> >> >
>>> >> > And this data is stored on pages that follow this pattern:
>>> >> >
>>> >> > http://fr.wikipedia.org/wiki/Modèle:Données/Nom de
>>> >> > l'article/évolution_population
>>> >> >
>>> >> > In English:
>>> >> >
>>> >> > Template:Data/article name/evolution_population
>>> >> >
>>> >> > For example:
>>> >> >
>>> >> > http://fr.wikipedia.org/wiki/Modèle:Données/Toulouse/évolution_population
>>> >> >
>>> >> > It's always the same address pattern. These templates look like this:
>>> >> >
>>> >> > <includeonly>{{#switch: {{{1|}}}
>>> >> > |an1=1793|pop1=52612
>>> >> > |anX=year|popX=number
>>> >> > |an=last_year|pop=last_known_number}}</includeonly>
>>> >> >
>>> >> > These templates are in the XML dump.
>>> >> >
>>> >> > So, has support for them been added to the extraction framework? If
>>> >> > not, which files do I have to modify to handle this kind of exception?
>>> >> >
>>> >> > Best.
>>> >> >
>>> >> > Julien.
>>> >> >
>>> >> >
>>> >> > 2013/4/21 Jona Christopher Sahnwaldt <[email protected]>
>>> >> >>
>>> >> >> On 21 April 2013 19:38, Julien Plu
>>> >> >> <[email protected]>
>>> >> >> wrote:
>>> >> >> > Hi,
>>> >> >> >
>>> >> >> > Any idea what I am doing wrong? (see my previous mail below)
>>> >> >> >
>>> >> >> > Best.
>>> >> >> >
>>> >> >> > Julien.
>>> >> >> >
>>> >> >> > From: Julien Plu <[email protected]>
>>> >> >> > Date: 2013/4/20
>>> >> >> > Subject: Problem with extracted data
>>> >> >> > To: "[email protected]"
>>> >> >> > <[email protected]>
>>> >> >> >
>>> >> >> >
>>> >> >> > Hi,
>>> >> >> >
>>> >> >> > After importing the extracted data into my Virtuoso server, I
>>> >> >> > noticed some strange data. For example, all my URIs start with
>>> >> >> > "http://dbpedia.org" and not with "http://fr.dbpedia.org", and I
>>> >> >> > don't have the "prop-fr" properties either, even though I set "fr"
>>> >> >> > throughout the extraction properties file.
>>> >> >> >
>>> >> >> > I also noticed that the data from http://fr.dbpedia.org and mine
>>> >> >> > are not the same. For example, compare these two SPARQL results:
>>> >> >> >
>>> >> >> > mine :
>>> >> >> >
>>> >> >> >
>>> >> >> >
>>> http://data.lirmm.fr:8890/sparql?default-graph-uri=&query=select+distinct+*+where+%7B%3Chttp%3A%2F%2Fdbpedia.org%2Fresource%2FToulouse%3E+%3Fp+%3Fo%7D&should-sponge=&format=text%2Fhtml&timeout=0&debug=on
>>> >> >> >
>>> >> >> > fr.dbpedia.org :
>>> >> >> >
>>> >> >> >
>>> >> >> >
>>> http://fr.dbpedia.org/sparql?default-graph-uri=&query=select+distinct+*+where+%7B%3Chttp%3A%2F%2Ffr.dbpedia.org%2Fresource%2FToulouse%3E+%3Fp+%3Fo%7D&format=text%2Fhtml&timeout=0&debug=on
>>> >> >> >
>>> >> >> > In mine, I don't have the "http://www.w3.org/2002/07/owl#sameAs" or
>>> >> >>
>>> >> >> Do you mean triples like http://www.w3.org/2002/07/owl#sameAs
>>> >> >> http://de.dbpedia.org/resource/Toulouse ? To get them, you would
>>> >> >> have to download Wikipedia dumps for several other languages, run
>>> >> >> InterlangueLinkExtractor on them, and then run
>>> >> >>
>>> >> >>
>>> >> >>
>>> https://github.com/dbpedia/extraction-framework/blob/master/scripts/src/main/scala/org/dbpedia/extraction/scripts/ProcessInterLanguageLinks.scala
>>> >> >> on all the result files.
>>> >> >>
>>> >> >> Or you could use the links in
>>> >> >>
>>> >> >>
>>> >> >>
>>> http://downloads.dbpedia.org/3.8/fr/interlanguage_links_same_as_chapters_fr.ttl.bz2
>>> >> >> or a similar file.
>>> >> >>
>>> >> >> > "http://fr.dbpedia.org/property/population" properties, among many
>>> >> >> > others.
>>> >> >> >
>>> >> >> > My extraction properties file is attached.
>>> >> >> >
>>> >> >> > What did I do wrong?
>>> >> >> >
>>> >> >> > Best.
>>> >> >> >
>>> >> >> > Julien.
>>> >> >> >
>>> >> >> >
>>> >> >> >
>>> >> >> >
>>> >> >> >
>>> ------------------------------------------------------------------------------
>>> >> >> > Precog is a next-generation analytics platform capable of
>>> advanced
>>> >> >> > analytics on semi-structured data. The platform includes APIs for
>>> >> >> > building
>>> >> >> > apps and a phenomenal toolset for data science. Developers can
>>> use
>>> >> >> > our toolset for easy data analysis & visualization. Get a free
>>> >> >> > account!
>>> >> >> > http://www2.precog.com/precogplatform/slashdotnewsletter
>>> >> >> > _______________________________________________
>>> >> >> > Dbpedia-discussion mailing list
>>> >> >> > [email protected]
>>> >> >> > https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
>>> >> >> >
>>> >> >
>>> >> >
>>> >
>>> >
>>>
>>
>>
>>
>>
>>
>>
>