To Julien:
>Just need to change settings in InfoboxExtractor (MinPropertyCount and
>MinPercentageOfExplicitPropertyKeys)
What are the new values of these parameters?
> Do you mean ImageExtractor ?
No, I really do mean HomepageExtractor. If you open "homepages.nt.gz", for
example, you will see only triples of this kind:
<http://fr.dbpedia.org/resource/Toulouse> <http://xmlns.com/foaf/0.1/homepage> <http://www.toulouse.fr> .
And in mine you will have only triples of this kind:
<http://fr.dbpedia.org/resource/Toulouse> <http://fr.dbpedia.org/property/population> "65465151584" .
Something like that. Is that clearer?
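
To make the idea concrete, here is a minimal Python sketch of what such a population extractor could do for a single template page. It is only an illustration, not code from the extraction framework (a real extractor would be a Scala class). The function names, the sample values for "an" and "pop", the xsd:gYear datatype and the way the resource URI is built are all assumptions.

```python
import re

XSD = "http://www.w3.org/2001/XMLSchema#"
RESOURCE = "http://fr.dbpedia.org/resource/"
PROPERTY = "http://fr.dbpedia.org/property/"

# Title pattern of the data templates, e.g.
# "Modèle:Données/Toulouse/évolution_population".
TITLE_RE = re.compile(r"^Modèle:Données/(?P<article>.+)/évolution_population$")

# Matches only the unnumbered keys "an" and "pop" (the latest values).
# Numbered keys such as "an1" or "pop1" do not match, because "=" must
# follow the key (optionally after whitespace).
LATEST_RE = re.compile(r"\|\s*(an|pop)\s*=\s*(\d+)")

# Made-up sample: the last year and population are placeholders,
# not real census data.
SAMPLE_TITLE = "Modèle:Données/Toulouse/évolution_population"
SAMPLE_TEXT = """<includeonly>{{#switch: {{{1|}}}
|an1=1793|pop1=52612
|an=2010|pop=441802}}</includeonly>"""


def extract_latest(title, wikitext):
    """Return (article, year, population), or None if the page is not a
    population data template or lacks the unnumbered "an"/"pop" keys."""
    match = TITLE_RE.match(title)
    if match is None:
        return None
    values = dict(LATEST_RE.findall(wikitext))
    if "an" not in values or "pop" not in values:
        return None
    return match.group("article"), int(values["an"]), int(values["pop"])


def to_ntriples(article, year, population):
    """Build the two N-Triples lines for one article.

    Assumption: the resource URI is the article name with spaces replaced
    by underscores, without any further escaping.
    """
    subject = "<%s%s>" % (RESOURCE, article.replace(" ", "_"))
    return [
        '%s <%spopulation> "%d"^^<%sinteger> .' % (subject, PROPERTY, population, XSD),
        '%s <%sAnneePopulation> "%d"^^<%sgYear> .' % (subject, PROPERTY, year, XSD),
    ]


if __name__ == "__main__":
    result = extract_latest(SAMPLE_TITLE, SAMPLE_TEXT)
    if result is not None:
        for line in to_ntriples(*result):
            print(line)
```

Running this over every page of the dump whose title matches the pattern, and writing the lines to a single file, would give a dataset with only population triples, in the same spirit as "homepages.nt.gz".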
To Dimitris:
Thank you for the hint, I will look at your extractor.
To Jona:
My mistake about the extractor name: I forgot to recompile the tool.
Best.
Julien.
2013/4/22 Dimitris Kontokostas <[email protected]>
> Hi,
>
> I created a new extractor a few days ago that collects all the templates
> used in a page.
> Maybe this can help with Julien's approach.
>
> https://github.com/dbpedia/extraction-framework/blob/master/core/src/main/scala/org/dbpedia/extraction/mappings/ArticleTemplatesExtractor.scala
>
> Cheers,
> Dimitris
>
>
> On Mon, Apr 22, 2013 at 3:37 PM, Julien Plu <
> [email protected]> wrote:
>
>> @Jona: If I create a new Scala class here:
>> "org.dbpedia.extraction.mappings.fr.PopulationExtractor.scala"
>>
>> and write this in my extraction.default.properties file:
>> "org.dbpedia.extraction.mappings.fr.PopulationExtractor"
>>
>> I get a "ClassNotFound" exception, even though my class extends "Extractor"
>> and has the same name as the file :-(
>>
>> Best.
>>
>> Julien.
>>
>>
>> 2013/4/22 Julien Plu <[email protected]>
>>
>>> Apparently your solution doesn't work, because the template "
>>> Données/Toulouse/évolution_population<http://fr.wikipedia.org/wiki/Mod%C3%A8le:Donn%C3%A9es/Toulouse/%C3%A9volution_population>"
>>> doesn't appear among the "dbo:wikiPageUsesTemplate" property values :-(
>>>
>>>
>>> http://data.lirmm.fr/sparql/?default-graph-uri=&query=select+*+where+{%3Chttp%3A%2F%2Fdbpedia.org%2Fresource%2FToulouse%3E+%3Chttp%3A%2F%2Fdbpedia.org%2Fproperty%2FwikiPageUsesTemplate%3E+%3Ft}&should-sponge=&format=text%2Fhtml&timeout=0&debug=on<http://data.lirmm.fr/sparql/?default-graph-uri=&query=select+*+where+%7B%3Chttp%3A%2F%2Fdbpedia.org%2Fresource%2FToulouse%3E+%3Chttp%3A%2F%2Fdbpedia.org%2Fproperty%2FwikiPageUsesTemplate%3E+%3Ft%7D&should-sponge=&format=text%2Fhtml&timeout=0&debug=on>
>>>
>>> Best.
>>>
>>> Julien.
>>>
>>>
>>> 2013/4/22 Julien Plu <[email protected]>
>>>
>>>> Hi Julien,
>>>>
>>>>
>>>> >You will store data extracted from the templates pages and then insert
>>>> them when you parse the article page ?
>>>>
>>>> To answer your question: no, that's not what we have in mind. It's
>>>> more like "HomepageExtractor", for example: create a gz file with only the
>>>> population inside.
>>>>
>>>> But yes, I think your solution can work too. I need to test it :-)
>>>>
>>>> Best.
>>>>
>>>> Julien.
>>>>
>>>>
>>>> 2013/4/22 Julien Cojan <[email protected]>
>>>>
>>>>> Hi Julien, Jona,
>>>>>
>>>>> I just saw your discussion about externalised templates.
>>>>> For information, the property prop-fr:population appears on
>>>>> http://fr.dbpedia.org because the template
>>>>> Données/Toulouse/évolution_population<http://fr.wikipedia.org/wiki/Mod%C3%A8le:Donn%C3%A9es/Toulouse/%C3%A9volution_population> was
>>>>> not yet used when I ran the last extraction.
>>>>>
>>>>>
>>>>> About the extractor you want to add, I am not sure I understood how
>>>>> you want to proceed.
>>>>> Will you store data extracted from the template pages and then insert
>>>>> them when you parse the article page?
>>>>> Then you would need to run the extraction framework twice over the
>>>>> Wikipedia dump, because the template page may appear after the article
>>>>> page in the dump file.
>>>>>
>>>>> Wouldn't it be more generic to define some insert/delete SPARQL rules
>>>>> to handle this once the extraction process is over?
>>>>> Something like:
>>>>>
>>>>> insert {?s ?p ?v} where {?s dbo:wikiPageUsesTemplate ?t . ?t ?p ?v}
>>>>>
>>>>> then
>>>>>
>>>>> delete {?t ?p ?v} where {?s dbo:wikiPageUsesTemplate ?t . ?t ?p ?v}
>>>>>
>>>>>
>>>>> Cheers,
>>>>> Julien C.
>>>>>
>>>>>
>>>>> ------------------------------
>>>>>
>>>>> *From: *"Julien Plu" <[email protected]>
>>>>> *To: *"Jona Christopher Sahnwaldt" <[email protected]>
>>>>> *Cc: *[email protected]
>>>>> *Sent: *Monday 22 April 2013 09:54:59
>>>>> *Subject: *Re: [Dbpedia-discussion] Problem with extracted data
>>>>>
>>>>>
>>>>> Ok, I will try to code this in a new package "fr" this week. I just
>>>>> have to see how to write an extractor and learn Scala :-D
>>>>>
>>>>> Best.
>>>>>
>>>>> Julien.
>>>>>
>>>>>
>>>>> 2013/4/22 Jona Christopher Sahnwaldt <[email protected]>
>>>>>
>>>>>> Good idea! It probably wouldn't be hard to write a specific extractor
>>>>>> for this. Maybe just a few dozen lines.
>>>>>>
>>>>>> Only problem is, we may soon have dozens or hundreds of such
>>>>>> specialized extractors. But we can deal with that. :-)
>>>>>>
>>>>>> If you want to write that extractor, we would be happy to include it
>>>>>> in the extraction framework. Here are some instructions on how you can
>>>>>> send a pull request on GitHub:
>>>>>>
>>>>>> https://github.com/dbpedia/extraction-framework/wiki/Contributing
>>>>>>
>>>>>> To keep things manageable and since this extractor is only applicable
>>>>>> for the French Wikipedia edition, I would suggest you create a new
>>>>>> package org.dbpedia.extraction.mappings.fr in
>>>>>> extraction-framework/core/src/main/scala. Like many other extractors,
>>>>>> this one doesn't really belong in the 'core' module, but the
>>>>>> extraction framework is not yet very well modularized, so there's no
>>>>>> better place.
>>>>>>
>>>>>> A minor addition: I guess we should change the syntax in the
>>>>>> extraction config files: currently, all extractor class names that *do
>>>>>> not contain a dot* are prefixed by "org.dbpedia.extraction.mappings.".
>>>>>> Example: "AbstractExtractor" becomes
>>>>>> "org.dbpedia.extraction.mappings.AbstractExtractor". If we change that
>>>>>> rule and prefix all extractor class names that *start with a dot* by
>>>>>> "org.dbpedia.extraction.mappings", then you could write
>>>>>> ".fr.PopulationExtractor" in your extraction config file. With the
>>>>>> current rule, you would have to write the whole class name
>>>>>> "org.dbpedia.extraction.mappings.fr.PopulationExtractor". (Of course,
>>>>>> with the new rule, we would have to add a dot to all extractor class
>>>>>> names in all config files, but that's no big deal.)
>>>>>>
>>>>>> Cheers,
>>>>>> JC
>>>>>>
>>>>>> On 21 April 2013 22:35, Julien Plu <
>>>>>> [email protected]> wrote:
>>>>>> > I was thinking of an implementation similar to yours, Jona, but
>>>>>> > slightly different. Here are my steps:
>>>>>> >
>>>>>> > 1) Parse the XML file and retrieve all the data about these templates.
>>>>>> > For example, we see a "title" tag with this:
>>>>>> >
>>>>>> > Modèle:Données/Toulouse/évolution_population
>>>>>> >
>>>>>> > 2) Extract the last "an" and "pop" values
>>>>>> > 3) Write these triples to a file:
>>>>>> > <http://fr.dbpedia.org/resource/Toulouse>
>>>>>> > <http://fr.dbpedia.org/property/population> number pop^^xsd:integer .
>>>>>> > <http://fr.dbpedia.org/resource/Toulouse>
>>>>>> > <http://fr.dbpedia.org/property/AnneePopulation> year^^xsd:date .
>>>>>> >
>>>>>> > And so on, for all these templates. What do you think?
>>>>>> >
>>>>>> > I know it's not really generic, but it's a good starting point for
>>>>>> > thinking about a generic solution later.
>>>>>> >
>>>>>> > Best.
>>>>>> >
>>>>>> > Julien.
>>>>>> >
>>>>>> >
>>>>>> > 2013/4/21 Jona Christopher Sahnwaldt <[email protected]>
>>>>>> >>
>>>>>> >> Good question. Short answer: No, DBpedia can't handle these templates,
>>>>>> >> and it's hard to change that.
>>>>>> >>
>>>>>> >> It would be nice to do it in a generic way: design a system that
>>>>>> >> allows users of the mappings wiki to add rules for how such templates
>>>>>> >> should be handled in a certain language. Write Scala code that executes
>>>>>> >> these rules and parses the template definitions (e.g.
>>>>>> >> Modèle:Données/Toulouse/évolution_population) to extract the data and
>>>>>> >> store it in memory or in a temporary file. Then during the main
>>>>>> >> extraction, when you find a template call like {{Dernière population
>>>>>> >> commune de France}}, get the data from storage and generate the
>>>>>> >> appropriate triples.
>>>>>> >>
>>>>>> >> A major effort. Related to
>>>>>> >> http://wiki.dbpedia.org/gsoc2013/ideas/CrowdsourceTestsAndRules , but
>>>>>> >> even bigger.
>>>>>> >>
>>>>>> >> Maybe it would be easier to extend DBpedia such that the framework can
>>>>>> >> "execute" template definitions.
>>>>>> >>
>>>>>> >> Maybe all that is a waste of time because the data will soon move to
>>>>>> >> Wikidata. We just don't know how soon: Three months? Three years?
>>>>>> >> Never?
>>>>>> >>
>>>>>> >> JC
>>>>>> >>
>>>>>> >> On 21 April 2013 22:04, Julien Plu <[email protected]>
>>>>>> >> wrote:
>>>>>> >> > Thanks Jona for these clarifications :-)
>>>>>> >> >
>>>>>> >> > Another thing: I would like to know whether the extraction framework
>>>>>> >> > can use the "data templates". I mean that some property values (in the
>>>>>> >> > French Wikipedia, for French settlements) are now replaced by
>>>>>> >> > templates, for example:
>>>>>> >> >
>>>>>> >> > population = {{Dernière population commune de France}} <!-- {{Last
>>>>>> >> > population french Settlement}} -->
>>>>>> >> >
>>>>>> >> > And this data is stored in pages whose addresses follow this pattern:
>>>>>> >> >
>>>>>> >> > http://fr.wikipedia.org/wiki/Modèle:Données/Nom de
>>>>>> >> > l'article/évolution_population
>>>>>> >> >
>>>>>> >> > In English:
>>>>>> >> >
>>>>>> >> > Template:Data/article name/evolution_population
>>>>>> >> >
>>>>>> >> > For example:
>>>>>> >> >
>>>>>> >> > http://fr.wikipedia.org/wiki/Modèle:Données/Toulouse/évolution_population
>>>>>> >> >
>>>>>> >> > It's always the same address pattern. And these templates look like
>>>>>> >> > this:
>>>>>> >> >
>>>>>> >> > <includeonly>{{#switch: {{{1|}}}
>>>>>> >> > |an1=1793|pop1=52612
>>>>>> >> > |anX=year|popX=number
>>>>>> >> > |an=last_year|pop=last_known_number}}</includeonly>
>>>>>> >> >
>>>>>> >> > These templates are in the XML dump.
>>>>>> >> >
>>>>>> >> > So has this been added to the extraction framework? If not, which
>>>>>> >> > files do I have to modify to handle this kind of exception?
>>>>>> >> >
>>>>>> >> > Best.
>>>>>> >> >
>>>>>> >> > Julien.
>>>>>> >> >
>>>>>> >> >
>>>>>> >> > 2013/4/21 Jona Christopher Sahnwaldt <[email protected]>
>>>>>> >> >>
>>>>>> >> >> On 21 April 2013 19:38, Julien Plu
>>>>>> >> >> <[email protected]>
>>>>>> >> >> wrote:
>>>>>> >> >> > Hi,
>>>>>> >> >> >
>>>>>> >> >> > Any idea what I am doing wrong? (see my previous mail below)
>>>>>> >> >> >
>>>>>> >> >> > Best.
>>>>>> >> >> >
>>>>>> >> >> > Julien.
>>>>>> >> >> >
>>>>>> >> >> > From: Julien Plu <[email protected]>
>>>>>> >> >> > Date: 2013/4/20
>>>>>> >> >> > Subject: Problem with extracted data
>>>>>> >> >> > To: "[email protected]"
>>>>>> >> >> > <[email protected]>
>>>>>> >> >> >
>>>>>> >> >> >
>>>>>> >> >> > Hi,
>>>>>> >> >> >
>>>>>> >> >> > After importing the extracted data into my Virtuoso server, I
>>>>>> >> >> > noticed some strange data. For example, all my URIs start with
>>>>>> >> >> > "http://dbpedia.org" and not with "http://fr.dbpedia.org", and I
>>>>>> >> >> > don't have the "prop-fr" properties either, even though I put "fr"
>>>>>> >> >> > throughout the extraction properties file.
>>>>>> >> >> >
>>>>>> >> >> > I also noticed that the data from http://fr.dbpedia.org and mine
>>>>>> >> >> > are not the same, for example if you compare these two SPARQL
>>>>>> >> >> > results:
>>>>>> >> >> >
>>>>>> >> >> > mine :
>>>>>> >> >> >
>>>>>> >> >> >
>>>>>> >> >> >
>>>>>> http://data.lirmm.fr:8890/sparql?default-graph-uri=&query=select+distinct+*+where+%7B%3Chttp%3A%2F%2Fdbpedia.org%2Fresource%2FToulouse%3E+%3Fp+%3Fo%7D&should-sponge=&format=text%2Fhtml&timeout=0&debug=on
>>>>>> >> >> >
>>>>>> >> >> > fr.dbpedia.org :
>>>>>> >> >> >
>>>>>> >> >> >
>>>>>> >> >> >
>>>>>> http://fr.dbpedia.org/sparql?default-graph-uri=&query=select+distinct+*+where+%7B%3Chttp%3A%2F%2Ffr.dbpedia.org%2Fresource%2FToulouse%3E+%3Fp+%3Fo%7D&format=text%2Fhtml&timeout=0&debug=on
>>>>>> >> >> >
>>>>>> >> >> > In mine, I don't have the "http://www.w3.org/2002/07/owl#sameAs" or
>>>>>> >> >>
>>>>>> >> >> Do you mean the triples like
>>>>>> >> >> http://www.w3.org/2002/07/owl#sameAs
>>>>>> >> >> http://de.dbpedia.org/resource/Toulouse ? To get them, you would have
>>>>>> >> >> to download Wikipedia dumps for several other languages, run
>>>>>> >> >> InterlangueLinkExtractor on them, and then run
>>>>>> >> >>
>>>>>> >> >> https://github.com/dbpedia/extraction-framework/blob/master/scripts/src/main/scala/org/dbpedia/extraction/scripts/ProcessInterLanguageLinks.scala
>>>>>> >> >> on all the result files.
>>>>>> >> >>
>>>>>> >> >> Or you could use the links in
>>>>>> >> >>
>>>>>> >> >> http://downloads.dbpedia.org/3.8/fr/interlanguage_links_same_as_chapters_fr.ttl.bz2
>>>>>> >> >> or a similar file.
>>>>>> >> >>
>>>>>> >> >> > "http://fr.dbpedia.org/property/population" properties among many
>>>>>> >> >> > others.
>>>>>> >> >> >
>>>>>> >> >> > My extraction properties file is attached.
>>>>>> >> >> >
>>>>>> >> >> > What did I do wrong?
>>>>>> >> >> >
>>>>>> >> >> > Best.
>>>>>> >> >> >
>>>>>> >> >> > Julien.
>>>>>> >> >> >
>>>>>> >> >> >
>>>>>> >> >> >
>>>>>> >> >> >
>>>>>> >> >> >
>>>>>> ------------------------------------------------------------------------------
>>>>>> >> >> > Precog is a next-generation analytics platform capable of
>>>>>> advanced
>>>>>> >> >> > analytics on semi-structured data. The platform includes APIs
>>>>>> for
>>>>>> >> >> > building
>>>>>> >> >> > apps and a phenomenal toolset for data science. Developers
>>>>>> can use
>>>>>> >> >> > our toolset for easy data analysis & visualization. Get a free
>>>>>> >> >> > account!
>>>>>> >> >> > http://www2.precog.com/precogplatform/slashdotnewsletter
>>>>>> >> >> > _______________________________________________
>>>>>> >> >> > Dbpedia-discussion mailing list
>>>>>> >> >> > [email protected]
>>>>>> >> >> >
>>>>>> https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
>>>>>> >> >> >
>>>>>> >> >
>>>>>> >> >
>>>>>> >
>>>>>> >
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>>
>>
>>
>
>
> --
> Kontokostas Dimitris
>