Hi all,
First of all, thanks to Zsíros for pointing out the error, to the DBpedia
co-founder Sören for his quick response - can we assign bugs to you too? :P
- and to our i18n pioneer Dimitris for looking deeper into the issue.
Dimitris has a point there: "74 544" is not a valid number. However, maybe we
shouldn't say that there is no problem with the parser.
I tried the query below on http://dbpedia.org/sparql:
select ?outlier ?pop
where {
  {
    ?outlier a dbpedia-owl:PopulatedPlace .
    ?outlier dbpprop:populationTotal ?pop .
    FILTER regex(?pop, "[^0-9]+[0-9]+")
  }
  union
  {
    ?outlier a dbpedia-owl:PopulatedPlace .
    ?outlier dbpprop:populationTotal ?pop .
    FILTER regex(?pop, "[0-9]+[^0-9]+")
  }
}
It returns more than 1000 results where non-digit characters appear within what
should be a numeric field. This exemplifies the (messy) nature of the data
we're dealing with. It should also give you an idea of how hard it is to
get the parsing right:
http://dbpedia.org/resource/Harlow
http://dbpedia.org/resource/List_of_English_districts_by_population
http://dbpedia.org/resource/Beetgum "c. 735"@en
http://dbpedia.org/resource/Conakry "6.9522624E9"^^<http://dbpedia.org/datatype/second>
http://dbpedia.org/resource/Varadero "aprox. 20000"@en
For many of these resources, the property is simply not there.
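For anyone who wants to repeat the check offline, here is a rough Python sketch of the same two FILTER patterns. The sample values are the ones from this thread; the function name is just something I made up for illustration. Note that the check is coarse: it also flags "6.9522624E9", which is a well-formed number in scientific notation.

```python
import re

# The same two patterns as the FILTER clauses in the query above.
NON_DIGIT_THEN_DIGIT = re.compile(r"[^0-9]+[0-9]+")
DIGIT_THEN_NON_DIGIT = re.compile(r"[0-9]+[^0-9]+")


def is_outlier(value: str) -> bool:
    """Return True if digits and non-digits are mixed anywhere in the value.

    Like SPARQL's regex(), re.search() is unanchored, so a match anywhere
    in the string is enough.
    """
    return bool(
        NON_DIGIT_THEN_DIGIT.search(value) or DIGIT_THEN_NON_DIGIT.search(value)
    )


# Values taken from this thread; "74544" is the corrected Szolnok value.
samples = ["c. 735", "6.9522624E9", "aprox. 20000", "74 544", "74544"]
flagged = [v for v in samples if is_outlier(v)]
print(flagged)
```

Only the plain "74544" passes; everything else would need some interpretation before it can be treated as a number.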
The parser already does a great job, but there is still much to improve. These
problems are super tough to solve in a generic way. But our research is
progressing in that direction, and between the groups active in this
community I'm sure many good solutions will pop up.
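To make the trade-off concrete, here is a minimal Python sketch, not the actual DBpedia extractor code. It assumes that the "74" for Szolnok comes from keeping only the leading run of digits (my reading of Dimitris' explanation), and contrasts that with a slightly more lenient heuristic. The prefix list and both function names are hypothetical, and real infobox data has far more variants than this handles.

```python
import re
from typing import Optional

# Hypothetical list of approximation markers; only "c." and "aprox."
# actually appear in this thread, the others are assumptions.
APPROX_PREFIX = re.compile(r"^\s*(c\.|ca\.|aprox\.|approx\.)\s*", re.IGNORECASE)
# Digit groups separated by single spaces, e.g. "74 544" or "1 234 567".
GROUPED = re.compile(r"^[0-9]{1,3}( [0-9]{3})+$")


def leading_digits(raw: str) -> Optional[int]:
    """Keep only the leading run of digits, so '74 544' becomes 74."""
    m = re.match(r"[0-9]+", raw.strip())
    return int(m.group()) if m else None


def lenient_population(raw: str) -> Optional[int]:
    """Try to recover an integer; return None rather than guess."""
    text = APPROX_PREFIX.sub("", raw).strip()
    if GROUPED.match(text):
        # Treat spaces between three-digit groups as thousands separators.
        text = text.replace(" ", "")
    return int(text) if re.fullmatch(r"[0-9]+", text) else None


for raw in ["74 544", "c. 735", "aprox. 20000", "6.9522624E9"]:
    print(repr(raw), leading_digits(raw), lenient_population(raw))
```

Returning None for anything the heuristic does not recognise is a deliberate choice here: a missing value seems less harmful than a confidently wrong one like 74.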
Best,
Pablo
On Mon, Nov 7, 2011 at 8:13 PM, Dimitris Kontokostas <[email protected]> wrote:
> Hi,
>
> There is no problem with the parser: the number has a space between 74 and
> 544 (74 544), so it is not a valid number.
> dbpedia-owl tries to validate the value as a number, so it gets 74;
> dbpprop does not validate values and accepts them as text (74 544).
>
> However, I spotted a problem with the live extraction.
> I changed the article to the correct number but the following happened:
> 1) the dbpprop:populationTotal <http://live.dbpedia.org/property/populationTotal>
>    kept both the new and the previous value (74544 & 74 544)
> 2) the dbpedia-owl:populationTotal <http://live.dbpedia.org/ontology/populationTotal>
>    remained 74 (it did not change to 74544)
>
> (http://live.dbpedia.org/page/Szolnok)
>
> Cheers,
> Dimitris
>
>
> On Mon, Nov 7, 2011 at 2:43 PM, Sören Auer <[email protected]> wrote:
>
>> Dear Zsíros,
>>
>> Indeed this seems to be a parser problem. Interestingly
>> http://live.dbpedia.org/page/Szolnok has at least a better value for
>> dbpprop:populationTotal (74 544).
>> I'm CCing the DBpedia mailinglist, since there might be people able to
>> help there. I will also discuss this issue with my colleagues working
>> here on DBpedia.
>>
>> Sören
>>
>>
>> Am 07.11.2011 13:35, schrieb Levente Zsíros:
>> > Hello!
>> > The city Szolnok ( http://en.wikipedia.org/wiki/Szolnok ) in DBpedia has
>> > a population of 74, which is wrong. On the other hand Wikipedia has the
>> > correct value. Isn't DBpedia supposed to be in sync with Wikipedia? Or
>> > is your wiki-parser faulty?
>> >
>> > http://dbpedia.org/page/Szolnok dbpprop:populationTotal
>> > <http://dbpedia.org/property/populationTotal>
>> >
>> >
>> >
>> >
>> > --
>> >
>> > Zsíros Levente
>>
>>
>>
>> ------------------------------------------------------------------------------
>> RSA(R) Conference 2012
>> Save $700 by Nov 18
>> Register now
>> http://p.sf.net/sfu/rsa-sfdev2dev1
>> _______________________________________________
>> Dbpedia-discussion mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
>>
>
>
>
> --
> Kontokostas Dimitris
>
>
> _______________________________________________
> Dbpedia-developers mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/dbpedia-developers
>
>