Hi Mariano,

I'm not sure I understand correctly what you mean. I guess a statistical
approach could mean two things:

1. heuristically figure out which format is used on a Wikipedia
edition and use that format to parse all values
2. heuristically figure out which format is used for a certain
template property and use that format to parse values for this
property

I don't think either would work well. For example, most numbers in the
source text of http://pt.wikipedia.org/wiki/Portugal use a comma as the
decimal separator (as Portuguese texts usually do), but most values in
the source text of http://pt.wikipedia.org/wiki/Rio_Rufino use a
dot. So we can't use one separator for all pages of a language; we
have to treat specific properties differently. And I don't think there
is a good way to figure out which format a property uses. In this
case, we would have to figure out that Rio Rufino has an area of about
282 km², not ~ 282000 km². We might use a heuristic like 'cities
usually have an area of less than 10000 km²', but such a heuristic
might fail for either very large or very small cities, and we would
have to introduce all kinds of different heuristics.
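
To make the ambiguity concrete, here is a minimal Python sketch. The
helper parse_number and the raw string "282.000" are made up for
illustration; they are not taken from the extraction framework or from
the actual page source.

```python
def parse_number(text, decimal_sep, group_sep):
    """Parse text under an assumed decimal and grouping separator."""
    # Drop grouping separators first, then normalise the decimal separator.
    return float(text.replace(group_sep, "").replace(decimal_sep, "."))

raw = "282.000"  # hypothetical raw template value

as_english = parse_number(raw, decimal_sep=".", group_sep=",")
as_portuguese = parse_number(raw, decimal_sep=",", group_sep=".")

print(as_english)     # dot read as the decimal separator
print(as_portuguese)  # dot read as the grouping separator
```

Both readings are syntactically valid, so the string alone cannot tell
us which one the editor meant.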

I think it's much simpler to allow the editors of the mappings wiki to
specify that a certain template property uses a format that differs
from the main one for this Wikipedia edition.
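
A rough sketch of what I have in mind, in Python for brevity. All names
here (DEFAULT_SEPARATORS, PROPERTY_OVERRIDES, "ExampleInfobox", "area")
are hypothetical and only illustrate the lookup; this is not how the
mappings wiki or the framework currently works.

```python
# (decimal separator, grouping separator) per Wikipedia edition.
DEFAULT_SEPARATORS = {"pt": (",", ".")}

# Hypothetical per-property overrides, as mapping editors might declare them.
PROPERTY_OVERRIDES = {
    ("pt", "ExampleInfobox", "area"): (".", ","),
}

def separators_for(lang, template, prop):
    """Return the override for this property, else the edition default."""
    return PROPERTY_OVERRIDES.get((lang, template, prop),
                                  DEFAULT_SEPARATORS[lang])

def parse_value(text, lang, template, prop):
    """Parse a raw value using the separators chosen for this property."""
    decimal_sep, group_sep = separators_for(lang, template, prop)
    return float(text.replace(group_sep, "").replace(decimal_sep, "."))

overridden = parse_value("282.000", "pt", "ExampleInfobox", "area")
default = parse_value("282.000", "pt", "OtherInfobox", "area")
```

Properties without an override still fall back to the main format of
the edition, so editors only need to mark the exceptions.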

Cheers,
JC

On Wed, May 30, 2012 at 4:57 PM, Mariano Rico <[email protected]> wrote:
>>
>> Any other ideas?
>>
>
> What about a 'statistical' approach? Most people will type numbers in their
> locale format, and the common pitfall is to use the English format.
> If the number format is correct English, and the statistics say that most
> numbers are in xx format, the number could be converted to the local format by
> using a conversion function. Every language has a number extractor to parse
> numbers in their locale, we could add this conversion function.
>
> -Mariano

_______________________________________________
Dbpedia-discussion mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
