Hi Mariano,

I'm not sure I understand correctly what you mean. I guess a statistical approach could mean two things:

1. heuristically figure out which format is used on a Wikipedia edition and use that format to parse all values
2. heuristically figure out which format is used for a certain template property and use that format to parse values for this property

I don't think either would work well. For example, most numbers in the source text of http://pt.wikipedia.org/wiki/Portugal use the comma as decimal separator (as Portuguese texts usually do), but most values in the source text of http://pt.wikipedia.org/wiki/Rio_Rufino use the dot. So we can't use one separator for all pages of a language; we have to treat specific properties differently.

And I don't think there is a good way to figure out which format a property uses. In this case, we would have to figure out that Rio Rufino has an area of about 282 km², not ~282000 km². We might use a heuristic like 'cities usually have an area of less than 10000 km²', but such a heuristic might fail for very large or very small cities, and we would have to introduce all kinds of different heuristics.

I think it's much simpler to allow the editors of the mappings wiki to specify that a certain template property uses a format that differs from the main one for this Wikipedia edition.

Cheers,
JC

On Wed, May 30, 2012 at 4:57 PM, Mariano Rico <[email protected]> wrote:
>>
>> Any other ideas?
>>
>
> What about a 'statistical' approach? Most people will type numbers in their
> locale format, and the common pitfall is to use the English format.
> If the number format is correct English, and the statistics say that most
> numbers are in xx format, the number could be converted to the local format by
> using a conversion function. Every language has a number extractor to parse
> numbers in their locale; we could add this conversion function.
>
> -Mariano

_______________________________________________
Dbpedia-discussion mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
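
The per-property override suggested in the reply above might look roughly like the following. This is only a minimal sketch, not how the DBpedia extraction framework implements number parsing: the format names, the `ExampleCityInfobox` template, the `area` property, and the raw value `282.000` are all hypothetical and chosen just to show how one string yields either 282 or 282000 depending on the separator convention.

```python
from decimal import Decimal

# Hypothetical separator conventions; the keys "comma" and "dot" are illustrative names.
FORMATS = {
    "comma": {"decimal": ",", "group": "."},  # decimal comma, usual in Portuguese texts
    "dot": {"decimal": ".", "group": ","},    # decimal dot, usual in English texts
}

# Main format per Wikipedia edition (assumption for this sketch).
DEFAULT_FORMAT = {"pt": "comma"}

# Hypothetical per-property override, as an editor of the mappings wiki might specify it.
PROPERTY_FORMAT = {("pt", "ExampleCityInfobox", "area"): "dot"}


def format_for(lang, template, prop):
    """Return the override for this template property, else the edition's main format."""
    return PROPERTY_FORMAT.get((lang, template, prop), DEFAULT_FORMAT[lang])


def parse_number(text, fmt):
    """Parse text using the given convention: drop grouping separators, normalise the decimal one."""
    seps = FORMATS[fmt]
    cleaned = text.strip().replace(seps["group"], "").replace(seps["decimal"], ".")
    return Decimal(cleaned)


raw = "282.000"  # hypothetical raw template value
as_default = parse_number(raw, DEFAULT_FORMAT["pt"])
as_override = parse_number(raw, format_for("pt", "ExampleCityInfobox", "area"))
```

With the edition-wide default the dot is read as a grouping separator, so `as_default` is 282000; with the override it is read as a decimal separator, so `as_override` is 282. No heuristic about plausible city areas is needed, because the choice is stated explicitly per property.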
