Hi María,

That's the problem we discussed last week. We basically know how to
handle it; I just don't know yet when we will have the time
to implement the solution.

The Wikipedia page uses '.' as the decimal separator - not in the HTML
view, but in the wikitext source:

http://pt.wikipedia.org/wiki/Rio_Rufino?action=edit
| área                 = 282.569

Because '.' is usually the thousands separator in Portuguese, we
extract 282569.0 km² and 2.82569E11 m² for Rio Rufino:

http://mappings.dbpedia.org/server/extraction/pt/extract?title=Rio_Rufino
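
To illustrate the effect, here is a minimal sketch in plain Java. It is
not the actual ParserUtils code; the class and method names are made up
for this example, and it only assumes the standard java.text.NumberFormat
behaviour for the pt-BR and English locales. It parses the same raw value
with both number formats and converts the result from km² to m².

```java
import java.text.NumberFormat;
import java.text.ParseException;
import java.util.Locale;

// Hypothetical helper for illustration only, not part of the DBpedia framework.
final class AreaParseSketch {

    // Parse a raw template value using the number format of the given locale.
    static double parseWith(Locale locale, String raw) {
        try {
            return NumberFormat.getInstance(locale).parse(raw).doubleValue();
        } catch (ParseException e) {
            throw new IllegalArgumentException("Not a number: " + raw, e);
        }
    }

    // The generic areaTotal property is stored in square metres.
    static double squareKmToSquareM(double km2) {
        return km2 * 1_000_000.0;
    }

    public static void main(String[] args) {
        String raw = "282.569"; // value of the area property in the wikitext source

        // Portuguese number format: '.' is read as the grouping separator.
        double asPortuguese = parseWith(Locale.forLanguageTag("pt-BR"), raw);
        // English number format: '.' is read as the decimal separator.
        double asEnglish = parseWith(Locale.ENGLISH, raw);

        System.out.println("pt-BR: " + asPortuguese + " km2, "
                + squareKmToSquareM(asPortuguese) + " m2");
        System.out.println("en:    " + asEnglish + " km2, "
                + squareKmToSquareM(asEnglish) + " m2");
    }
}
```

With the Portuguese format the value comes out 1000 times too large,
which matches the extraction result above; with the English format it is
about 2.82569E8 m², the value María expected.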

It's quite possible that there are other kinds of errors in the
DBpedia extraction, so if you find other mistakes let us know. Please
check the wikitext source of a Wikipedia page. Often, the cause of the
error can be found there.
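
If you want to check many pages, a small helper can pull a template
property out of the wikitext for you. The following is only a rough
sketch: the class and method names are invented, the sample wikitext is
shortened to the one line quoted above, and a simple line-based regular
expression like this will not cope with nested templates or values that
span several lines.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only.
final class WikitextParamSketch {

    // Find a line of the form "| name = value" and return the trimmed value.
    static Optional<String> templateParam(String wikitext, String name) {
        Pattern p = Pattern.compile(
                "^[ \\t]*\\|[ \\t]*" + Pattern.quote(name) + "[ \\t]*=[ \\t]*(.*?)[ \\t]*$",
                Pattern.MULTILINE);
        Matcher m = p.matcher(wikitext);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // "\u00e1rea" is "área" and "Munic\u00edpio" is "Município",
        // written with escapes so the file compiles in any source encoding.
        String wikitext = String.join("\n",
                "{{Info/Munic\u00edpio do Brasil",
                "| \u00e1rea                 = 282.569",
                "}}");
        System.out.println(templateParam(wikitext, "\u00e1rea").orElse("(not found)"));
    }
}
```

The raw string returned here is what the extraction has to interpret, so
comparing it with the rendered page shows quickly whether a separator
mismatch is the cause.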

JC

On Fri, Jun 8, 2012 at 11:48 AM, María Poveda <mpov...@fi.upm.es> wrote:
> Hi Christopher,
>
> - From http://pt.wikipedia.org/wiki/Rio_Rufino, the total area is
>   282,569 km².
> - In the DBpedia dataset (3.7 pt version), the areaTotal is
>   2.82569E11 m2.
>
> If I'm not wrong, it should be 2.82569E8 m2.
>
> María
>
> On Thu, Jun 7, 2012 at 5:18 PM, Jona Christopher Sahnwaldt <j...@sahnwaldt.de>
> wrote:
>>
>> Hi Maria,
>>
>> values for http://dbpedia.org/ontology/areaTotal should indeed use
>> square metres (the SI unit) because it's easier to compare and order
>> across many different resources.
>>
>> We also have the "specific" property
>> http://dbpedia.org/ontology/PopulatedPlace/areaTotal which uses square
>> kilometres because that's what's conventionally used for populated
>> places.
>>
>> For example, http://dbpedia.org/page/Berlin contains
>>
>> dbpedia-owl:PopulatedPlace/areaTotal 891.85
>> dbpedia-owl:areaTotal 891850000.000000
>>
>> I just looked at three or four other pages and they look ok to me. Do
>> you have an example where dbpedia-owl:areaTotal uses square
>> kilometres?
>>
>> Cheers,
>> JC
>>
>> On Wed, Jun 6, 2012 at 5:46 PM, María Poveda <mpov...@fi.upm.es> wrote:
>> > Hello all,
>> >
>> >    I've just seen that, according to the ontology, areaTotal
>> > (http://dbpedia.org/ontology/areaTotal) should be represented in m2.
>> > I think most of the values in the 3.7 version are in km2.
>> >
>> > Could you either extract the values according to this definition or
>> > change the ontology? It could make sense to extract them as km2, as
>> > the areas of populated places are usually too big to be conveniently
>> > expressed in square metres.
>> >
>> > María
>> >
>> > On Fri, Jun 1, 2012 at 7:01 PM, Jona Christopher Sahnwaldt
>> > <j...@sahnwaldt.de>
>> > wrote:
>> >>
>> >> OK, then I'll put this on my TODO list. I don't think this feature
>> >> will make it into the 3.8 release though, as the main extractions are
>> >> already done.
>> >>
>> >> On Fri, Jun 1, 2012 at 6:20 PM, Pablo Mendes <pablomen...@gmail.com>
>> >> wrote:
>> >> >
>> >> > This sounds good to me. Marking something to "be parsed as you'd
>> >> > parse English numbers" is easy enough to get.
>> >> >
>> >> > Cheers,
>> >> > Pablo
>> >> >
>> >> >
>> >> > On Fri, Jun 1, 2012 at 6:06 PM, Jona Christopher Sahnwaldt
>> >> > <j...@sahnwaldt.de>
>> >> > wrote:
>> >> >>
>> >> >> I think I tend towards the language code solution. It feels safer to
>> >> >> use a different locale altogether than to modify separate aspects
>> >> >> of a complex beast like a number format.
>> >> >>
>> >> >> Advantages:
>> >> >>
>> >> >> + Relatively simple to implement: add a Locale constructor parameter
>> >> >> in ParserUtils and use it when getting the NumberFormat /
>> >> >> DecimalFormat. Four data parsers and three mapping classes will also
>> >> >> need this constructor parameter. The MappingsLoader will have to
>> >> >> parse the mappings wiki settings. For the other solutions, we would
>> >> >> either need two (or maybe three) parameters instead of one, or
>> >> >> construct our own NumberFormat from the decimalSeparator and
>> >> >> groupingSeparator (and possibly other) property settings.
>> >> >> + Compact setting: one setting like "en" changes all separators, not
>> >> >> just the decimalSeparator. If decimalSeparator and groupingSeparator
>> >> >> can be set separately, editors will probably forget one or the
>> >> >> other, which will lead to problems. We could add implicit rules like
>> >> >> "if the editor set the decimalSeparator to comma but no
>> >> >> groupingSeparator, set groupingSeparator to dot" and vice versa,
>> >> >> but that would be ugly and confusing.
>> >> >>
>> >> >> Disadvantages:
>> >> >>
>> >> >> - Possibly more difficult for mappings wiki editors. The difference
>> >> >> between comma and dot is plain to see, but that these characters are
>> >> >> connected to a thing called 'locale' is less obvious.
>> >> >> - Less flexible. If there ever is a property that uses
>> >> >> decimalSeparator, groupingSeparator and maybe other settings not
>> >> >> covered by any Locale, we're in trouble.
>> >> >>
>> >> >> I think (but we should collect hard data about this) that there are
>> >> >> mainly two cases: numbers that use comma as decimalSeparator (and
>> >> >> dot or space as groupingSeparator), or vice versa: dot as
>> >> >> decimalSeparator (and comma or space as groupingSeparator). Most
>> >> >> template properties in a Wikipedia language edition should use the
>> >> >> format that is 'canonical' for that language. The others will
>> >> >> probably use the 'English' format.
>> >> >>
>> >> >> Space as groupingSeparator is already allowed for all languages.
>> >> >> Java doesn't support it, but I recently added that behavior to
>> >> >> ParserUtils.parse().
>> >> >>
>> >> >> Regards,
>> >> >> JC
>> >> >>
>> >> >> PS: of course, there are exceptions: I just found out that in the
>> >> >> German Wikipedia, articles that start with <!--schweizbezogen-->
>> >> >> ('relating to Switzerland') use the apostrophe ' as thousands
>> >> >> separator. Oh my... :-(
>> >> >>
>> >> >>
>> >> >> On Wed, May 30, 2012 at 5:30 PM, Pablo Mendes
>> >> >> <pablomen...@gmail.com>
>> >> >> wrote:
>> >> >> >> Add a configuration value decimalSeparator whose value may be dot
>> >> >> >> or comma: "," or ".". Bit hard to read... We would also need a
>> >> >> >> configuration value groupSeparator.
>> >> >> >
>> >> >> >
>> >> >> > +1 to this. Accepted values:
>> >> >> > - "dot" or "."
>> >> >> > - "comma" or ","
>> >> >> > - "space" or " "
>> >> >> > (it is the case that groupSeparators are spaces sometimes)
>> >> >> >
>> >> >> > Cheers,
>> >> >> > Pablo
>> >> >> >
>> >> >> > On Wed, May 30, 2012 at 4:29 PM, Jona Christopher Sahnwaldt
>> >> >> > <j...@sahnwaldt.de> wrote:
>> >> >> >>
>> >> >> >> Hi Maria,
>> >> >> >>
>> >> >> >> thanks for the report!
>> >> >> >>
>> >> >> >> The problem is that the number is displayed with a comma as the
>> >> >> >> decimal separator, but in the source text of the page [1], the
>> >> >> >> decimal separator is a dot:
>> >> >> >>
>> >> >> >> | área                 = 282.569
>> >> >> >>
>> >> >> >> The template [2] that generates the HTML from the source expects
>> >> >> >> the number to use a dot and formats it appropriately for
>> >> >> >> Brazilian/Portuguese:
>> >> >> >>
>> >> >> >> {{formatnum:{{{área}}}}}
>> >> >> >>
>> >> >> >> To fix this problem, we will have to extend our extraction
>> >> >> >> framework so that users can specify which decimal separator is
>> >> >> >> used in the values of a certain template property.
>> >> >> >>
>> >> >> >> @developers: We will have to discuss what's the best way to do
>> >> >> >> this...
>> >> >> >>
>> >> >> >> - Add a configuration value decimalSeparator whose value may be
>> >> >> >> dot or comma: "," or ".". Bit hard to read... We would also need
>> >> >> >> a configuration value groupSeparator.
>> >> >> >>
>> >> >> >> - Add a configuration value numberFormat that takes a language
>> >> >> >> code, in this case "en".
>> >> >> >>
>> >> >> >> - Add a configuration value numberFormat that takes a decimal
>> >> >> >> separator and a group separator: ".,". Bit hard to read...
>> >> >> >>
>> >> >> >> Any other ideas?
>> >> >> >>
>> >> >> >> JC
>> >> >> >>
>> >> >> >> [1] http://pt.wikipedia.org/w/index.php?title=Rio_Rufino&action=edit
>> >> >> >> [2] http://pt.wikipedia.org/w/index.php?title=Predefinição:Info/Município_do_Brasil&action=edit
>> >> >> >> [3] http://mappings.dbpedia.org/index.php/Mapping_pt:Info/Município_do_Brasil#.C3.A1rea
>> >> >> >>
>> >> >> >> On Wed, May 30, 2012 at 1:06 PM, María Poveda <mpov...@fi.upm.es>
>> >> >> >> wrote:
>> >> >> >> > Hello everybody,
>> >> >> >> >
>> >> >> >> >   I was having a look at DBpedia data about cities, for example
>> >> >> >> > the area total property. I would like to know how you deal with
>> >> >> >> > different decimal separators and grouping separators between
>> >> >> >> > countries. For example, I found that in
>> >> >> >> > http://pt.wikipedia.org/wiki/Rio_Rufino the area total is
>> >> >> >> > 282,569 km², and I think the "," is a decimal separator according
>> >> >> >> > to the Brazilian convention [1]. However, in the DBpedia dataset
>> >> >> >> > I found the following value: 2.82569E11. My guess is that the
>> >> >> >> > separator is being treated as a grouping separator, as is the
>> >> >> >> > convention in the United Kingdom [2], for example.
>> >> >> >> >
>> >> >> >> > I would be very thankful if you can enlighten me.
>> >> >> >> >
>> >> >> >> > Cheers,
>> >> >> >> >
>> >> >> >> > María
>> >> >> >> >
>> >> >> >> > [1] http://publib.boulder.ibm.com/infocenter/forms/v3r0m0/topic/com.ibm.help.forms.doc/locale_spec/i_xfdl_r_formats_pt_BR.html
>> >> >> >> > [2] http://publib.boulder.ibm.com/infocenter/forms/v3r0m0/topic/com.ibm.help.forms.doc/locale_spec/i_xfdl_r_formats_en_GB.html
>> >> >> >> >
>> >> >> >> >
>> >> >> >> >
>> >> >> >> >
>> >> >> >> > ------------------------------------------------------------------------------
>> >> >> >> > Live Security Virtual Conference
>> >> >> >> > Exclusive live event will cover all the ways today's security
>> >> >> >> > and
>> >> >> >> > threat landscape has changed and how IT managers can respond.
>> >> >> >> > Discussions
>> >> >> >> > will include endpoint security, mobile security and the latest
>> >> >> >> > in
>> >> >> >> > malware
>> >> >> >> > threats.
>> >> >> >> > http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
>> >> >> >> > _______________________________________________
>> >> >> >> > Dbpedia-discussion mailing list
>> >> >> >> > Dbpedia-discussion@lists.sourceforge.net
>> >> >> >> > https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
>> >> >> >> >
>> >> >> >>
>> >> >> >> _______________________________________________
>> >> >> >> Dbpedia-developers mailing list
>> >> >> >> dbpedia-develop...@lists.sourceforge.net
>> >> >> >> https://lists.sourceforge.net/lists/listinfo/dbpedia-developers
>> >> >> >>
>> >> >> >>
>> >> >> >
>> >> >
>> >> >
>> >
>> >
>
>
