Re: [Wikidata-l] Data values

2012-12-23 Thread Marco Fleckinger
Hi, Denny Vrandečić denny.vrande...@wikimedia.de wrote: * Wikidata has to balance ease of use and expressiveness of statements. The user interface should not get complicated to merely cover a few exceptional edge cases. So your test UI, had it been a proposal or just anything to have to

Re: [Wikidata-l] Data values

2012-12-21 Thread Marco Fleckinger
Gregor Hagedorn g.m.haged...@gmail.com wrote: So, please suggest terms to use for at least these two things: 1) value certainty (ideally, not using digits, but something that is independent of unit and rendering) Here we want to talk about something that the true value is with a certain

Re: [Wikidata-l] Data values

2012-12-21 Thread Daniel Kinzler
On 20.12.2012 20:52, Gregor Hagedorn wrote: I believe there are a lot of dangerous assumptions on http://simia.net/valueparser/ First: there is no indication in a number that it is _not_ endlessly precise. Apostles = 12 has no uncertainty, representing it as 12 ± 1 is wrong, but also 12

Re: [Wikidata-l] Data values

2012-12-21 Thread Daniel Kinzler
On 20.12.2012 20:31, Friedrich Röhrs wrote: Hi, tried to enter the height of the Eiffel Tower. 324 meters. It suggested 324m +-100m. That's strange. When I enter 324m, it correctly suggests 324m+/-1 for me. -- daniel -- Daniel Kinzler, Softwarearchitekt Wikimedia Deutschland -

Re: [Wikidata-l] Data values

2012-12-21 Thread Daniel Kinzler
On 20.12.2012 20:59, Avenue wrote: Thanks, the prototype helps make some of this more concrete. I am increasingly wondering if uncertainty will be overloaded here. People seem to want to use it for various types of measurement uncertainty (e.g. the standard error), ranges with no defined

Re: [Wikidata-l] Data values

2012-12-21 Thread Friedrich Röhrs
Hi, Does for me too now. Maybe I played around with the autouncertainty checkbox before trying 324 (probably had some 4 digit value before). Although the problem remains, the height of the Eiffel Tower is not 324 +-1 meter. It is given as 324 meters without any further information. So it should

Re: [Wikidata-l] Data values

2012-12-21 Thread Gregor Hagedorn
Hm the second one is only relevant for output. I think this is a fundamental misunderstanding: The original one is not for output but is the primary value for interpretation, for understanding whether a value in Wikidata is correct or fake, or a software conversion error, or what. If I want to

Re: [Wikidata-l] Data values

2012-12-21 Thread Gregor Hagedorn
I don't like significant digits because it depends on the writing system (base 10). I'd much rather express this as absolute values. Yes, I would like that too. What I argue is that the problem is that in 99.9% of cases (not a researched number, of course) you simply don't know more

Re: [Wikidata-l] Data values

2012-12-21 Thread Denny Vrandečić
Hi all, wow! Thanks for all the input. I read it all through, and am currently trying to digest it into a new draft of the data model for the discussed data values. I will try to address some questions here. Please be kind if I refer to the wrong person at one place or the other. Whenever I refer to

Re: [Wikidata-l] Data values

2012-12-21 Thread jmcclure
(If I knew the private email for Denny, I'd send this there) Martynas, there is no mention here of XSD etc. because it is not relevant on this level of discussion. For exporting the data we will obviously use XSD datatypes. This is so obvious that I didn't think it needed to be explicitly

Re: [Wikidata-l] Data values

2012-12-21 Thread jmcclure
The xsd:minInclusive, xsd:maxInclusive, xsd:minExclusive and xsd:maxExclusive facets are absolute expressions not relative +/- expressions, in order to accommodate fast queries. These four facets permit specification of ranges with an unspecified median and ranges with a specified mode,
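
A minimal sketch of the idea in the message above, assuming the four facets are stored as absolute bounds rather than as a central value with a +/- offset. The class name `FacetRange` and the example values are hypothetical and are not part of any Wikidata or XSD API.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass(frozen=True)
class FacetRange:
    """Absolute bounds named after the four XSD facets (illustrative only)."""
    min_inclusive: Optional[Decimal] = None
    min_exclusive: Optional[Decimal] = None
    max_inclusive: Optional[Decimal] = None
    max_exclusive: Optional[Decimal] = None

    def contains(self, x: Decimal) -> bool:
        # Each bound that is set is checked with a plain comparison,
        # which is what makes absolute bounds cheap to query.
        if self.min_inclusive is not None and x < self.min_inclusive:
            return False
        if self.min_exclusive is not None and x <= self.min_exclusive:
            return False
        if self.max_inclusive is not None and x > self.max_inclusive:
            return False
        if self.max_exclusive is not None and x >= self.max_exclusive:
            return False
        return True


# Hypothetical example: a length between 0.18 m and 0.25 m, both ends included.
length = FacetRange(min_inclusive=Decimal("0.18"), max_inclusive=Decimal("0.25"))
print(length.contains(Decimal("0.20")))
```

Because every bound is an absolute number, a range query reduces to simple comparisons on stored columns, with no arithmetic on a central value at query time.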

Re: [Wikidata-l] Data values

2012-12-21 Thread Gregor Hagedorn
On 21 December 2012 19:36, jmccl...@hypergrove.com wrote: The xsd:minInclusive, xsd:maxInclusive, xsd:minExclusive and xsd:maxExclusive facets are absolute expressions not relative +/- expressions, in order to accommodate fast queries. These four facets permit specification of ranges with an

Re: [Wikidata-l] Data values

2012-12-21 Thread jmcclure
I detect a need to characterize the range expression - most important of which is whether the range is complete, or whether it excludes (equal) tails on each end. XSD presumes a complete range is being specified, not a subset; is that the issue you're raising? Could an additional facet for

Re: [Wikidata-l] Data values

2012-12-21 Thread Friedrich Röhrs
Hi, On Fri, Dec 21, 2012 at 6:14 PM, Denny Vrandečić denny.vrande...@wikimedia.de wrote: Friedrich, the term query answering simply means the ability to answer queries against the database in Phase 3, e.g. the list of cities located in Ghana with a population over 25,000 ordered by

Re: [Wikidata-l] Data values

2012-12-20 Thread Daniel Kinzler
On 19.12.2012 16:57, Sven Manguard wrote: There is a balance. The more flexible the parameters, the easier it is to put data in, but the harder it is for computers to make useful connections with it. I'm not sure how to handle this, but I am sure that we can't just keep pretending that all

Re: [Wikidata-l] Data values

2012-12-20 Thread Daniel Kinzler
On 19.12.2012 18:13, Gregor Hagedorn wrote: On 19 December 2012 17:03, Daniel Kinzler daniel.kinz...@wikimedia.de wrote: Indeed we do: https://wikidata.org/wiki/Wikidata:Glossary I use precision exactly like that: significant digits when rendering output or parsing input. It can be used to

Re: [Wikidata-l] Data values

2012-12-20 Thread Daniel Kinzler
On 19.12.2012 18:00, Gregor Hagedorn wrote: Yes, Wikidata shall store a normalized version of the value, but it also needs to store an original one. Whether it needs to store the value twice I am not sure, I believe not. If it stores the original prefix, original unit and original significant

Re: [Wikidata-l] Data values

2012-12-20 Thread Marco Fleckinger
On 2012-12-20 12:46, Daniel Kinzler wrote: So, please suggest terms to use for at least these two things: 1) value certainty (ideally, not using digits, but something that is independent of unit and rendering) We want to specify the limits of (possible) variation of a value, which would

Re: [Wikidata-l] Data values

2012-12-20 Thread jmcclure
(Proposal 3, modified) * value (xsd:double or xsd:decimal) * unit (a wikidata item) * totalDigits (xsd:smallint) * fractionDigits (xsd:smallint) * originalUnit (a wikidata item) * originalUnitPrefix (a wikidata item) JMc: I rearranged the list a bit and suggested simpler naming JMc: Is not
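
The field list in "Proposal 3, modified" can be sketched as a record type. This is only an illustration under stated assumptions: plain strings stand in for "a wikidata item", `Decimal` stands in for xsd:decimal, and the `render` helper is hypothetical and not part of the proposal.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class QuantityValue:
    value: Decimal              # xsd:double or xsd:decimal in the proposal
    unit: str                   # placeholder for a Wikidata item
    total_digits: int           # totalDigits
    fraction_digits: int        # fractionDigits
    original_unit: str          # placeholder for a Wikidata item
    original_unit_prefix: str   # placeholder for a Wikidata item

    def render(self) -> str:
        # Hypothetical helper: show exactly fraction_digits decimal places.
        quantum = Decimal(1).scaleb(-self.fraction_digits)
        return f"{self.value.quantize(quantum)} {self.unit}"


# Example values taken from elsewhere in the thread (324 m, 1.86 m).
tower = QuantityValue(Decimal("324"), "metre", 3, 0, "metre", "")
person = QuantityValue(Decimal("1.8"), "metre", 3, 2, "metre", "centi")
print(tower.render(), "|", person.render())
```

Keeping the digit counts next to the value lets a renderer reproduce trailing zeros that a bare number would lose.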

Re: [Wikidata-l] Data values

2012-12-20 Thread Denny Vrandečić
I am still trying to catch up with the whole discussion and to distill the results, both here and on the wiki. In the meanwhile, I have tried to create a prototype of how a complex model can still be entered in a simple fashion. A simple demo can be found here: http://simia.net/valueparser/ The

Re: [Wikidata-l] Data values

2012-12-20 Thread Sven Manguard
I really, really hope that this isn't the mindset of the development team as a whole. If so, my confidence in the viability of Wikidata would take a major hit. Yes, collecting the information that goes into infoboxes is going to be important, and yes, centralizing that information so that it can

Re: [Wikidata-l] Data values

2012-12-20 Thread Gregor Hagedorn
I believe there are a lot of dangerous assumptions on http://simia.net/valueparser/ First: there is no indication in a number that it is _not_ endlessly precise. Apostles = 12 has no uncertainty, representing it as 12 ± 1 is wrong, but also 12 ± 0.5 is wrong. The same applies to a number like

Re: [Wikidata-l] Data values

2012-12-20 Thread Avenue
Thanks, the prototype helps make some of this more concrete. I am increasingly wondering if uncertainty will be overloaded here. People seem to want to use it for various types of measurement uncertainty (e.g. the standard error), ranges with no defined central value, and distributional summaries

Re: [Wikidata-l] Data values

2012-12-20 Thread Michael Smethurst
[wikidata-l-boun...@lists.wikimedia.org] on behalf of Avenue [avenu...@gmail.com] Sent: 20 December 2012 19:59 To: Discussion list for the Wikidata project. Subject: Re: [Wikidata-l] Data values Thanks, the prototype helps make some of this more concrete. I am increasingly wondering if uncertainty

Re: [Wikidata-l] Data values

2012-12-19 Thread Friedrich Röhrs
I don't understand why 1.6e-8 is absolutely necessary for sorting and comparison. PHP allows for the definition of custom sorting functions. If a custom datatype is defined, a custom sorting/comparison function can be defined too. Or am I missing some performance points? On Wed, Dec 19, 2012 at
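
The custom sorting function mentioned above can be sketched as follows. The message refers to PHP; this sketch uses Python's `functools.cmp_to_key` as an analogue. The conversion table and the `(number, unit)` pair format are assumptions made for illustration.

```python
from functools import cmp_to_key

# Hypothetical conversion factors to metres; not a complete unit table.
TO_METRES = {"m": 1.0, "km": 1000.0, "cm": 0.01}


def compare_lengths(a, b):
    """Compare two (number, unit) pairs by their length in metres."""
    left = a[0] * TO_METRES[a[1]]
    right = b[0] * TO_METRES[b[1]]
    # Return -1, 0 or 1, as a classic comparison callback does.
    return (left > right) - (left < right)


values = [(2.3, "km"), (324, "m"), (186, "cm")]
ordered = sorted(values, key=cmp_to_key(compare_lengths))
print(ordered)
```

A comparator like this works for values already loaded into memory. Whether it is fast enough for large queries is the performance question the message raises, since a database index would typically need a stored, normalized number to sort on.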

Re: [Wikidata-l] Data values

2012-12-19 Thread Daniel Kinzler
On 19.12.2012 11:56, Friedrich Röhrs wrote: I don't understand why 1.6e-8 is absolutely necessary for sorting and comparison. PHP allows for the definition of custom sorting functions. If a custom datatype is defined, a custom sorting/comparison function can be defined too. Or am I missing

Re: [Wikidata-l] Data values

2012-12-19 Thread Gregor Hagedorn
In addition to a storage option of the desired unit prefix (this may be considered an original prefix, since naturally re-users may wish to reformat this). I see no point in storing the unit used for input. I think you plan to store the unit (which would be meter), so you don't want to store

Re: [Wikidata-l] Data values

2012-12-19 Thread Martynas Jusevičius
Hey wikidatians, occasionally checking threads in this list like the current one, I get a mixed feeling: on one hand, it is sad to see the efforts and resources wasted as Wikidata tries to reinvent RDF, and now also triplestore design as well as XSD datatypes. What's next, WikiQL instead of

Re: [Wikidata-l] Data values

2012-12-19 Thread Marco Fleckinger
On 2012-12-19 15:11, Daniel Kinzler wrote: On 19.12.2012 14:34, Friedrich Röhrs wrote: Hi, Sorry for my ignorance, if this is common knowledge: What is the use case for sorting millions of different measures from different objects? Finding all cities with more than 10 inhabitants

Re: [Wikidata-l] Data values

2012-12-19 Thread Nikola Smolenski
On 19/12/12 15:33, Nikola Smolenski wrote: On 19/12/12 12:23, Daniel Kinzler wrote: I don't think we can sensibly support historical units with unknown conversions, because they cannot be compared directly to SI units. So, they couldn't be used to answer queries, can't be converted for display,

Re: [Wikidata-l] Data values

2012-12-19 Thread Avenue
On Wed, Dec 19, 2012 at 2:32 PM, Marco Fleckinger marco.fleckin...@wikipedia.at wrote: IMHO this should be part of a model. E.g. Altitudes are usually measured in metres or feet, never in km or yards. Distances have the same SI base unit but are also measured in km, depending on the

Re: [Wikidata-l] Data values

2012-12-19 Thread Daniel Kinzler
On 19.12.2012 15:32, Marco Fleckinger wrote: Maybe we should make a difference between internal usage and visualization. Comparing meters with kilometers and feet is quite difficult, transcaling everything on visualization not. Not maybe. Definitely. Visualization is based on user preference,

Re: [Wikidata-l] Data values

2012-12-19 Thread Daniel Kinzler
On 19.12.2012 15:26, Avenue wrote: What about the North and South Poles? I'm sure standard coordinate systems have a convention for representing them. Won't we need lots of units that are not SI units (e.g. base pairs, IQ points, Scoville heat units, $ and €) and can't readily be translated

Re: [Wikidata-l] Data values

2012-12-19 Thread Gregor Hagedorn
On 19 December 2012 15:11, Daniel Kinzler daniel.kinz...@wikimedia.de wrote: If they measure the same dimension, they should be saved using the same unit (probably the SI base unit for that dimension). Saving values using different units would make it impossible to run efficient queries against

Re: [Wikidata-l] Data values

2012-12-19 Thread Sven Manguard
I don't think we can sensibly support historical units with unknown conversions, because they cannot be compared directly to SI units. So, they couldn't be used to answer queries, can't be converted for display, etc - they aren't units in any sense the software can understand. This is a

Re: [Wikidata-l] Data values

2012-12-19 Thread Marco Fleckinger
On 2012-12-19 16:56, Daniel Kinzler wrote: On 19.12.2012 16:47, Gregor Hagedorn wrote: Daniel confirms (in separate mail) that Wikidata indeed intends to convert any derived SI units to a common formula of base units. Example: a quantity like 1013 hectopascal, the common unit for
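
The conversion of a derived unit to base units described above can be sketched like this. The unit table and the function name `to_base_units` are hypothetical. The only facts relied on are that 1 hPa = 100 Pa and that the pascal equals kg·m^-1·s^-2 in SI base units.

```python
from fractions import Fraction

# Hypothetical mini-table: unit -> (factor to SI base units, base-unit exponents).
UNITS = {
    "Pa": (Fraction(1), {"kg": 1, "m": -1, "s": -2}),
    "hPa": (Fraction(100), {"kg": 1, "m": -1, "s": -2}),
}


def to_base_units(amount, unit):
    """Return the amount expressed in SI base units plus the unit exponents."""
    factor, dims = UNITS[unit]
    return Fraction(amount) * factor, dims


value, dims = to_base_units(1013, "hPa")
print(value, dims)
```

Exact fractions avoid rounding during conversion. The sketch does not keep the original unit, which other messages in the thread argue should be stored as well.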

Re: [Wikidata-l] Data values

2012-12-19 Thread Denny Vrandečić
Martynas, could you please let me know where RDF or any of the W3C standards covers topics like units, uncertainty, and their conversion. I would be very much interested in that. Cheers, Denny 2012/12/19 Martynas Jusevičius marty...@graphity.org Hey wikidatians, occasionally checking

Re: [Wikidata-l] Data values

2012-12-19 Thread Gregor Hagedorn
These all pose the same problems, correct. At the moment, I'm very unsure about how to accommodate these at all. Maybe we can have them as custom units, which are fixed for a given property, and can not be converted. I think the proposal to use wikidata items for the units (that is both

Re: [Wikidata-l] Data values

2012-12-19 Thread Daniel Kinzler
On 19.12.2012 16:41, Marco Fleckinger wrote: I assume there's a table for usual units for different purposes. E.g. altitudes are displayed in m and ft. Out of that one of those is chosen by the user's locale setting. My locale-setting would be kind of metric system, therefore it will be

Re: [Wikidata-l] Data values

2012-12-19 Thread Friedrich Röhrs
When we speak about dimensions, we talk about properties, right? So when I define the property height of a person as an entity, I would supply the SI unit (m) and the SI multiple (-2, cm) that it should be saved in (in the database). When someone then inputs the height in meters (e.g. 1.86m) it
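
A small sketch of the storage rule suggested above, assuming a property declares an SI unit and a power-of-ten multiple (here -2 for centimetres). The function name `to_storage` is hypothetical.

```python
from decimal import Decimal


def to_storage(value_in_si_unit: Decimal, multiple_exponent: int) -> Decimal:
    """Rescale a value given in the SI unit to the property's storage multiple."""
    # Dividing by 10**multiple_exponent is a pure shift of the decimal exponent.
    return value_in_si_unit.scaleb(-multiple_exponent)


# Height entered as 1.86 m, stored with multiple -2 (centimetres).
stored = to_storage(Decimal("1.86"), -2)
print(stored)
```

Using `Decimal` keeps the entered digits exact, which binary floating point would not guarantee.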

Re: [Wikidata-l] Data values

2012-12-19 Thread Herman Bruyninckx
On Wed, 19 Dec 2012, Denny Vrandečić wrote: Martynas, could you please let me know where RDF or any of the W3C standards covers topics like units, uncertainty, and their conversion. I would be very much interested in that. NIST has created a standard in OWL: QUDT - Quantities, Units,

Re: [Wikidata-l] Data values

2012-12-19 Thread Gregor Hagedorn
it is probably necessary to store the number of significant decimals. Yes, that *is* the accuracy value i mean. Daniel, please use correct terms. Accuracy is a defined concept and although by convention it may be roughly expressed by using the number of significant figures, that is not the

Re: [Wikidata-l] Data values

2012-12-19 Thread Gregor Hagedorn
On 19 December 2012 17:03, Daniel Kinzler daniel.kinz...@wikimedia.de wrote: I'd have thought that we'd have one such table per dimension (such as length or weight). It may make sense to override that on a per-property basis, so 2300m elevation isn't shown as 2.3km. Or that can be done in the

Re: [Wikidata-l] Data values

2012-12-19 Thread Martynas Jusevičius
Denny, you're sidestepping the main issue here -- every sensible architecture should build on as many previous standards as possible, and build own custom solution only if a *very* compelling reason is found to do so instead of finding a compromise between the requirements and the standard.

Re: [Wikidata-l] Data values

2012-12-19 Thread Sven Manguard
My philosophy is this: We should do whatever works best for Wikidata and Wikidata's needs. If people want to reuse our content, and the choices we've made make existing tools unworkable, they can build new tools themselves. We should not be clinging to what's been done already if it gets in the

Re: [Wikidata-l] Data values

2012-12-19 Thread Gregor Hagedorn
Martynas, I think you misinterpret the thread. There is no discussion not to build on the datatypes defined in http://www.w3.org/TR/xmlschema-2/ What we are doing is discussing compositions of elements, all typed to xml datatypes, that shall be able to express scientific and engineering

Re: [Wikidata-l] Data values

2012-12-19 Thread jmcclure
I suspect what Martynas is driving at is that XMLS defines **FACETS** for its datatypes - accepting those as a baseline, and then extending them to your requirements, is a reasonable, community-oriented process. However, wrapping oneself in the flag of open development is to me unresponsive to a

Re: [Wikidata-l] Data values

2012-12-19 Thread Tom Morris
Wow, what a long thread. I was just about to chime in to agree with Sven's point about units when he interjected his comment about blithely ignoring history, so I feel compelled to comment on that first. It's fine to ignore standards *for good reasons*, but doing it out of ignorance or

Re: [Wikidata-l] Data values

2012-12-19 Thread Gregor Hagedorn
On 19 December 2012 20:01, jmccl...@hypergrove.com wrote: Hi Gregor - the root of the misconception I likely have about significant digits and the like, is that such is one example of a rendering parameter not a semantic property. It is about semantics, not formatting. In science and

Re: [Wikidata-l] Data values

2012-12-19 Thread jmcclure
totally agree - hopefully XSD facets provide a solid start to meeting those concrete requirements - thanks. On 19.12.2012 14:09, Gregor Hagedorn wrote: On 19 December 2012 20:01, jmccl...@hypergrove.com wrote: Hi Gregor - the root of the misconception I likely have about significant

Re: [Wikidata-l] Data values

2012-12-19 Thread jmcclure
For me the question is how to name the precision information. Do not the XSD facets totalDigits and fractionDigits work well enough? I mean .number:totalDigits contains a positive power of ten for precision .number:fractionDigits contains a negative power of ten for precision The use of
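
One possible reading of the suggestion above, sketched with Python's `decimal` module: derive a per-value digit count and a fraction-digit count from the literal as entered. This is an assumption about how the facets might be applied to a single value. In XML Schema itself, totalDigits and fractionDigits are constraints on a datatype. The function name `digit_facets` is hypothetical, and only finite literals are handled.

```python
from decimal import Decimal


def digit_facets(literal: str):
    """Return (total_digits, fraction_digits) for a finite decimal literal."""
    _sign, digits, exponent = Decimal(literal).as_tuple()
    fraction_digits = max(-exponent, 0)
    # Count at least as many digits as there are decimal places.
    total_digits = max(len(digits), fraction_digits)
    return total_digits, fraction_digits


for literal in ("324", "1.80", "12"):
    print(literal, digit_facets(literal))
```

Parsing the literal as text preserves trailing zeros such as the one in 1.80, which is the information a binary float would discard.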

Re: [Wikidata-l] Data values

2012-12-19 Thread Sven Manguard
I think that Tom Morris tragically misunderstood my point, although that was likely helped by the fact that, as I've insinuated already, the many standards and acronyms being thrown about are largely lost on me. My point is not We can just throw everything out because we're big and awesome and

Re: [Wikidata-l] Data values

2012-12-19 Thread Peter Jacobi
If one has time to read prior art, I'd suggest giving the Health Level 7 v3.0 Data Types Specification http://amisha.pragmaticdata.com/v3dt/report.html a look. Of course HL7 has a lot of things to worry about which are off topic for us, starting with a prior completely different version of the

Re: [Wikidata-l] Data values

2012-12-18 Thread Denny Vrandečić
Thanks for the input so far. Here are a few explicit questions that I have: * Time: right now the data model assumes that the precision is given on the level decade / year / month etc., which means you can enter a date of birth like 1435 or May 1918. But is this sufficient? We cannot enter a

Re: [Wikidata-l] Data values

2012-12-18 Thread Marco Fleckinger
Hello, On 2012-12-18 15:29, Denny Vrandečić wrote: Thanks for the input so far. Here are a few explicit questions that I have: * Time: right now the data model assumes that the precision is given on the level decade / year / month etc., which means you can enter a date of birth like 1435 or

Re: [Wikidata-l] Data values

2012-12-18 Thread Friedrich Röhrs
Hi, * Time: Would it make sense to use time periods instead of partial datetimes with lower precision levels? Instead of using May 1918 as birth date it would be something like birth date in the interval 01.05.1918 - 31.05.1918. This does not necessarily need to be reflected in the UI of course,
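
The interval idea above can be sketched for month precision as follows. The function name `month_interval` is hypothetical, and Python's `datetime` uses the proleptic Gregorian calendar, which is an assumption that may not suit historical dates.

```python
import calendar
from datetime import date


def month_interval(year: int, month: int):
    """Expand a month-precision date into its first and last day."""
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, 1), date(year, month, last_day)


# "May 1918" becomes the interval 01.05.1918 - 31.05.1918.
start, end = month_interval(1918, 5)
print(start.isoformat(), end.isoformat())
```

The user interface could still show "May 1918" while queries compare against the two bounds.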

Re: [Wikidata-l] Data values

2012-12-18 Thread Sven Manguard
Thanks for this Denny. Time: Historians **need** to be able to have date ranges of some sort. They also need to express confidence in non-numerical terms. Take for example, the invention of gunpowder in China. Not only do several major historians have different ranges entirely (which would, of

Re: [Wikidata-l] Data values

2012-12-18 Thread Denny Vrandečić
Thank you for your comments, Marco. 2012/12/18 Marco Fleckinger marco.fleckin...@wikipedia.at On 2012-12-18 15:29, Denny Vrandečić wrote: * Time: right now the data model assumes that the precision is given on the level decade / year / month etc., which means you can enter a date of birth

Re: [Wikidata-l] Data values

2012-12-18 Thread Denny Vrandečić
Thank you for your comments, Friedrich. It would be possible and very flexible, and certainly more powerful than the current system. But we would lose the convenience of having one date, which we need for query answering (or we could default to the lower or upper bound, or the middle, but all of

Re: [Wikidata-l] Data values

2012-12-18 Thread Sven Manguard
How about this: - Values default to a non-range value - You can click a checkbox that says range to turn the input into a range value instead - An entry can only be represented by either a non-range or a range number, not both This relieves our issue with query answering: Query: When was XXX

Re: [Wikidata-l] Data values

2012-12-18 Thread Friedrich Röhrs
Denny, could you maybe elaborate on what you mean by query answering? Do you talk about some technical aspect of the wiki-software? thanks, On Tue, Dec 18, 2012 at 5:08 PM, Sven Manguard svenmangu...@gmail.com wrote: How about this: - Values default to a non-range value - You can click a

Re: [Wikidata-l] Data values

2012-12-18 Thread Marco Fleckinger
On 2012-12-18 16:52, Denny Vrandečić wrote: Thank you for your comments, Marco. NP 2012/12/18 Marco Fleckinger marco.fleckin...@wikipedia.at On 2012-12-18 15:29, Denny Vrandečić wrote: * Time: right now the data model assumes that the

Re: [Wikidata-l] Data values

2012-12-18 Thread Gregor Hagedorn
It would be possible and very flexible, and certainly more powerful than the current system. But we would lose the convenience of having one date, which we need for query answering (or we could default to the lower or upper bound, or the middle, but all of these are a bit arbitrary). I

Re: [Wikidata-l] Data values

2012-12-18 Thread Gregor Hagedorn
Now, I don't think we need or want ranges as a data type at all (better have separate properties for the beginning and end). I am afraid this will then put a heavy burden on users to enter, proofread, and output values. Data input becomes dispersed, because the value 18-25 cm length has to be

Re: [Wikidata-l] Data values

2012-12-18 Thread Sven Manguard
The great thing about MediaWiki is that we don't have to anticipate new features, we can build them in later when we discover that they're possible and that they're wanted. In fact, there's no requirement that the Wikidata developers are even the ones that do develop said hypothetical future

Re: [Wikidata-l] Data values

2012-12-18 Thread Marco Fleckinger
On 2012-12-18 17:49, Gregor Hagedorn wrote: IMHO it would make sense to use the [[International System of Units]] for internal storage. It is not consistently used in other realms, not even in the German-speaking countries (PS vs. kW for cars). Maybe it would be possible to use small scripts

Re: [Wikidata-l] Data values

2012-12-18 Thread Daniel Kinzler
On 18.12.2012 17:57, Gregor Hagedorn wrote: Now, I don't think we need or want ranges as a data type at all (better have separate properties for the beginning and end). I am afraid this will then put a heavy burden on users to enter, proofread, and output values. Data input becomes

Re: [Wikidata-l] Data values

2012-12-18 Thread Daniel Kinzler
On 18.12.2012 17:52, Gregor Hagedorn wrote: It would be possible and very flexible, and certainly more powerful than the current system. But we would lose the convenience of having one date, which we need for query answering (or we could default to the lower or upper bound, or the middle, but

Re: [Wikidata-l] Data values

2012-12-18 Thread Gregor Hagedorn
I don't see this as a big overhead. It is more a problem for ordering, but internally, wikidata could store a midpoint value for intervals where no explicit central value is given, and use these for ordering purposes. Well, I would call that mid point simply the value, and the range would be
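
The ordering rule discussed above can be sketched as a sort key: use the explicit central value when there is one, otherwise the midpoint of the interval. The `Interval` class and its field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Interval:
    lower: float
    upper: float
    central: Optional[float] = None  # explicit central value, if one was given

    def sort_key(self) -> float:
        # Fall back to the midpoint only when no central value is stored.
        if self.central is not None:
            return self.central
        return (self.lower + self.upper) / 2


items = [Interval(18, 25), Interval(10, 40, central=12), Interval(0, 100)]
ordered = sorted(items, key=Interval.sort_key)
print([i.sort_key() for i in ordered])
```

Whether a derived midpoint should be presented as "the value" is exactly what the two messages disagree on. The sketch keeps it as an internal sort key only.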

Re: [Wikidata-l] Data values

2012-12-18 Thread Nikola Smolenski
On 18/12/12 16:52, Denny Vrandečić wrote: Thank you for your comments, Marco. 2012/12/18 Marco Fleckinger marco.fleckin...@wikipedia.at IMHO it would make sense to have something hybrid. The datatype for geolocation should accept something like a

Re: [Wikidata-l] Data values

2012-12-18 Thread Gregor Hagedorn
(ASIDE: Regarding presentation: it is not always algorithmically easy to decide whether to present 0.00000000000001 m as 1 * 10^-14 m or as 10 fm = 10 * 10^-15 m. In a scientific context, only the SI steps should be used, in another context the closest decimal may be appropriate.) But floating point numbers
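
A sketch of the "SI steps" presentation mentioned in the aside, assuming only prefixes at exponents that are multiples of three are allowed. The prefix table is deliberately partial and the function name `with_si_prefix` is hypothetical.

```python
from decimal import Decimal

# Partial, illustrative table of SI prefixes at steps of 10^3.
SI_PREFIXES = {-15: "f", -12: "p", -9: "n", -6: "µ", -3: "m", 0: "", 3: "k"}


def with_si_prefix(value: Decimal, unit: str) -> str:
    """Render a value using the nearest SI step at or below its magnitude."""
    exponent = value.adjusted()            # exponent of the leading digit
    step = 3 * (exponent // 3)             # round down to a multiple of three
    step = max(min(step, 3), -15)          # stay inside the partial table
    scaled = value.scaleb(-step).normalize()
    return f"{scaled:f} {SI_PREFIXES[step]}{unit}"


print(with_si_prefix(Decimal("1E-14"), "m"))
print(with_si_prefix(Decimal("2300"), "m"))
```

The same stored number could instead be rendered in plain scientific notation. The choice between the two is a presentation decision and does not change the stored value.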