Hi,
Denny Vrandečić denny.vrande...@wikimedia.de wrote:
* Wikidata has to balance ease of use and expressiveness of statements. The user interface should not become complicated merely to cover a few exceptional edge cases.
So your test UI, had it been a proposal or just anything to have to
Gregor Hagedorn g.m.haged...@gmail.com wrote:
So, please suggest terms to use for at least these two things:
1) value certainty (ideally, not using digits, but something that is independent of unit and rendering)
Here we want to talk about something that the true value is with a
certain
On 20.12.2012 20:52, Gregor Hagedorn wrote:
I believe there are a lot of dangerous assumptions on
http://simia.net/valueparser/
First: there is no indication in a number that it is _not_ endlessly precise.
Apostles = 12
has no uncertainty; representing it as 12 ± 1 is wrong, but also 12 ± 0.5 is wrong.
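The distinction argued for here, between "no uncertainty information given" and "exactly zero uncertainty", can be made explicit in a data model. A minimal Python sketch (the class and field names are mine, not Wikidata's):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Quantity:
    """A number whose uncertainty is explicit rather than implied.

    uncertainty=None means "no uncertainty information was given",
    which is different from uncertainty=0.0 (an exact count).
    """
    value: float
    uncertainty: Optional[float] = None  # absolute +/- bound, if known

apostles = Quantity(12, uncertainty=0.0)     # an exact count: 12, not 12 +/- 0.5
eiffel = Quantity(324.0)                     # "324 m", no further information
measured = Quantity(324.0, uncertainty=0.5)  # an actual measurement claim
```

With this shape, a parser that guesses "324" means "324 ± 1" would be making a claim the source never made; it can instead leave the field unset.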
On 20.12.2012 20:31, Friedrich Röhrs wrote:
Hi,
I tried to enter the height of the Eiffel Tower: 324 meters. It suggested 324m ± 100m.
That's strange. When I enter 324m, it correctly suggests 324m ± 1 for me.
-- daniel
--
Daniel Kinzler, Softwarearchitekt
Wikimedia Deutschland -
On 20.12.2012 20:59, Avenue wrote:
Thanks, the prototype helps make some of this more concrete.
I am increasingly wondering if uncertainty will be overloaded here. People seem to want to use it for various types of measurement uncertainty (e.g. the standard error), ranges with no defined central value, and distributional summaries
Hi,
Does for me too now. Maybe I played around with the auto-uncertainty checkbox before trying 324 (probably had some 4-digit value before).
Although the problem remains: the height of the Eiffel Tower is not 324 ± 1 meters. It is given as 324 meters without any further information. So it should
Hm, the second one is only relevant for output.
I think this is a fundamental misunderstanding: the original one is not for output but is the primary value for interpretation, for understanding whether a value in Wikidata is correct or fake, or a software conversion error, or what. If I want to
I don't like significant digits because they depend on the writing system (base 10). I'd much rather express this as absolute values.
Yes, I would like that too. What I argue is that in 99.9% of cases (not a researched number, of course) you simply don't know more
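For illustration, a base-10 significant-digits claim can be converted into the kind of absolute bound preferred above: half a unit in the last significant place. This is a hypothetical helper, and the very notion of "significant digits" is what ties it to base 10:

```python
import math

def sig_digits_to_absolute(value: float, significant_digits: int) -> float:
    """Turn a base-10 significant-digits claim into an absolute +/- bound:
    half a unit in the last significant place. Illustrative sketch only."""
    if value == 0:
        return 0.5 * 10 ** (1 - significant_digits)
    magnitude = math.floor(math.log10(abs(value)))  # exponent of the leading digit
    last_place = magnitude - (significant_digits - 1)
    return 0.5 * 10 ** last_place

print(sig_digits_to_absolute(324, 3))  # 0.5  ("324 m" read as 3 significant digits)
```

Once stored as an absolute value, the bound no longer depends on how the number happens to be written.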
Hi all,
wow! Thanks for all the input. I read it all through, and am currently trying to digest it into a new draft of the data model for the discussed data values. I will try to address some questions here. Please be kind if I refer to the wrong person at one place or the other.
Whenever I refer to
(if i knew the private email for Denny, I'd send this there)
Martynas, there is no mention here of XSD etc. because it is not
relevant on this level of discussion. For exporting the data we will
obviously use XSD datatypes. This is so obvious that I didn't think it
needed to be explicitly
The xsd:minInclusive, xsd:maxInclusive, xsd:minExclusive and
xsd:maxExclusive facets are absolute expressions not relative +/-
expressions, in order to accommodate fast queries. These four facets
permit specification of ranges with an unspecified median and ranges
with a specified mode,
On 21 December 2012 19:36, jmccl...@hypergrove.com wrote:
The xsd:minInclusive, xsd:maxInclusive, xsd:minExclusive and
xsd:maxExclusive facets are absolute expressions not relative +/-
expressions, in order to accommodate fast queries. These four facets permit
specification of ranges with an
I detect a need to characterize the range expression, the most important aspect of which is whether the range is complete, or whether it excludes (equal) tails on each end. XSD presumes a complete range is being specified, not a subset; is that the issue you're raising?
Could an
additional facet for
Hi,
On Fri, Dec 21, 2012 at 6:14 PM, Denny Vrandečić
denny.vrande...@wikimedia.de wrote:
Friedrich, the term query answering simply means the ability to answer
queries against the database in Phase 3, e.g. the list of cities located in
Ghana with a population over 25,000 ordered by
On 19.12.2012 16:57, Sven Manguard wrote:
There is a balance. The more flexible the parameters, the easier it is to put
data in, but the harder it is for computers to make useful connections with
it.
I'm not sure how to handle this, but I am sure that we can't just keep
pretending that all
On 19.12.2012 18:13, Gregor Hagedorn wrote:
On 19 December 2012 17:03, Daniel Kinzler daniel.kinz...@wikimedia.de wrote:
Indeed we do: https://wikidata.org/wiki/Wikidata:Glossary
I use precision exactly like that: significant digits when rendering output or parsing input. It can be used to
On 19.12.2012 18:00, Gregor Hagedorn wrote:
Yes, Wikidata shall store a normalized version of the value, but it also needs to store an original one. Whether it needs to store the value twice I am not sure; I believe not. If it stores the original prefix, original unit and original significant
On 2012-12-20 12:46, Daniel Kinzler wrote:
So, please suggest terms to use for at least these two things:
1) value certainty (ideally, not using digits, but something that is
independent of unit and rendering)
We want to specify the limits of (possible) variation of a value,
which would
(Proposal 3, modified)
* value (xsd:double or xsd:decimal)
* unit (a wikidata item)
* totalDigits (xsd:smallint)
* fractionDigits (xsd:smallint)
* originalUnit (a wikidata item)
* originalUnitPrefix (a wikidata item)
JMc: I rearranged the list a bit and suggested simpler naming
JMc: Is not
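The fields of Proposal 3 could be sketched as a record like the following. The field names come from the list above; the Python types and the example item id for metre are my assumptions, not part of the proposal:

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional

@dataclass
class QuantityValue:
    """One possible shape for Proposal 3 (types are illustrative guesses)."""
    value: Decimal                              # normalized value (xsd:decimal)
    unit: str                                   # a wikidata item id, e.g. "Q11573" (metre)
    total_digits: int                           # xsd:totalDigits facet
    fraction_digits: int                        # xsd:fractionDigits facet
    original_unit: Optional[str] = None         # unit as entered, if different
    original_unit_prefix: Optional[str] = None  # SI prefix as entered

# The Eiffel Tower example from earlier in the thread:
height = QuantityValue(Decimal("324"), unit="Q11573",
                       total_digits=3, fraction_digits=0)
```

Keeping the original unit and prefix as separate optional fields lets re-users reformat for display without losing how the value was entered.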
I am still trying to catch up with the whole discussion and to distill the
results, both here and on the wiki.
In the meanwhile, I have tried to create a prototype of how a complex model
can still be entered in a simple fashion. A simple demo can be found here:
http://simia.net/valueparser/
The
I really, really hope that this isn't the mindset of the development team
as a whole. If so, my confidence in the viability of Wikidata would take a
major hit.
Yes, collecting the information that goes into infoboxes is going to be
important, and yes, centralizing that information so that it can
I don't understand why 1.6e-8 is absolutely necessary for sorting and comparison. PHP allows for the definition of custom sorting functions. If a custom datatype is defined, a custom sorting/comparison function can be defined too. Or am I missing some performance points?
On Wed, Dec 19, 2012 at
On 19.12.2012 11:56, Friedrich Röhrs wrote:
I don't understand why 1.6e-8 is absolutely necessary for sorting and comparison.
PHP allows for the definition of custom sorting functions. If a custom datatype is defined, a custom sorting/comparison function can be defined too. Or am I missing
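The performance point at issue can be illustrated: with mixed units, every query needs a custom comparison, whereas a value normalized to the base unit at save time sorts (and, in a database, indexes) as a plain number. A sketch with made-up data; the unit table is hypothetical:

```python
# Heights recorded in mixed units, as (value, unit) pairs.
raw = [(1.86, "m"), (172.0, "cm"), (0.0019, "km")]

TO_METRES = {"m": 1.0, "cm": 0.01, "km": 1000.0}  # illustrative conversion table

# Option A: a custom comparison/key function, applied at every query.
by_custom_key = sorted(raw, key=lambda vu: vu[0] * TO_METRES[vu[1]])

# Option B: normalize once at save time; afterwards a plain numeric
# sort (or a database index scan) suffices, with no custom comparator.
normalized = sorted(v * TO_METRES[u] for v, u in raw)

print(by_custom_key)  # [(172.0, 'cm'), (1.86, 'm'), (0.0019, 'km')]
```

Option A works for in-memory lists, but a database cannot use an ordinary index with a per-query comparator, which is where the efficiency argument for normalized storage comes from.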
In addition, a storage option for the desired unit prefix (this may be considered an original prefix, since naturally re-users may wish to reformat this).
I see no point in storing the unit used for input.
I think you plan to store the unit (which would be meter), so you
don't want to store
Hey wikidatians,
occasionally checking threads in this list like the current one, I get a mixed feeling: on one hand, it is sad to see the efforts and resources wasted as Wikidata tries to reinvent RDF, and now also triplestore design as well as XSD datatypes. What's next, WikiQL instead of
On 2012-12-19 15:11, Daniel Kinzler wrote:
On 19.12.2012 14:34, Friedrich Röhrs wrote:
Hi,
Sorry for my ignorance, if this is common knowledge: What is the use case for
sorting millions of different measures from different objects?
Finding all cities with more than 10 inhabitants
On 19/12/12 15:33, Nikola Smolenski wrote:
On 19/12/12 12:23, Daniel Kinzler wrote:
I don't think we can sensibly support historical units with unknown conversions, because they cannot be compared directly to SI units. So, they couldn't be used to answer queries, can't be converted for display,
On Wed, Dec 19, 2012 at 2:32 PM, Marco Fleckinger
marco.fleckin...@wikipedia.at wrote:
IMHO this should be part of a model. E.g. altitudes are usually measured in metres or feet, never in km or yards. Distances have the same SI base unit but are also measured in km, depending on the
On 19.12.2012 15:32, Marco Fleckinger wrote:
Maybe we should make a difference between internal usage and visualization.
Comparing meters with kilometers and feet is quite difficult; rescaling everything for visualization is not.
Not maybe. Definitely. Visualization is based on user preference,
On 19.12.2012 15:26, Avenue wrote:
What about the North and South Poles?
I'm sure standard coordinate systems have a convention for representing them.
Won't we need lots of units that are not SI units (e.g. base pairs, IQ points,
Scoville heat units, $ and €) and can't readily be translated
On 19 December 2012 15:11, Daniel Kinzler daniel.kinz...@wikimedia.de wrote:
If they measure the same dimension, they should be saved using the same unit
(probably the SI base unit for that dimension). Saving values using different
units would make it impossible to run efficient queries against
I don't think we can sensibly support historical units with unknown conversions, because they cannot be compared directly to SI units. So, they couldn't be used to answer queries, can't be converted for display, etc. They aren't units in any sense the software can understand. This is a
On 2012-12-19 16:56, Daniel Kinzler wrote:
On 19.12.2012 16:47, Gregor Hagedorn wrote:
Daniel confirms (in separate mail) that Wikidata indeed intends to convert any derived SI units to a common formula of base units. Example: a quantity like 1013 hectopascal, the common unit for
Martynas,
could you please let me know where RDF or any of the W3C standards covers
topics like units, uncertainty, and their conversion. I would be very much
interested in that.
Cheers,
Denny
2012/12/19 Martynas Jusevičius marty...@graphity.org
Hey wikidatians,
occasionally checking
These all pose the same problems, correct. At the moment, I'm very unsure about how to accommodate these at all. Maybe we can have them as custom units, which are fixed for a given property, and cannot be converted.
I think the proposal to use wikidata items for the units (that is both
On 19.12.2012 16:41, Marco Fleckinger wrote:
I assume there's a table of usual units for different purposes. E.g. altitudes are displayed in m and ft. One of those is chosen based on the user's locale setting. My locale setting would be kind of metric system, therefore it will be
When we speak about dimensions, we talk about properties, right?
So when I define the property height of a person as an entity, I would supply the SI unit (m) and the SI multiple (-2, cm) that it should be saved in (in the database).
When someone then inputs the height in meters (e.g. 1.86m) it
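The save-time conversion described here can be sketched as follows, assuming a property declares its SI unit and a storage exponent. All names and tables below are hypothetical, for illustration only:

```python
# Hypothetical sketch: each property declares its SI unit plus a decimal
# storage exponent, and inputs are converted to that multiple on save.
PROPERTIES = {
    "height of a person": {"si_unit": "m", "storage_exponent": -2},  # store in cm
}

UNIT_EXPONENT = {"m": 0, "cm": -2, "mm": -3, "km": 3}  # powers of ten vs. the base unit

def to_storage(prop: str, value: float, unit: str) -> float:
    """Convert an input value to the property's declared storage multiple."""
    shift = UNIT_EXPONENT[unit] - PROPERTIES[prop]["storage_exponent"]
    return value * 10 ** shift

print(to_storage("height of a person", 1.86, "m"))    # 186.0 (centimetres)
print(to_storage("height of a person", 172.0, "cm"))  # 172.0 (already in cm)
```

The point of declaring the multiple per property is that all stored values of that property share one scale, so comparisons and range queries need no further conversion.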
On Wed, 19 Dec 2012, Denny Vrandečić wrote:
Martynas,
could you please let me know where RDF or any of the W3C standards covers
topics like units,
uncertainty, and their conversion. I would be very much interested in that.
NIST has created a standard in OWL: QUDT - Quantities, Units,
it is probably necessary to store the number of
significant decimals.
Yes, that *is* the accuracy value I mean.
Daniel, please use correct terms. Accuracy is a defined concept, and although by convention it may be roughly expressed by using the number of significant figures, that is not the
On 19 December 2012 17:03, Daniel Kinzler daniel.kinz...@wikimedia.de wrote:
I'd have thought that we'd have one such table per dimension (such as length
or weight). It may make sense to override that on a per-property basis, so
2300m elevation isn't shown as 2.3km. Or that can be done in the
Denny,
you're sidestepping the main issue here -- every sensible architecture should build on as many prior standards as possible, and build its own custom solution only if a *very* compelling reason is found to do so, instead of finding a compromise between the requirements and the standard.
My philosophy is this: We should do whatever works best for Wikidata and
Wikidata's needs. If people want to reuse our content, and the choices
we've made make existing tools unworkable, they can build new tools
themselves. We should not be clinging to what's been done already if it
gets in the
Martynas,
I think you misinterpret the thread. There is no discussion not to
build on the datatypes defined in http://www.w3.org/TR/xmlschema-2/
What we are doing is discussing compositions of elements, all typed to
xml datatypes, that shall be able to express scientific and
engineering
I suspect what Martynas is driving at is that XMLS defines **FACETS** for its datatypes. Accepting those as a baseline, and then extending them to your requirements, is a reasonable, community-oriented process. However, wrapping oneself in the flag of open development is to me unresponsive to a
Wow, what a long thread. I was just about to chime in to agree with Sven's
point about units when he interjected his comment about blithely ignoring
history, so I feel compelled to comment on that first. It's fine to ignore
standards *for good reasons*, but doing it out of ignorance or
On 19 December 2012 20:01, jmccl...@hypergrove.com wrote:
Hi Gregor - the root of the misconception I likely have about significant
digits and the like, is that such is one example of a rendering parameter
not a semantic property.
It is about semantics, not formatting.
In science and
totally agree - hopefully XSD facets provide a solid start to meeting those concrete requirements - thanks.
On 19.12.2012 14:09, Gregor Hagedorn wrote:
On 19 December 2012 20:01, jmccl...@hypergrove.com wrote:
Hi Gregor - the root of the misconception I likely have about significant
For me the question is how to name the precision information. Do not the XSD facets totalDigits and fractionDigits work well enough? I mean:
.number:totalDigits contains a positive power of ten for precision
.number:fractionDigits contains a negative power of ten for precision
The use of
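One way to read this suggestion in code: derive the two facets from the decimal literal as it was entered, and treat half a unit in the last place as the implied precision. The facet names come from XSD; the precision interpretation is mine, not something XSD defines:

```python
from decimal import Decimal

def facets(lexical: str):
    """Derive XSD-style totalDigits/fractionDigits from a decimal literal,
    plus the implied half-unit-in-last-place precision (my interpretation)."""
    d = Decimal(lexical)
    exponent = d.as_tuple().exponent           # e.g. "324.5" -> -1
    fraction_digits = max(0, -exponent)        # digits after the decimal point
    total_digits = len(d.as_tuple().digits)    # all significant digits
    precision = Decimal(5) * Decimal(10) ** (exponent - 1)
    return total_digits, fraction_digits, precision

print(facets("324"))    # (3, 0, Decimal('0.5'))
print(facets("324.5"))  # (4, 1, Decimal('0.05'))
```

Because the facets are taken from the lexical form, they capture exactly what the editor typed, which is the "original value" several posts in this thread ask to preserve.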
I think that Tom Morris tragically misunderstood my point, although that
was likely helped by the fact that, as I've insinuated already, the many
standards and acronyms being thrown about are largely lost on me.
My point is not We can just throw everything out because we're big and
awesome and
If one has time to read prior art, I'd suggest giving the Health Level
7 v3.0 Data Types Specification
http://amisha.pragmaticdata.com/v3dt/report.html a look.
Of course HL7 has a lot of things to worry about which are off topic
for us, starting with a prior completely different version of the
Thanks for the input so far. Here are a few explicit questions that I have:
* Time: right now the data model assumes that the precision is given on the
level decade / year / month etc., which means you can enter a date of
birth like 1435 or May 1918. But is this sufficient? We cannot enter a
Hello,
On 2012-12-18 15:29, Denny Vrandečić wrote:
Thanks for the input so far. Here are a few explicit questions that I have:
* Time: right now the data model assumes that the precision is given on
the level decade / year / month etc., which means you can enter a date
of birth like 1435 or
Hi,
* Time:
Would it make sense to use time periods instead of partial datetimes with lower precision levels? Instead of using May 1918 as birth date, it would be something like birth date in the interval 01.05.1918 - 31.05.1918. This does not necessarily need to be reflected in the UI of course,
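The interval reading of a month-precision date can be sketched directly. This is a hypothetical helper for illustration, not the Wikidata data model:

```python
import calendar
from datetime import date

def month_to_interval(year: int, month: int):
    """Represent a month-precision date ("May 1918") as a closed date
    interval, as suggested above."""
    last_day = calendar.monthrange(year, month)[1]  # number of days in the month
    return date(year, month, 1), date(year, month, last_day)

print(month_to_interval(1918, 5))
# (datetime.date(1918, 5, 1), datetime.date(1918, 5, 31))
```

The UI could still display "May 1918" while storing the interval underneath, which is exactly the separation between input form and internal representation proposed here.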
Thanks for this Denny.
Time:
Historians **need** to be able to have date ranges of some sort. They also
need to express confidence in non-numerical terms. Take for example, the
invention of gunpowder in China. Not only do several major historians have
different ranges entirely (which would, of
Thank you for your comments, Marco.
2012/12/18 Marco Fleckinger marco.fleckin...@wikipedia.at
On 2012-12-18 15:29, Denny Vrandečić wrote:
* Time: right now the data model assumes that the precision is given on
the level decade / year / month etc., which means you can enter a date
of birth
Thank you for your comments, Friedrich.
It would be possible and very flexible, and certainly more powerful than the current system. But we would lose the convenience of having one date, which we need for query answering (or we could default to the lower or upper bound, or the middle, but all of
How about this:
- Values default to a non-range value
- You can click a checkbox that says range to turn the input into a range
value instead
- An entry can only be represented by either a non-range or a range number,
not both
This relieves our issue with query answering:
Query: When was XXX
Denny,
could you maybe elaborate on what you mean by query answering? Do you talk
about some technical aspect of the wiki-software?
thanks,
On Tue, Dec 18, 2012 at 5:08 PM, Sven Manguard svenmangu...@gmail.comwrote:
How about this:
- Values default to a non-range value
- You can click a
On 2012-12-18 16:52, Denny Vrandečić wrote:
Thank you for your comments, Marco.
NP
2012/12/18 Marco Fleckinger marco.fleckin...@wikipedia.at
On 2012-12-18 15:29, Denny Vrandečić wrote:
* Time: right now the data model assumes that the
It would be possible and very flexible, and certainly more powerful than the current system. But we would lose the convenience of having one date, which we need for query answering (or we could default to the lower or upper bound, or the middle, but all of these are a bit arbitrary).
I
Now, I don't think we need or want ranges as a data type at all (better have
separate properties for the beginning and end).
I am afraid this will then put a heavy burden on users to enter,
proofread, and output values. Data input becomes dispersed, because
the value 18-25 cm length has to be
The great thing about MediaWiki is that we don't have to anticipate new
features, we can build them in later when we discover that they're possible
and that they're wanted. In fact, there's no requirement that the Wikidata
developers are even the ones that do develop said hypothetical future
On 2012-12-18 17:49, Gregor Hagedorn wrote:
IMHO it would make sense to use the [[International System of Units]] for internal storage. It is not consistently used in other realms, not even in German-speaking countries (PS vs. kW for cars). Maybe it would be possible to use small scripts
On 18.12.2012 17:57, Gregor Hagedorn wrote:
Now, I don't think we need or want ranges as a data type at all (better have
separate properties for the beginning and end).
I am afraid this will then put a heavy burden on users to enter,
proofread, and output values. Data input becomes
On 18.12.2012 17:52, Gregor Hagedorn wrote:
It would be possible and very flexible, and certainly more powerful than the current system. But we would lose the convenience of having one date, which we need for query answering (or we could default to the lower or upper bound, or the middle, but
I don't see this as a big overhead. It is more a problem for ordering,
but internally, wikidata could store a midpoint value for intervals
where no explicit central value is given, and use these for ordering
purposes.
Well, I would call that midpoint simply the value, and the range would be
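The midpoint idea in miniature, assuming plain numeric intervals (with a degenerate interval standing in for an ordinary single value):

```python
# Sketch: derive a midpoint per interval so that ordering needs no
# special-casing of ranges versus plain values.
intervals = [(18.0, 25.0), (10.0, 12.0), (20.0, 20.0)]  # (20, 20) = a plain value
by_midpoint = sorted(intervals, key=lambda ab: (ab[0] + ab[1]) / 2)
print(by_midpoint)  # [(10.0, 12.0), (20.0, 20.0), (18.0, 25.0)]
```

Whether that midpoint is stored or computed on the fly, the ordering it yields is the same; the disagreement above is only about which number deserves to be called "the value".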
On 18/12/12 16:52, Denny Vrandečić wrote:
Thank you for your comments, Marco.
2012/12/18 Marco Fleckinger marco.fleckin...@wikipedia.at
IMHO it would make sense to have something hybrid. The datatype for geolocation should accept something like a
(ASIDE: Regarding presentation: it is not always algorithmically easy to decide whether to present 0.01 pm as 1 * 10^-14 m or as 10 fm = 10 * 10^-15 m. In a scientific context, only the SI steps should be used; in another context the closest decimal may be appropriate.)
But floating point numbers