Sorry Hugh, I was agreeing with you... For the first part of my message I was thinking about non-triple type RDF/XML where the data may be representing a person/entity with various expected attributes. I appreciate that triples can be written in RDF/XML too and so the language I used was not very clear.
Andy
http://www.geog.leeds.ac.uk/people/a.turner/

-----Original Message-----
From: Hugh Glaser [mailto:h...@ecs.soton.ac.uk]
Sent: 04 June 2013 12:00
To: Andy Turner
Cc: Jan Michelfeit; public-lod@w3.org community
Subject: Re: Representing NULL in RDF

Hi Andy,

On 4 Jun 2013, at 11:24, Andy Turner <a.g.d.tur...@leeds.ac.uk> wrote:

> Hi,
>
> You may not know a person's date/time of birth, but if they were born this can
> be a property of their linked data. If you know a person's age to a given
> level of accuracy at a specific time, then you can derive bounds for their
> date/time of birth and provide a likelihood value for that date/time of
> birth. Similarly, people once they die have a date/time of death.
>
> There is a way of querying linked data to select/filter all living people
> that may be of a specific age range at a specific time. This can be done either
> using null values or by implicit null of no property being specified.
>
> I think for entities which are defined as having specific attributes, then it
> is better to have null values in the RDF/XML when these are unknown as I
> think that makes the computation easier.

In that case, is there not the problem I mentioned with people's ages?
That is:
So if I have
"ex:paper1 ex:reviewedBy ex:Hugh", "ex:paper2 ex:reviewedBy ex:NULL", "ex:paper3 ex:reviewedBy ex:NULL"
And I ask the RDF, what other papers were reviewed by the same person, i.e.
SELECT ?paper WHERE { ex:paper2 ex:reviewedBy ?reviewer . ?paper ex:reviewedBy ?reviewer }
I get all the papers that have no reviewer, in this case ex:paper3.
Not really what was expected.

Cheers

>
> For triple store RDF then I think Hugh is right.
>
> Of course we have to deal with uncertainty and mess in data about people as
> we often do not know things accurately for sure! Sometimes we might discover
> conflict that increases our uncertainty about specific attributes.
>
> Regards,
>
> Andy
>
> ________________________________________
> From: Hugh Glaser [h...@ecs.soton.ac.uk]
> Sent: 04 June 2013 10:35
> To: Jan Michelfeit
> Cc: <public-lod@w3.org>
> Subject: Re: Representing NULL in RDF
>
> If there is a "*standard or generally accepted*" way of doing things, then,
> as has been pointed out, it is to ignore it.
> Or rather the norm is that NULL (and "unknown" and anything else like it - I'll
> use NULL for shorthand) is ignored, and doesn't generate a triple.
> In fact it is really important to do so, as NULL most often simply represents
> that the value is not known, in my experience.
> Making a triple in such situations is one of the RDF101 basic mistakes, as
> I'm sure you know, since it causes all sorts of sensible queries to do very
> strange things.
> For example, if the field is a person's age, then it would mean that a simple
> query asking for people of the same age as someone of unknown age would give
> you all the other people whose ages were not known.
>
> If you are in a generic world where you cannot bring any extra information to
> the table, then this is all you can do.
>
> Beyond that, I think that you have to ask exactly what is meant (as you do)
> and then model it.
> Basically, is there something that is being said by the NULL, and if so, how
> should that be captured in RDF?
> So your
>> 4. The value is withheld, e.g., when the data consumer is not allowed to
>> access it.
>
> should be a "visibility" or "privacy" triple.
> I think this may be what you are doing in (3) below, but I have some concerns
> about the way you do it there.
> Similarly for others such as
>> 2. The value is unknown, i.e., it should be there but we don't know it.
> which is where you ask the question of whether you want to represent that
> someone's age is actually missing, with a triple.
>
> You need to ask what the new property should be attached to.
> It is an important question whether it should be "part of" the value itself.
> So, for a "visibility" triple, it may be more that the subject of the row is
> having the property withheld than that the value is a nonVisibleValue.
> It is the person's foaf:givenName that is not being recorded, not some
> property of a field from a DB.
> There are patterns in various domains that try to tackle these sorts of
> problems - in programming languages it is similar to the problem of returning
> an exception instead of a value, and things like Union types can get used.
> But remember that you want things to be easy to query for the most basic
> question, and it is likely that you want to simply have a triple that says
> :foo foaf:givenName "Jan"
> which is what a user expects.
> That then allows
> SELECT ?name WHERE { :foo foaf:givenName ?name }
> In fact, if you have things like your :nullableValue construct, then you
> can't use predicates such as foaf:givenName at all, since the domain/range
> constraints are bad (I think).
>
> Of course you may well find that there is another field in the DB that
> actually has the information already, and is being transformed into RDF as
> well, in which case the NULL field can simply be discarded.
>
> I think for these two I would just leave them without a triple:
>> 1. The value is not applicable, i.e. property p does not exist or does not
>> make sense in the context.
>> 3. The value doesn't exist, i.e. the property doesn't have a value (e.g.
>> year of death for a person alive).
>
> I don't think I would go off into RDFS and OWL specifically to capture things
> - it is likely that the DB is simply modelling things in an unclear way, and
> the challenge of transforming to RDF is to work out what the fuzziness was
> and shine a light on it.
> Remember that the purpose of the whole exercise is to construct some RDF that
> is easy to query - or at least I hope that is the purpose!
> So not having triples for things that don't have values is good.
> And having triples that give more information about things is also good, as
> they are very easy to query.
> In fact, using RDFS and OWL for what is likely to be simple stuff from a DB
> is only likely to provide checking at assertion, and not make querying any
> easier - and since you are transforming from a DB, it is likely that the
> data you are transforming is well-formed.
>
> Finally, I know this generates controversy, but I would always avoid bnodes
> if it is possible/sensible to do - generating a URI is not hard, and can be
> useful in the long run. In your example, you could just as easily say "Use a
> node to give more details about the questioned value."
>
> Sorry, I've gone on a bit, but I just went with the flow!
>
> Best
> Hugh
>
> On 3 Jun 2013, at 22:39, Jan Michelfeit <michelfeit....@gmail.com>
> wrote:
>
>> Hi,
>> thank you all for your answers.
>>
>>> ... One "represents" a null by failing to include the relationship
>>> ... RDF semantics make no assumptions about what the absence of a
>>> proposition/statement means
>>
>> I agree. The question was actually about *distinguishing* between the
>> mentioned cases.
>>
>> From your suggestions and a quite comprehensive answer at SO [1], I see
>> these solutions:
>>
>> (1) Use ontology to specify proper constraints. This may be cardinality of
>> the questioned property or, as suggested by Phillip, assertion "that
>> anything with a year of death is necessarily a dead person".
>>
>> (2) Use an RDF container and possibly rdf:nil (thanks to Barry and Robert
>> for his example).
>>
>> (3) Use a blank node to give more details about the questioned value. Example
>> [2]:
>> :foo :aProp [a :nullableValue; rdf:value "value"] ;
>> :bProp [a :nullableValue; :reason :notAvailable ]
>>
>> Regards,
>> Jan
>>
>> [1] http://stackoverflow.com/a/16889273/2032064
>> [2] http://stackoverflow.com/a/16898786/2032064
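
For anyone who wants to try the ex:NULL problem from Hugh's reviewer example without a triple store, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions: triples are held as simple tuples, and same_reviewer() is a hypothetical helper that mimics the join in the SELECT query quoted in the thread; it is not SPARQL and not anyone's actual implementation.

```python
# Sketch only: triples are plain (subject, predicate, object) tuples.
# The ex: names come from the thread; same_reviewer() is a hypothetical helper.

def same_reviewer(triples, paper):
    """Mimic: SELECT ?paper WHERE { <paper> ex:reviewedBy ?r . ?paper ex:reviewedBy ?r }"""
    # First pattern: collect every reviewer bound for the given paper.
    reviewers = {o for s, p, o in triples if s == paper and p == "ex:reviewedBy"}
    # Second pattern: every paper that shares one of those reviewers.
    return sorted(s for s, p, o in triples if p == "ex:reviewedBy" and o in reviewers)

# Data with an explicit NULL placeholder, as in the thread.
with_null = [
    ("ex:paper1", "ex:reviewedBy", "ex:Hugh"),
    ("ex:paper2", "ex:reviewedBy", "ex:NULL"),
    ("ex:paper3", "ex:reviewedBy", "ex:NULL"),
]

# Same data, but unknown reviewers simply produce no triple.
without_null = [t for t in with_null if t[2] != "ex:NULL"]

null_result = same_reviewer(with_null, "ex:paper2")
omitted_result = same_reviewer(without_null, "ex:paper2")

print(null_result)     # ['ex:paper2', 'ex:paper3']
print(omitted_result)  # []
```

With the placeholder, ex:paper2 and ex:paper3 appear to share a reviewer; with the triple omitted, the same question returns nothing, which is the behaviour a user would expect.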