RE: Representing NULL in RDF

Andy Turner Tue, 04 Jun 2013 05:55:15 -0700

Sorry Hugh, I was agreeing with you... For the first part of my message I was 
thinking about non-triple type RDF/XML where the data may be representing a 
person/entity with various expected attributes. I appreciate that triples can 
be written in RDF/XML too and so the language I used was not very clear.


Andy
http://www.geog.leeds.ac.uk/people/a.turner/
 
-----Original Message-----
From: Hugh Glaser [mailto:h...@ecs.soton.ac.uk] 
Sent: 04 June 2013 12:00
To: Andy Turner
Cc: Jan Michelfeit; public-lod@w3.org community
Subject: Re: Representing NULL in RDF

Hi Andy,
On 4 Jun 2013, at 11:24, Andy Turner <a.g.d.tur...@leeds.ac.uk>
 wrote:

> Hi,
> 
> You may not know a person's date/time of birth, but if they were born this can 
> be a property of their linked data. If you know a person's age to a given 
> level of accuracy at a specific time, then you can derive bounds for their 
> date/time of birth and provide a likelihood value for that date/time of 
> birth. Similarly, people once they die have a date/time of death. 
> 
> There is a way of querying linked data to select/filter all living people 
> that may be of a specific age range at a specific time. This can be done 
> either using null values or by the implicit null of no property being specified.
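
A rough illustration of the "implicit null" variant, as a minimal sketch in plain Python rather than a real triple store. The records, the `born`/`died` keys and the helper names are all made up for the example; a missing key plays the role of "no property specified".

```python
from datetime import date

# Hypothetical records: a missing key stands for "no triple was asserted".
PEOPLE = {
    "ex:alice": {"born": date(1950, 3, 1)},
    "ex:bob": {"born": date(1948, 7, 9), "died": date(2001, 1, 15)},
    "ex:carol": {},  # date of birth unknown, so nothing is recorded
    "ex:dave": {"born": date(1990, 5, 20)},
}


def age_on(born, when):
    """Age in whole years on the given day."""
    years = when.year - born.year
    if (when.month, when.day) < (born.month, born.day):
        years -= 1
    return years


def living_in_age_range(people, when, lo, hi):
    """People known to be alive on `when` whose age is within [lo, hi]."""
    hits = []
    for name, props in people.items():
        born = props.get("born")
        died = props.get("died")
        if born is None or born > when:
            continue  # unknown or future birth date: cannot match
        if died is not None and died <= when:
            continue  # already dead on that day
        if lo <= age_on(born, when) <= hi:
            hits.append(name)
    return sorted(hits)
```

With this data, `living_in_age_range(PEOPLE, date(2013, 6, 4), 60, 70)` gives `['ex:alice']`: ex:bob is excluded by his date of death and ex:carol simply never matches, with no NULL marker needed.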
> 
> I think for entities which are defined as having specific attributes, then it 
> is better to have null values in the RDF/XML when these are unknown as I 
> think that makes the computation easier.
In that case, is there not the problem I mentioned with people's ages?
That is:
So if I have "ex:paper1 ex:reviewedBy ex:Hugh", "ex:paper2 ex:reviewedBy 
ex:NULL", "ex:paper3 ex:reviewedBy ex:NULL"
And I ask the RDF, what other papers were reviewed by the same person, i.e. 
SELECT ?paper WHERE { ex:paper2 ex:reviewedBy ?reviewer . ?paper ex:reviewedBy 
?reviewer }
I get all the papers that have no reviewer, in this case ex:paper3 (as well as ex:paper2 itself).

Not really what was expected.
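
The effect is easy to reproduce outside a triple store. A minimal sketch in plain Python, with tuples standing in for triples and a hand-written join standing in for the SPARQL query above; the helper name `same_reviewer` is made up for illustration.

```python
# Triples as (subject, predicate, object); ex:NULL is the sentinel from the example.
TRIPLES_WITH_NULL = {
    ("ex:paper1", "ex:reviewedBy", "ex:Hugh"),
    ("ex:paper2", "ex:reviewedBy", "ex:NULL"),
    ("ex:paper3", "ex:reviewedBy", "ex:NULL"),
}

# Same data, but the unknown reviewers are simply left out.
TRIPLES_WITHOUT_NULL = {
    ("ex:paper1", "ex:reviewedBy", "ex:Hugh"),
}


def same_reviewer(triples, paper):
    """Mimic: SELECT ?paper WHERE { <paper> ex:reviewedBy ?r . ?paper ex:reviewedBy ?r }"""
    reviewers = {o for (s, p, o) in triples
                 if s == paper and p == "ex:reviewedBy"}
    return sorted(s for (s, p, o) in triples
                  if p == "ex:reviewedBy" and o in reviewers)
```

Here `same_reviewer(TRIPLES_WITH_NULL, "ex:paper2")` gives `['ex:paper2', 'ex:paper3']`, because ex:NULL joins like any other node, while the same call on `TRIPLES_WITHOUT_NULL` gives `[]`.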
Cheers
> 
> For triple store RDF then I think Hugh is right.
> 
> Of course we have to deal with uncertainty and mess in data about people as 
> we often do not know things accurately for sure! Sometimes we might discover 
> conflicts that increase our uncertainty about specific attributes.
> 
> Regards,
> 
> Andy
> 
> ________________________________________
> From: Hugh Glaser [h...@ecs.soton.ac.uk]
> Sent: 04 June 2013 10:35
> To: Jan Michelfeit
> Cc: <public-lod@w3.org>
> Subject: Re: Representing NULL in RDF
> 
> If there is a "*standard or generally accepted*" way of doing things, then, 
> as has been pointed out, it is to ignore it.
> Or rather, the norm is that NULL (and "unknown" and anything else like it; 
> I'll use NULL as shorthand) is ignored, and doesn't generate a triple.
> In fact it is really important to do so, as NULL most often simply represents 
> that the value is not known, in my experience.
> Making a triple in such situations is one of the RDF101 basic mistakes, as 
> I'm sure you know, since it causes all sorts of sensible queries to do very 
> strange things.
> For example, if the field is a person's age, then it would mean that a simple 
> query asking for people of the same age as someone of unknown age would give 
> you all the other people whose ages were not known.
> 
> If you are in a generic world where you cannot bring any extra information to 
> the table, then this is all you can do.
> 
> Beyond that, I think that you have to ask exactly what is meant (as you do) 
> and then model it.
> Basically, is there something that is being said by the NULL, and if so, how 
> should that be captured in RDF?
> So your
>> 4. The value is withheld, e.g., when the data consumer is not allowed to 
>> access it.
> 
> should be a "visibility" or "privacy" triple.
> I think this may be what you are doing in (3) below, but I have some concerns 
> about the way you do it there.
> Similarly for others such as
>> 2. The value is unknown, i.e., it should be there but we don't know it.
> which is where you ask the question of whether you want to represent that 
> someone's age is actually missing, with a triple.
> 
> You need to ask what the new property should be attached to.
> It is an important question whether it should be "part of" the value itself.
> So, for a "visibility" triple, it may be more that the subject of the row is 
> having the property withheld than that the value is a nonVisibleValue.
> It is the person's foaf:givenName that is not being recorded, not some 
> property of a field from a DB.
> There are patterns in various domains that try to tackle these sorts of 
> problems - in programming languages it is similar to the problem of returning 
> an exception instead of a value, and things like Union types can get used.
> But remember that you want things to be easy to query for the most basic 
> question, and it is likely that you want to simply have a triple that says
> :foo foaf:givenName "Jan"
> which is what a user expects.
> That then allows
> SELECT ?name WHERE { :foo foaf:givenName ?name }
> In fact, if you have things like your :nullableValue construct, then you 
> can't use predicates such as foaf:givenName at all, since the domain/range 
> constraints are bad (I think).
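
To make the contrast concrete, a minimal sketch in plain Python with tuples standing in for triples. The predicate `ex:withheldProperty`, the subject `:bar`, the blank-node labels and the helper names are assumptions made for the example; only `:foo`, `foaf:givenName`, `:nullableValue`, `rdf:value` and `:reason` come from the thread.

```python
# Plain modelling: the value hangs directly off the subject, and a withheld
# value is described by a separate triple on the subject (hypothetical predicate).
PLAIN = {
    (":foo", "foaf:givenName", "Jan"),
    (":bar", "ex:withheldProperty", "foaf:givenName"),
}

# Wrapped modelling, along the lines of the :nullableValue construct.
WRAPPED = {
    (":foo", "foaf:givenName", "_:v1"),
    ("_:v1", "rdf:type", ":nullableValue"),
    ("_:v1", "rdf:value", "Jan"),
    (":bar", "foaf:givenName", "_:v2"),
    ("_:v2", "rdf:type", ":nullableValue"),
    ("_:v2", ":reason", ":notAvailable"),
}


def objects(triples, subject, predicate):
    """Mimic: SELECT ?o WHERE { <subject> <predicate> ?o }"""
    return sorted(o for (s, p, o) in triples if s == subject and p == predicate)


def given_names_wrapped(triples, subject):
    """The wrapped form needs an extra hop through rdf:value."""
    names = []
    for node in objects(triples, subject, "foaf:givenName"):
        names.extend(objects(triples, node, "rdf:value"))
    return sorted(names)
```

On `PLAIN`, `objects(PLAIN, ":foo", "foaf:givenName")` gives `['Jan']` directly. On `WRAPPED` the same basic query gives the wrapper node `['_:v1']`, and every consumer has to know to call something like `given_names_wrapped` instead.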
> 
> Of course you may well find that there is another field in the DB that 
> actually has the information already, and is being transformed into RDF as 
> well, in which case the NULL field can simply be discarded.
> 
> I think for these two I would just leave them without a triple:
>> 1. The value is not applicable, i.e. property p does not exist or does not 
>> make sense in the context.
>> 3. The value doesn't exist, i.e. the property doesn't have a value (e.g. 
>> year of death for a person alive).
> 
> I don't think I would go off into RDFS and OWL specifically to capture things 
> - it is likely that the DB is simply modelling things in an unclear way, and 
> the challenge of transforming to RDF is to work out what the fuzziness was 
> and shine a light on it.

> Remember that the purpose of the whole exercise is to construct some RDF that 
> is easy to query - or at least I hope that is the purpose!
> So not having triples for things that don't have values is good.
> And having triples that give more information about things is also good, as 
> they are very easy to query.
> In fact, using RDFS and OWL for what is likely to be simple stuff from a DB 
> is only likely to provide checking at assertion, and not make querying any 
> easier - and since you are transforming from a DB, it is likely that the 
> data you are transforming is well-formed.
> 
> Finally, I know this generates controversy, but I would always avoid bnodes 
> if it is possible/sensible to do - generating a URI is not hard, and can be 
> useful in the long run. In your example, you could just as easily say "Use a 
> node to give more details about the questioned value."
> 
> Sorry, I've gone on a bit, but I just went with the flow!
> 
> Best
> Hugh
> 
> On 3 Jun 2013, at 22:39, Jan Michelfeit <michelfeit....@gmail.com>
> wrote:
> 
>> Hi,
>> thank you all for your answers.
>> 
>>> ... One "represents" a null by failing to include the relationship
>>> ... RDF semantics make no assumptions about what the absence of a 
>>> proposition/statement means
>> 
>> I agree. The question was actually about *distinguishing* between the 
>> mentioned cases.
>> 
>> From your suggestions and a quite comprehensive answer at SO [1], I see 
>> these solutions:
>> 
>> (1) Use ontology to specify proper constraints. This may be cardinality of 
>> the questioned property or, as suggested by Phillip, assertion "that 
>> anything with a year of death is necessarily a dead person".
>> 
>> (2) Use an RDF container and possibly rdf:nil (thanks to Barry and Robert 
>> for his example).
>> 
>> (3) Use a blank node to give more details about the questioned value. Example 
>> [2]:
>>  :foo :aProp [a :nullableValue; rdf:value "value"] ;
>>       :bProp [a :nullableValue; :reason :notAvailable ]
>> 
>> Regards,
>> Jan
>> 
>> [1] http://stackoverflow.com/a/16889273/2032064
>> [2] http://stackoverflow.com/a/16898786/2032064
