Re: literal typing in DB?

Andy Seaborne Sun, 11 Dec 2016 04:29:51 -0800

Hi Claude,

The notion of Capabilities.handlesLiteralTyping isn't particularly relevant to SPARQL, which has both exact term matching (sameTerm) and value matching (in FILTERs: =, <, >). It is not part of the RDF specs either - it's a form of entailment.
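
To make the distinction concrete, here is a minimal Python sketch (not Jena code; the Literal class and the two helper functions are hypothetical names, and only integer/decimal datatypes are assumed). Term matching compares lexical form and datatype; value matching compares what the literals denote.

```python
from dataclasses import dataclass
from decimal import Decimal

XSD = "http://www.w3.org/2001/XMLSchema#"


@dataclass(frozen=True)
class Literal:
    """An RDF literal as a term: lexical form plus datatype IRI."""
    lexical: str
    datatype: str


def same_term(a: Literal, b: Literal) -> bool:
    # Term matching: lexical form and datatype must both be identical.
    return a == b


def value_equal(a: Literal, b: Literal) -> bool:
    # Value matching: compare the numbers the literals denote.
    # Assumption: both literals have an integer or decimal datatype.
    return Decimal(a.lexical) == Decimal(b.lexical)


one_int = Literal("1", XSD + "int")
one_long = Literal("001", XSD + "long")
print(same_term(one_int, one_long), value_equal(one_int, one_long))
```

The two literals are different terms but have the same value, which is the case a per-graph "literal typing" flag tries, and fails, to capture.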

Lexical space and value space are part of SPARQL. The per-graph idea of literal typing has been overtaken by the specs - it's a very old feature from a time when RDF was just coming into being and wasn't well connected to the numerical work in XML-land.

handlesLiteralTyping is only called from test cases in the code base. I don't remember seeing it in any support questions for many years.


We should arrange to evolve it away.

TIM is RDF Term based despite what the capabilities may say (JENA-1265).

Capabilities.handlesLiteralTyping is not meaningful/helpful for TDB. It is not quite the right concept.


On 10/12/16 22:21, Claude Warren wrote:

I have a reasonable first cut at the Cassandra graph.  The Contract testing
is working (mostly).  I specified the graph did literal typing but have not
implemented it.  My question has to do with what literal types are to be
coerced?

From the tests I see that byte, short, int and long should all be the same
value in a query.


Which tests?  ARQ, SPARQL ones? Graph ones?

The graph-only tests are suspect, as the different storages for graphs do different things and always have done. The ideal of the duality that mem-graphs try to achieve, having term-based storage and value-based indexing, does not work in persistent storage at scale. It also conflicts with how the specs have evolved.

The responsibility for term/value comes from how the RDF term is used. So the responsibility is in SPARQL.

xsd:byte, xsd:short, xsd:int and xsd:long are all derived types of xsd:decimal, though F&O makes xsd:integer a bit special.

I don't see anything for float and double; any pointers there?


xsd:float is not a derived type of xsd:double.  They are unrelated types.

There is also the concept of promotion: a numeric type can be promoted to another type for the purposes of a calculation.


1^^xsd:int + 2^^xsd:float  is an xsd:float because 1 is promoted to float.

1^^xsd:double + 2^^xsd:float is an xsd:double because 2 is promoted to double.

The XML/XSD/F&O rules are quite long but really come down to what people expect based on experience with programming languages.


https://www.w3.org/TR/xpath-31/#promotion
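
As a rough sketch of those rules for addition (plain Python, not Jena or ARQ code; result_type and the two tables are hypothetical names, and only the types mentioned in this thread are covered), the result type is the "widest" of the two operand types in the order integer, decimal, float, double:

```python
# Derived integer types are treated as xsd:integer for arithmetic.
_BASE = {
    "byte": "integer", "short": "integer", "int": "integer",
    "long": "integer", "integer": "integer",
    "decimal": "decimal", "float": "float", "double": "double",
}

# Promotion order: a type can be promoted to any type later in the list.
_ORDER = ["integer", "decimal", "float", "double"]


def result_type(left: str, right: str) -> str:
    """Return the xsd type of left + right after numeric promotion."""
    rank_left = _ORDER.index(_BASE[left])
    rank_right = _ORDER.index(_BASE[right])
    return "xsd:" + _ORDER[max(rank_left, rank_right)]


print(result_type("int", "float"))     # the int operand is promoted to float
print(result_type("double", "float"))  # the float operand is promoted to double
print(result_type("byte", "long"))     # both are integers, no promotion needed
```

This covers only the choice of result type; other operators (division in particular) have their own rules in F&O.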

Any gotchas I should be aware of before I try to implement something?


RDF Term vs the value of a term.

TDB takes a more value-oriented approach: literals are converted on input to values.


"001"^^xsd:unsignedInteger is stored as 1 - an integer.

The details of exact lexical representation and datatype are lost.
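
The idea can be sketched as follows (an illustration in Python only, not TDB's actual encoding; to_stored and INTEGER_TYPES are hypothetical names). Integer-typed literals are reduced to their value at load time, so both the lexical form and the specific datatype are gone afterwards:

```python
# Assumption: a small, illustrative set of integer datatypes (local names only).
INTEGER_TYPES = {"byte", "short", "int", "long", "integer"}


def to_stored(lexical: str, datatype: str):
    """Convert a literal to what a value-oriented store would keep."""
    if datatype in INTEGER_TYPES:
        # Only the value survives: "001" and "1" become the same stored item.
        return int(lexical)
    # Anything else is kept as a term (lexical form + datatype).
    return (lexical, datatype)


print(to_stored("001", "int") == to_stored("1", "long"))
```

A filter such as `?x < 10` can then compare stored numbers directly instead of parsing lexical forms at query time.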

It speeds number filtering up a great deal.

People are used to the lexical form changing: put 001 into a program and get 1 out is to be expected.

The loss of datatype really hasn't led to much reaction - I expected it would, as what you put in is not exactly what you get out.

To do both requires more storage and/or CPU cycles in the innermost parts of data retrieval. Non-trivial.

The other issue seems to be the language tag on string literals.  But I'll
get that worked out soon too.


Here, there is no right answer!

Users want to retain case, have case-insensitive matching and also have only one occurrence of the literal if two cases are used. In practice, retaining case is important for users, and technical duplicates are not worth worrying about because they happen rarely.
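
To show why the three wishes pull against each other, here is a small sketch (plain Python with hypothetical names, not how any Jena storage does it) of a store that matches language tags case-insensitively and keeps one occurrence per literal. Whichever casing arrives first wins, so the second writer does not get back the case they put in:

```python
class LangLiteralStore:
    """Keeps one entry per (lexical form, lowercased language tag)."""

    def __init__(self):
        self._entries = {}

    def add(self, lexical: str, lang: str):
        # The key ignores tag case; the stored value retains the first casing seen.
        key = (lexical, lang.lower())
        return self._entries.setdefault(key, (lexical, lang))

    def find(self, lexical: str, lang: str):
        # Case-insensitive lookup on the language tag.
        return self._entries.get((lexical, lang.lower()))


store = LangLiteralStore()
first = store.add("colour", "en-GB")
second = store.add("colour", "en-gb")  # same literal, different tag case
print(first, second)
```

Dropping the single-occurrence requirement (storing every casing as given and only matching case-insensitively) avoids that surprise at the cost of occasional technical duplicates.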


For those really worried, check the data going in with SHACL.

    Andy


Claude
