Hi Claude,

The notion of Capabilities.handlesLiteralTyping isn't particularly relevant to SPARQL, which has both exact term matching (sameTerm) and value matching (in FILTERs: =, <, >). It is not part of the RDF specs either - it's a form of entailment.
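To make the term/value distinction concrete, here is a minimal sketch in plain Python. It is not Jena or ARQ code: the `Literal` tuple and the `same_term`, `value_of` and `value_equal` helpers are hypothetical names, only a handful of numeric datatypes are covered, and language tags are ignored.

```python
from decimal import Decimal
from typing import NamedTuple

XSD = "http://www.w3.org/2001/XMLSchema#"

# Hypothetical, simplified literal: lexical form plus datatype IRI.
class Literal(NamedTuple):
    lexical: str
    datatype: str

# Datatypes this sketch knows how to map to a value (an assumption, not a full list).
EXACT_NUMERIC = {XSD + n for n in ("decimal", "integer", "long", "int", "short", "byte")}

def same_term(a: Literal, b: Literal) -> bool:
    """Term matching: lexical form and datatype must both be identical."""
    return a.lexical == b.lexical and a.datatype == b.datatype

def value_of(lit: Literal) -> Decimal:
    """Map a literal from its lexical space to its value space."""
    if lit.datatype in EXACT_NUMERIC:
        return Decimal(lit.lexical)
    raise ValueError("datatype not handled in this sketch: " + lit.datatype)

def value_equal(a: Literal, b: Literal) -> bool:
    """Value matching, as a FILTER with '=' would compare two numbers."""
    return value_of(a) == value_of(b)

a = Literal("1", XSD + "int")
b = Literal("01", XSD + "integer")
```

Here `a` and `b` are different terms but the same value, which is why the question "are these equal?" only has an answer once you know how the term is being used.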

The lexical-space/value-space distinction is part of SPARQL. The per-graph idea of literal typing has been overtaken by the specs - it's a very old feature from a time when RDF was just coming into being and wasn't well connected to the numerical work in XML-land.

handlesLiteralTyping is only called from test cases in the code base. I don't remember seeing it in any support questions for many years.

We should arrange to evolve it away.

TIM is RDF Term based despite what the capabilities may say (JENA-1265).

The Capabilities.handlesLiteralTyping is not meaningful/helpful for TDB. It is not quite the right concept.

On 10/12/16 22:21, Claude Warren wrote:
I have a reasonable first cut at the Cassandra graph.  The Contract testing
is working (mostly).  I specified the graph did literal typing but have not
implemented it.  My question has to do with what literal types are to be
coerced?

From the tests I see that byte, short, int and long should all be the same
value in a query.

Which tests?  ARQ, SPARQL ones? Graph ones?

The graph-only tests are suspect as the different storages for graphs do different things and always have done. The ideal of the duality that mem-graphs try to provide, term-based storage with value-based indexing, does not work in persistent storage at scale. It also conflicts with how the specs have evolved.

The responsibility for term/value comes from how the RDF term is used. So the responsibility is in SPARQL.

xsd:byte, xsd:short, xsd:int and xsd:long are all derived types of xsd:decimal, though F&O makes xsd:integer a bit special.

I don't see anything for float and double. Any pointers there?

xsd:float is not a derived type of xsd:double.  They are unrelated types.
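The derivation relationships can be sketched as a parent map. This is an illustration only: the `PARENT` table covers just the signed integer chain from XML Schema Datatypes (the unsigned and non-negative types are left out), and `is_derived_from` is a hypothetical helper name.

```python
# Partial XSD derivation chain: each type maps to the type it is derived from.
# xsd:float and xsd:double are primitive types, so neither appears as a key.
PARENT = {
    "xsd:byte": "xsd:short",
    "xsd:short": "xsd:int",
    "xsd:int": "xsd:long",
    "xsd:long": "xsd:integer",
    "xsd:integer": "xsd:decimal",
}

def is_derived_from(datatype: str, ancestor: str) -> bool:
    """True if `datatype` is derived, directly or indirectly, from `ancestor`."""
    current = datatype
    while current in PARENT:
        current = PARENT[current]
        if current == ancestor:
            return True
    return False
```

With this table, `is_derived_from("xsd:byte", "xsd:decimal")` holds, while `is_derived_from("xsd:float", "xsd:double")` does not.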

There is also the concept of promotion : a numeric type can be promoted to another type for the purposes of a calculation.

"1"^^xsd:int + "2"^^xsd:float is an xsd:float because 1 is promoted to float.

"1"^^xsd:double + "2"^^xsd:float is an xsd:double because 2 is promoted to double.

The XML/XSD/F&O rules are quite long but really come down to what people expect based on experience with programming languages.

https://www.w3.org/TR/xpath-31/#promotion
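As a rough sketch of how promotion picks the result type of an addition, assuming only the four core numeric types plus the signed integer subtypes (the `RANK` and `BASE` tables and the `result_type` name are illustrative, not from any library):

```python
# Order in which numeric types can be promoted: integer -> decimal -> float -> double.
RANK = {"xsd:integer": 0, "xsd:decimal": 1, "xsd:float": 2, "xsd:double": 3}

# Derived integer types are first treated as xsd:integer (partial list).
BASE = {
    "xsd:byte": "xsd:integer",
    "xsd:short": "xsd:integer",
    "xsd:int": "xsd:integer",
    "xsd:long": "xsd:integer",
}

def result_type(left: str, right: str) -> str:
    """Datatype of `left + right` after promoting the lower-ranked operand."""
    left = BASE.get(left, left)
    right = BASE.get(right, right)
    return left if RANK[left] >= RANK[right] else right
```

So `result_type("xsd:int", "xsd:float")` gives `xsd:float` and `result_type("xsd:double", "xsd:float")` gives `xsd:double`. The full rules at the link above cover more types and operators than this.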

Any gotchas I should be aware of before I try to implement something?

RDF Term vs the value of a term.


TDB takes a more value-oriented approach: literals are converted on input to values.

"001"^^xsd:unsignedInt is stored as 1 - an integer.

The details of exact lexical representation and datatype are lost.

It speeds number filtering up a great deal.

People are used to the lexical form changing: putting 001 into a program and getting 1 out is to be expected.

The loss of datatype really hasn't led to much reaction - I expected it would, as what you put in is not exactly what you get out.

To do both requires more storage and/or CPU cycles in the innermost parts of data retrieval. Non-trivial.
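A toy illustration of the trade-off, which is not TDB's actual encoding: `encode_on_input` and `decode` are hypothetical names, only integer-like literals are handled, and the stored form is just a Python int.

```python
def encode_on_input(lexical: str, datatype: str) -> int:
    """Convert an integer-like literal to its value; the datatype is not kept."""
    return int(lexical)

def decode(stored: int) -> tuple:
    """Rebuild a literal from the stored value, in canonical form as xsd:integer."""
    return (str(stored), "xsd:integer")

# Load some literals; all that survives is the numeric value.
stored = [encode_on_input(lex, dt) for lex, dt in [
    ("001", "xsd:int"),
    ("42", "xsd:long"),
    ("7", "xsd:byte"),
]]

# A numeric filter is a plain integer comparison: no parsing at query time.
small = [decode(v) for v in stored if v < 10]
```

After this runs, `small` holds `("1", "xsd:integer")` and `("7", "xsd:integer")`: the filter was cheap, but the leading zeros and the original datatypes are gone. Keeping them as well means storing the term alongside the value, or re-deriving one from the other on every access.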

The other issue seems to be the language tag on string literals.  But I'll
get that worked out soon too.

Here, there is no right answer!

Users want to retain case, have case-insensitive matching and also have only one occurrence of the literal if two cases are used. In practice retaining case is important for users, and technical duplicates are not worth worrying about because they happen rarely.

For those really worried, check the data going in with SHACL.
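To show why the three wishes pull against each other, here is a small sketch of one possible policy: keep the first-seen spelling of the language tag and match case-insensitively. The `LangLiteralStore` class is hypothetical and is not how any particular Jena storage behaves.

```python
class LangLiteralStore:
    """Toy store of (lexical form, language tag) pairs."""

    def __init__(self):
        # Key uses the lower-cased tag; value keeps the tag exactly as first given.
        self._by_key = {}

    def add(self, lexical: str, lang: str) -> tuple:
        """Add a literal; a case variant of an existing tag is not stored again."""
        key = (lexical, lang.lower())
        return self._by_key.setdefault(key, (lexical, lang))

    def find(self, lexical: str, lang: str):
        """Case-insensitive lookup on the language tag."""
        return self._by_key.get((lexical, lang.lower()))

    def __len__(self) -> int:
        return len(self._by_key)

store = LangLiteralStore()
store.add("colour", "en-GB")
store.add("colour", "en-gb")   # same literal, different tag case
```

The store ends up with one entry spelled `en-GB`, so whoever loaded `en-gb` does not get their spelling back. Storing both spellings instead keeps case for everyone but leaves a technical duplicate.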

    Andy


Claude
