Hi Claude,
The notion of Capabilities.handlesLiteralTyping isn't particularly
relevant to SPARQL which has both exact term matching (SameTerm) and
value matching (in FILTERS, =, <, >). It is not part of the RDF specs
either - it's a form of entailment.
Lexical space versus value space is part of SPARQL. The per-graph idea of
literal typing has been overtaken by the specs - it's a very old feature
from a time when RDF was just coming into being and wasn't well
connected to the numerical work in XML-land.
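To make the term/value distinction concrete, here is a minimal sketch in Python. It is illustration only, not Jena or ARQ code: `Literal`, `same_term` and `value_equal` are hypothetical names, and `value_equal` assumes both literals are numeric.

```python
from dataclasses import dataclass
from decimal import Decimal

XSD = "http://www.w3.org/2001/XMLSchema#"


@dataclass(frozen=True)
class Literal:
    lexical: str   # lexical form exactly as written in the data
    datatype: str  # datatype IRI


def same_term(a: Literal, b: Literal) -> bool:
    """Term matching: lexical form and datatype must both be identical."""
    return a.lexical == b.lexical and a.datatype == b.datatype


def value_equal(a: Literal, b: Literal) -> bool:
    """Value matching (numeric literals only): compare in the value space."""
    return Decimal(a.lexical) == Decimal(b.lexical)


one_int = Literal("1", XSD + "int")
one_padded = Literal("001", XSD + "integer")

print(same_term(one_int, one_padded))    # different terms
print(value_equal(one_int, one_padded))  # same value
```

The two literals are different RDF terms but denote the same value, which is why the choice between the two kinds of matching belongs to the query, not to the graph.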
handlesLiteralTyping is only called from test cases in the code base. I
don't remember seeing it in any support questions for many years.
We should arrange to evolve it away.
TIM is RDF Term based despite what the capabilities may say (JENA-1265).
The Capabilities.handlesLiteralTyping is not meaningful/helpful for TDB.
It is not quite the right concept.
On 10/12/16 22:21, Claude Warren wrote:
> I have a reasonable first cut at the Cassandra graph. The Contract testing
> is working (mostly). I specified the graph did literal typing but have not
> implemented it. My question has to do with what literal types are to be
> coerced?
> From the tests I see that byte, short, int and long should all be the same
> value in a query.
Which tests? ARQ, SPARQL ones? Graph ones?
The graph-only tests are suspect, as the different storages for graphs do
different things and always have done. The duality that mem-graphs try
to provide, term-based storage with value-based indexing, does not work
in persistent storage at scale. It also conflicts with how the specs
have evolved.
The responsibility for term/value comes from how the RDF term is used.
So the responsibility is in SPARQL.
xsd:byte, xsd:short, xsd:int and xsd:long are all derived types of
xsd:decimal, though F&O makes xsd:integer a bit special.
> I don't see anything for float and double. Any pointers there?
xsd:float is not a derived type of xsd:double. They are unrelated types.
There is also the concept of promotion : a numeric type can be promoted
to another type for the purposes of a calculation.
1^^xsd:int + 2^^xsd:float is an xsd:float because 1 is promoted to float.
1^^xsd:double + 2^^xsd:float is an xsd:double because 2 is promoted to
double.
The XML/XSD/F&O rules are quite long but really come down to what people
expect based on experience with programming languages.
https://www.w3.org/TR/xpath-31/#promotion
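As a rough sketch of how the result type of an arithmetic operation falls out of those rules (simplified, in Python for illustration; `category` and `result_type` are hypothetical names, and only a handful of the integer-derived types are listed):

```python
# Numeric categories, lowest to highest in the promotion order.
PROMOTION_ORDER = ["integer", "decimal", "float", "double"]

# Types derived from xsd:integer are treated as "integer" for arithmetic.
DERIVED_FROM_INTEGER = {"byte", "short", "int", "long", "integer"}


def category(xsd_local_name: str) -> str:
    """Map an XSD numeric type (local name) to its arithmetic category."""
    if xsd_local_name in DERIVED_FROM_INTEGER:
        return "integer"
    if xsd_local_name in PROMOTION_ORDER:
        return xsd_local_name
    raise ValueError(f"not a numeric type handled by this sketch: {xsd_local_name}")


def result_type(left: str, right: str) -> str:
    """Both operands are promoted to the higher of the two categories."""
    return max(category(left), category(right), key=PROMOTION_ORDER.index)


print(result_type("int", "float"))     # float
print(result_type("double", "float"))  # double
print(result_type("byte", "long"))     # integer
```

Strictly, integer to decimal is subtype substitution and only decimal to float to double is promotion, but the outcome for the examples above is the same.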
> Any gotchas I should be aware of before I try to implement something?
RDF Term vs the value of a term.
TDB takes a more value oriented approach : literals are converted on
input to values.
"001"^^xsd:unsignedInt is stored as 1 - an integer.
The details of exact lexical representation and datatype are lost.
It speeds number filtering up a great deal.
People are used to the lexical form changing : putting 001 into a program
and getting 1 out is to be expected.
The loss of datatype really hasn't led to much reaction - I expected it
would, as what you put in is not exactly what you get out.
To do both requires more storage and/or CPU cycles in the most inner
parts of data retrieval. Non-trivial.
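A toy sketch of the value-oriented approach, in Python for illustration. This is not TDB's actual encoding: `store_literal` and `retrieve_literal` are hypothetical names, and the set of integer types is an assumption that is deliberately incomplete.

```python
# Assumption: datatypes this toy store treats as plain integers.
INTEGER_TYPES = {"integer", "long", "int", "short", "byte"}


def store_literal(lexical: str, datatype: str):
    """Convert integer-like literals to a value on input; keep others as terms."""
    if datatype in INTEGER_TYPES:
        # Lexical form and the exact datatype are both dropped here.
        return ("integer", int(lexical))
    return (datatype, lexical)


def retrieve_literal(stored):
    """Rebuild a (lexical, datatype) pair from what was stored."""
    datatype, value = stored
    return (str(value), datatype)


stored = store_literal("001", "int")
print(stored)                    # ('integer', 1)
print(retrieve_literal(stored))  # ('1', 'integer')
```

Filtering on numbers can then compare the stored values directly, at the cost that `"001"^^xsd:int` goes in and `"1"^^xsd:integer` comes out.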
> The other issue seems to be the language tag on string literals. But I'll
> get that worked out soon too.
Here, there is no right answer!
Users want to retain case, have case-insensitive matching, and also have
only one occurrence of the literal if two cases are used. In practice,
retaining case is what is important to users, and technical duplicates
are not worth worrying about because they happen rarely.
For those really worried, check the data going in with SHACL.
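To see why the three wishes pull against each other, here is one possible compromise sketched in Python. It is illustration only, not what Jena does: `lang_matches` and `add_literal` are hypothetical names, and the store is a plain dict.

```python
def lang_matches(tag_a: str, tag_b: str) -> bool:
    """Language tags compare case-insensitively."""
    return tag_a.lower() == tag_b.lower()


def add_literal(store: dict, lexical: str, lang: str):
    """Keep one entry per (lexical, lowercased tag); first-seen case wins."""
    key = (lexical, lang.lower())
    return store.setdefault(key, (lexical, lang))


store = {}
first = add_literal(store, "chat", "en-GB")
second = add_literal(store, "chat", "en-gb")

print(first)       # ('chat', 'en-GB')
print(second)      # ('chat', 'en-GB')
print(len(store))  # 1
```

There is only one occurrence and matching is case-insensitive, but the second writer does not get back the case they put in. Retaining every spelling instead brings the duplicates back.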
Andy
Claude