On 28/01/11 15:50, Benson Margulies wrote:
At the day job, one of our lead technologies is a device that can
decide that 'Barak Obama' and 'Barack Obama' are probably the same
thing, or even that 歐巴馬 is another spelling. Is there an extension
model for SPARQL queries? In this case, it wouldn't really work to
just live in the FILTER, since the fundamental selection would be
something like:
?s something:hasName "Barak Obama"
and we want to tamper with how the literal string gets compared. We
have one API that says "how similar are these strings" and another
more complex model in which we build an index that rapidly returns all
the strings that are within some distance of a query. We could, of
course, build our own index by mining TDB, make our own query, and
then get busy SPARQL-ing starting from a set of URIs thus derived,
but I just wondered about a more integrated approach.
Benson,
ARQ provides "property functions", where a property is matched by calling
custom code rather than by storage-level matching:
http://openjena.org/ARQ/extension.html#propertyFunctions
One example is free-text matching, using Lucene:
http://openjena.org/ARQ/lucene-arq.html
A property function can provide access to another index, such as your
example of similar literals. You could index either literal to literal
by similarity, or literal to the resource it relates to. The similarity
lookup can return multiple possible matches, and that is one of the reasons
for extending via properties: unlike a FILTER, which can only accept or
reject each solution, a property function gives a framework for multiple
matches.
(Property functions do not currently work in all property path situations:
it is not clear what they mean under {0} and *, nor how they interact
with the backtracking search.)
Andy