There's a new module under /Experimental - jena-text.
This is a possible replacement for LARQ (whether to call it "LARQ2" or
something else is for discussion).
== Example query
# text search on rdfs:label for occurrences of "word"
# then retrieve the actual value from the RDF data
PREFIX : <http://example/>
PREFIX text: <http://jena.apache.org/text#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT *
{ ?s text:query (rdfs:label 'word') ;
     rdfs:label ?label
}
== Example Fuseki config -- see end of message.
* works in Fuseki, with assembler setup, without the need for additional
Java code.
* tracks additions to the dataset
* works with Lucene4, and with Solr4 for sharing
the text index with non-SPARQL apps.
* incompatible with LARQ1 (and the property function is different).
* simpler and smaller index design
It's a complete rewrite and uses some new machinery to track changes to a
dataset so the index is kept in step (if desired - there are different
usage patterns).
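To illustrate the idea of keeping the index in step with the dataset, here is a toy sketch in Python. It is not jena-text code: the class names `TextIndex` and `TrackedDataset` are made up for this example, and it only models additions, assuming a wrapper that forwards each added triple to the index.

```python
class TextIndex:
    """Toy text index: maps a lowercased token to the set of subject URIs."""

    def __init__(self):
        self.postings = {}

    def add(self, subject, text):
        for token in text.lower().split():
            self.postings.setdefault(token, set()).add(subject)


class TrackedDataset:
    """Toy dataset wrapper (hypothetical) that forwards added triples to the index."""

    def __init__(self, index, indexed_predicates):
        self.triples = set()
        self.index = index
        self.indexed_predicates = set(indexed_predicates)

    def add(self, s, p, o):
        self.triples.add((s, p, o))
        # Only predicates chosen for indexing reach the text index.
        if p in self.indexed_predicates:
            self.index.add(s, o)


RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
index = TextIndex()
ds = TrackedDataset(index, [RDFS_LABEL])
ds.add("http://example/s1", RDFS_LABEL, "A word in a label")
ds.add("http://example/s1", "http://example/p", "text in another property")
print(sorted(index.postings["word"]))
```

Because every addition goes through the wrapper, the application never has to update the index by hand; the second triple is stored but not indexed, since its predicate is not in the indexed set.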
The core design is that the index is only an index. It answers text
searches with a list of URIs. Unlike LARQ1, there aren't multiple
modes, and the literal indexed is not stored in the index itself. Only
indexing information and the URI are in the index; if the app wants to
find the data that led to an index hit, it retrieves it from the RDF
data, as the example query does with rdfs:label.
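The "index is only an index" design can be sketched with a toy model in Python. This is an illustration under assumptions, not how jena-text is implemented: `text_index`, `text_query` and `labels_for` are hypothetical names, and tokenising by whitespace stands in for real text analysis.

```python
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# The RDF data: the only place where the literal values live.
triples = [
    ("http://example/s1", RDFS_LABEL, "a word in a label"),
    ("http://example/s2", RDFS_LABEL, "something else"),
]

# The index: token -> subject URIs. No literal text is stored here.
text_index = {}
for s, p, o in triples:
    if p == RDFS_LABEL:
        for token in o.lower().split():
            text_index.setdefault(token, set()).add(s)


def text_query(token):
    """Answer a text search with a sorted list of URIs, nothing more."""
    return sorted(text_index.get(token.lower(), set()))


def labels_for(token):
    """Mimic the example query: index lookup, then fetch labels from the data."""
    hits = set(text_query(token))
    return [(s, o) for s, p, o in triples if s in hits and p == RDFS_LABEL]


print(text_query("word"))
print(labels_for("word"))
```

The second function mirrors the shape of the example query: the text search yields subjects, and the label itself comes from a separate lookup against the data.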
Currently, it does not expose the score. The real requirement behind
exposing it, we found, is to retain ordering in text search results, and
score is only a partial solution to that (two hits can have the same
score). There is an included patch from an earlier version checked into
SVN. (An alternative is to add a "row id" variable to the results.)
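A small Python sketch of why score alone does not retain ordering while a row id would. The data, the `row` field, and the use of reversal to stand in for reordering during later query processing are all assumptions made for illustration.

```python
# Hits in the order the text index returned them: (URI, score).
hits = [
    ("http://example/a", 0.9),
    ("http://example/b", 0.5),
    ("http://example/c", 0.5),  # same score as b
]

# Attach a hypothetical "row" number recording the original position.
rows = [{"row": i, "s": uri, "score": score} for i, (uri, score) in enumerate(hits)]

# Later processing may reorder rows; reversing stands in for that here.
reordered = list(reversed(rows))

# Python's sort is stable, so tied scores keep their (now wrong) relative order.
by_score = [r["s"] for r in sorted(reordered, key=lambda r: -r["score"])]
by_row = [r["s"] for r in sorted(reordered, key=lambda r: r["row"])]

original = [uri for uri, _ in hits]
print(by_score == original)
print(by_row == original)
```

Sorting by score puts the top hit first but cannot recover the order of the two tied hits; sorting by the row number restores the original order exactly.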
While it works, it is not ready yet:
* documentation in text-query.mdtext needs completing.
* not tested heavily at scale (at some point, a better bulk loader and
integration with the TDB loader would be good - not a blocker for a first
release).
* needs examples
* machinery for change tracking and graph views of datasets is general
purpose and needs to migrate to the proper module.
* needs tidying up
Many thanks to Brian McBride (Epimorphics) who has contributed testing,
bug fixes and generally made it better.
Epimorphics has agreed to contribute this to Apache.
Andy
## Example of a TDB dataset and text index published using Fuseki
@prefix : <#> .
@prefix fuseki: <http://jena.apache.org/fuseki#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix tdb: <http://jena.hpl.hp.com/2008/tdb#> .
@prefix ja: <http://jena.hpl.hp.com/2005/11/Assembler#> .
@prefix text: <http://jena.apache.org/text#> .
[] rdf:type fuseki:Server ;
    fuseki:services (
        <#service_text_tdb>
    ) .
# TDB
[] ja:loadClass "com.hp.hpl.jena.tdb.TDB" .
tdb:DatasetTDB rdfs:subClassOf ja:RDFDataset .
tdb:GraphTDB rdfs:subClassOf ja:Model .
# Text
[] ja:loadClass "org.apache.jena.query.text.TextQuery" .
text:TextDataset rdfs:subClassOf ja:RDFDataset .
#text:TextIndexSolr rdfs:subClassOf text:TextIndex .
text:TextIndexLucene rdfs:subClassOf text:TextIndex .
## ---------------------------------------------------------------
<#service_text_tdb> rdf:type fuseki:Service ;
    rdfs:label "TDB/text service" ;
    fuseki:name "ds" ;
    fuseki:serviceQuery "query" ;
    fuseki:serviceQuery "sparql" ;
    fuseki:serviceUpdate "update" ;
    fuseki:serviceUpload "upload" ;
    fuseki:serviceReadGraphStore "get" ;
    fuseki:serviceReadWriteGraphStore "data" ;
    fuseki:dataset <#text_dataset> ;
    .
<#text_dataset> rdf:type text:TextDataset ;
    text:dataset <#dataset> ;
    ##text:index <#indexSolr> ;
    text:index <#indexLucene> ;
    .
<#dataset> rdf:type tdb:DatasetTDB ;
    tdb:location "DB" ;
    tdb:unionDefaultGraph true ;
    .
<#indexSolr> a text:TextIndexSolr ;
    #text:server <http://localhost:8983/solr/COLLECTION> ;
    text:server <embedded:SolrARQ> ;
    text:entityMap <#entMap> ;
    .
<#indexLucene> a text:TextIndexLucene ;
    text:directory <file:Lucene> ;
    ##text:directory "mem" ;
    text:entityMap <#entMap> ;
    .
<#entMap> a text:EntityMap ;
    text:entityField "uri" ;
    text:defaultField "text" ; ## Must be defined in the text:map
    text:map (
        # rdfs:label
        [ text:field "text" ; text:predicate rdfs:label ]
    ) .