Recently Vincent offered a nice patch to our text indexing documentation, as shown below. Oddly, when I now go to merge it (a bit late, sorry!), I get an error: "Can't locate anonymous's tree to clone". Is anyone familiar with that? I know very little about the SVN-based CMS, so I'm not even sure where to start looking...
ajs6f

> On Jan 23, 2019, at 12:01 PM, vincent.ventres...@ens-lyon.fr
> <anonym...@apache.org> wrote:
>
> Clone URL (Committers only):
> https://cms.apache.org/redirect?new=anonymous;action=diff;uri=http://jena.apache.org/documentation%2Fquery%2Ftext-query.mdtext
>
> vincent.ventres...@ens-lyon.fr
>
> Index: trunk/content/documentation/query/text-query.mdtext
> ===================================================================
> --- trunk/content/documentation/query/text-query.mdtext (revision
> 1851871)
> +++ trunk/content/documentation/query/text-query.mdtext (working copy)
> @@ -609,21 +609,47 @@
> index field. More complex setups, with multiple properties per entity
> (URI) are possible.
>
> +The assembler file can be either default configuration file
> (.../run/config.ttl)
> +or a custom file in ...run/configuration folder. Note that you can use
> several files
> +simultaneously.
> +
> +You have to edit the file (see comments in the assembler code below):
> +
> +1. provide values for paths and a fixed URI for tdb:DatasetTDB
> +2. modify the entity map : add the fields you want to index and desired
> options (filters, tokenizers...)
> +
> +If your assembler file is run/config.ttl, you can index the dataset with
> this command :
> +
> +java -cp ./fuseki-server.jar jena.textindexer --desc=run/config.ttl
> +
> Once configured, any data added to the text dataset is automatically
> -indexed as well.
> +indexed as well :
> https://jena.apache.org/documentation/query/text-query.html#building-a-text-index
>
> +When you change the jena-text in significant ways, such as changing what
> analyzer
> +is used for a given property and so on, then you’ll need to rebuild the
> Lucene index
> +via reloading the dataset or using the textIndexer.
> +
> ### Text Dataset Assembler
>
> The following is an example of a TDB dataset with a text index.
>
> + ######## Example of a TDB dataset and text index#########################
> + # The main doc sources are:
> + # -
> https://jena.apache.org/documentation/fuseki2/fuseki-configuration.html
> + # - https://jena.apache.org/documentation/assembler/assembler-howto.html
> + # - https://jena.apache.org/documentation/assembler/assembler.ttl
> + # See https://jena.apache.org/documentation/fuseki2/fuseki-layout.html
> for the destination of this file.
> + #########################################################################
> +
> @prefix : <http://localhost/jena_example/#> .
> @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
> @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
> @prefix tdb: <http://jena.hpl.hp.com/2008/tdb#> .
> @prefix ja: <http://jena.hpl.hp.com/2005/11/Assembler#> .
> @prefix text: <http://jena.apache.org/text#> .
> + @prefix skos: <http://www.w3.org/2004/02/skos/core#>
> + @prefix fuseki: <http://jena.apache.org/fuseki#> .
>
> - ## Example of a TDB dataset and text index
> ## Initialize TDB
> [] ja:loadClass "org.apache.jena.tdb.TDB" .
> tdb:DatasetTDB rdfs:subClassOf ja:RDFDataset .
> @@ -631,39 +657,64 @@
>
> ## Initialize text query
> [] ja:loadClass "org.apache.jena.query.text.TextQuery" .
> +
> # A TextDataset is a regular dataset with a text index.
> text:TextDataset rdfs:subClassOf ja:RDFDataset .
> +
> # Lucene index
> text:TextIndexLucene rdfs:subClassOf text:TextIndex .
> - # Elasticsearch index
> - text:TextIndexES rdfs:subClassOf text:TextIndex .
>
> +
> ## ---------------------------------------------------------------
> - ## This URI must be fixed - it's used to assemble the text dataset.
>
> :text_dataset rdf:type text:TextDataset ;
> - text:dataset <#dataset> ;
> + text:dataset :my_dataset ; # <--
> replace `:my_dataset` with the desired URI
> text:index <#indexLucene> ;
> - .
> + .
>
> # A TDB dataset used for RDF storage
> - <#dataset> rdf:type tdb:DatasetTDB ;
> - tdb:location "DB" ;
> - tdb:unionDefaultGraph true ; # Optional
> - .
>
> - # Text index description
> + :my_dataset rdf:type tdb:DatasetTDB ; # <--
> replace `:my_dataset` with the desired URI
> + tdb:location "/tmp/tdb-dataset/" ; # <--
> replace `/tmp/tdb-dataset/` with your path
> (`.../fuseki/run/databases/MY_DATASET`)
> + # tdb:unionDefaultGraph true ; # Optional
> + .
> +
> + # Text index description (see documentation for other options)
> +
> <#indexLucene> a text:TextIndexLucene ;
> - text:directory <file:/some/path/lucene-index> ;
> + text:directory <file:/tmp/tdb-lucene-index> ; # <--
> replace `<file:/tmp/tdb-lucene-index> with your path`
> (`<file:/.../fuseki/run/databases/MY_INDEX>`)
> text:entityMap <#entMap> ;
> - text:storeValues true ;
> + text:storeValues true ;
> text:analyzer [ a text:StandardAnalyzer ] ;
> text:queryAnalyzer [ a text:KeywordAnalyzer ] ;
> text:queryParser text:AnalyzingQueryParser ;
> - text:defineAnalyzers [ . . . ] ;
> text:multilingualSupport true ;
> - .
> + .
>
> + # Entity map (see documentation for other options)
> +
> + <#entMap> a text:EntityMap ;
> + text:defaultField "label" ; # <--
> modify this value if needed
> + text:entityField "uri" ;
> + text:uidField "uid" ;
> + text:langField "lang" ;
> + text:graphField "graph" ;
> + text:map (
> + [ text:field "label" ; # <--
> modify this value if needed
> + text:predicate skos:prefLabel ] # <--
> provide the predicates you want to index
> + ) .
> +
> + # Fuseki service (see documentation for other options)
> +
> +
> + <#service> rdf:type fuseki:Service ;
> + fuseki:name "/ds" ; # e.g :
> `s-query --service=http://localhost:3030/ds "select * where {?s ?p ?o} limit
> 5"`
> + fuseki:serviceQuery "query" ; # SPARQL
> query service
> + fuseki:serviceReadGraphStore "data" ; # SPARQL
> Graph store protocol (WARNING : read only dataset)
> + fuseki:dataset :text_dataset ;
> + .
> +
> +
> The `text:TextDataset` has two properties:
>
> - a `text:dataset`, e.g., a `tdb:DatasetTDB`, to contain
>