Clone URL (Committers only):
https://cms.apache.org/redirect?new=anonymous;action=diff;uri=http://jena.apache.org/documentation%2Fquery%2Ftext-query.mdtext
Chris Dollin
Index: trunk/content/documentation/query/text-query.mdtext
===================================================================
--- trunk/content/documentation/query/text-query.mdtext (revision 1655891)
+++ trunk/content/documentation/query/text-query.mdtext (working copy)
@@ -43,6 +43,7 @@
- [Working with Fuseki](#working-with-fuseki)
- [Building a Text Index](#building-a-text-index)
- [Deletion of Indexed Entities](#deletion-of-indexed-entities)
+- [Configuring Alternative TextDocProducers](#configuring-alternative-textdocproducers)
- [Maven Dependency](#maven-dependency)
## Architecture
@@ -405,6 +406,73 @@
It may be necessary to periodically rebuild the index if a large proportion
of the RDF data changes.
+# Configuring Alternative TextDocProducers
+
+The default behaviour when text indexing is to index a single
+property as a single field, generating a different `Document`
+for each indexed triple. Changing this behaviour requires
+writing and configuring an alternative `TextDocProducer`.
+
+To configure a `TextDocProducer` (say, `MyProducer`) in a dataset assembly,
+use the property `text:textDocProducer`, e.g.:
+
+    <#ds-with-lucene> rdf:type text:TextDataset;
+        text:index <#indexLucene> ;
+        text:dataset <#ds> ;
+        text:textDocProducer <java:CLASSNAME> ;
+        .
+
+where CLASSNAME is the fully qualified name of the `TextDocProducer`
+class; it must have either a single-argument constructor of type
+`TextIndex`, or a two-argument constructor `(DatasetGraph, TextIndex)`.
+The `TextIndex` argument will be the configured text index, and the
+`DatasetGraph` argument will be the graph of the configured dataset.
+
+For example, to explicitly create the default `TextDocProducer` use:
+
+    ...
+    text:textDocProducer <java:org.apache.jena.query.text.TextDocProducerTriples> ;
+    ...
+
+`TextDocProducerTriples` produces a new `Document` for each subject/field
+added to the dataset, using `TextIndex.addEntity(Entity)`.
+
+## Example
+
+The example class below is a `TextDocProducer` that only indexes
+`ADD`s of quads for which the subject already had at least one
+value for the same property. It uses the two-argument constructor
+to get access to the dataset so that it can count the `(?G, S, P, ?O)`
+quads with that subject and predicate, and delegates the indexing to
+`TextDocProducerTriples` if there are at least two values for
+that property (one of those values, of course, is the one that
+gives rise to this `change()`).
+
+
+    public class Example extends TextDocProducerTriples {
+
+        final DatasetGraph dg;
+
+        public Example(DatasetGraph dg, TextIndex indexer) {
+            super(indexer);
+            this.dg = dg;
+        }
+
+        public void change(QuadAction qaction, Node g, Node s, Node p, Node o) {
+            if (qaction == QuadAction.ADD) {
+                if (alreadyHasOne(s, p)) super.change(qaction, g, s, p, o);
+            }
+        }
+
+        private boolean alreadyHasOne(Node s, Node p) {
+            int count = 0;
+            Iterator<Quad> quads = dg.find( null, s, p, null );
+            while (quads.hasNext()) { quads.next(); count += 1; }
+            return count > 1;
+        }
+
+    }
+
## Maven Dependency
The <code>jena-text</code> module is included in Fuseki. To use it within application code,
@@ -417,4 +485,4 @@
</dependency>
adjusting the version <code>X.Y.Z</code> as necessary. This will automatically
-include a compatible version of Lucene and the Solr java client, but not Solr server.
\ No newline at end of file
+include a compatible version of Lucene and the Solr java client, but not Solr server.
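
The patch text says the configured class needs either a single-argument `TextIndex` constructor or a two-argument `(DatasetGraph, TextIndex)` constructor. The following is a minimal sketch of how such a lookup could be done by reflection; it is not the code in this patch. The types `TextIndex`, `DatasetGraph` and `TextDocProducer` below are empty hypothetical stand-ins so the sketch compiles without Jena, the classes `IndexOnlyProducer` and `DatasetAwareProducer` and the method `instantiate` are invented for illustration, and trying the two-argument constructor first is an assumption, since the patch text does not state an order of preference.

```java
public class ProducerLoaderSketch {

    // Hypothetical stand-ins for the Jena types named in the patch.
    public interface TextIndex {}
    public interface DatasetGraph {}
    public interface TextDocProducer {}

    // Hypothetical producer offering only the single-argument constructor.
    public static class IndexOnlyProducer implements TextDocProducer {
        public final TextIndex index;
        public IndexOnlyProducer(TextIndex index) { this.index = index; }
    }

    // Hypothetical producer offering the two-argument constructor.
    public static class DatasetAwareProducer implements TextDocProducer {
        public final DatasetGraph dataset;
        public final TextIndex index;
        public DatasetAwareProducer(DatasetGraph dataset, TextIndex index) {
            this.dataset = dataset;
            this.index = index;
        }
    }

    // Look up the class by name and build it. Assumption: the
    // (DatasetGraph, TextIndex) constructor is tried first, with the
    // (TextIndex) constructor as the fallback.
    public static TextDocProducer instantiate(String className, DatasetGraph dg, TextIndex index) {
        try {
            Class<?> c = Class.forName(className);
            try {
                return (TextDocProducer) c
                        .getConstructor(DatasetGraph.class, TextIndex.class)
                        .newInstance(dg, index);
            } catch (NoSuchMethodException noTwoArgConstructor) {
                return (TextDocProducer) c
                        .getConstructor(TextIndex.class)
                        .newInstance(index);
            }
        } catch (ReflectiveOperationException e) {
            // Covers an unknown class, a missing constructor, or a failing constructor.
            throw new IllegalArgumentException("Cannot create producer " + className, e);
        }
    }

    public static void main(String[] args) {
        TextIndex index = new TextIndex() {};
        DatasetGraph dg = new DatasetGraph() {};
        TextDocProducer a = instantiate(IndexOnlyProducer.class.getName(), dg, index);
        TextDocProducer b = instantiate(DatasetAwareProducer.class.getName(), dg, index);
        System.out.println(a.getClass().getSimpleName());
        System.out.println(b.getClass().getSimpleName());
    }
}
```

With this arrangement a class such as the patch's `Example`, which declares only the two-argument constructor, receives the dataset, while a class with only the `TextIndex` constructor still loads.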