Re: jena-text limit by named graph (and language?)

Andy Seaborne Sat, 07 Dec 2013 13:14:07 -0800


Comments? Any chances of getting this merged?


Tests! Excellent!

To make sure it does not get lost:

https://issues.apache.org/jira/browse/JENA-605

and added the files from your email.

Looks good - a couple of small questions:

1/ Blank node graphs - how about using the pseudo URI _:label rather than use g.getBlankNodeLabel()?

2/ Did I get it right that the default graph is Quad.defaultGraphNodeGenerated? Maybe
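
To make question 1/ concrete, here is a minimal Java sketch of the string that could be stored per graph in the index. GraphKeySketch, graphKey and its parameters are stand-ins invented for illustration; they are not Jena or jena-text API. The only idea taken from the question is writing a blank node graph as _:label instead of the bare label.

```java
// Hypothetical stand-in, not Jena API: the graph node is modelled as an
// (isBlank, value) pair, where value is either the URI or the blank node label.
final class GraphKeySketch {

    // Returns the string to store in the index's graph field.
    static String graphKey(boolean isBlank, String value) {
        // Blank node graphs get the pseudo URI form "_:label", so they cannot
        // be confused with a URI that happens to equal the label.
        return isBlank ? "_:" + value : value;
    }

    public static void main(String[] args) {
        System.out.println(graphKey(false, "http://example.com/graphA"));
        System.out.println(graphKey(true, "b0"));
    }
}
```

With the bare label, a blank node graph labelled "b0" and a graph whose name is literally "b0" would share one key; the prefix keeps them apart.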


How much of the documentation needs to change?  Just another section?

        Andy


On 04/12/13 18:14, Osma Suominen wrote:

Hi!

Sorry for spamming the list again :) This turned out to be easier to
implement than I thought.

Attached is a new version of the patch. This adds support for storing
the graph URI in the text index, as well as making use of it at query
time. The storing and use of graph URIs in the text index is optional,
and is enabled by defining the text:graphField property, as in the
attached config file. By default, no graph information is stored, i.e.
nothing changes, so the enhancement should be 100% backward compatible
and should not cause trouble for upgrading.


To test this, do the following:

1. Rebuild and reinstall jena-text and Fuseki with the attached patch

2. Start Fuseki with the attached config file:
    ./fuseki-server --config config-text-tdb-graph.ttl

3. Put this in the named graph <http://example.com/graphA>:
<http://example.com/resourceA>
<http://www.w3.org/2000/01/rdf-schema#label> "resourceA" .

...and this in the named graph <http://example.com/graphB>:
<http://example.com/resourceB>
<http://www.w3.org/2000/01/rdf-schema#label> "resourceB" .

4. Run the following SPARQL query:

PREFIX text: <http://jena.apache.org/text#>
SELECT ?s {
   GRAPH <http://example.com/graphA> {
     ?s text:query 'res*' .
   }
}

If everything worked, you should get only one result,
<http://example.com/resourceA>. Without this patch (or with the graph
indexing disabled), you will also get <http://example.com/resourceB>.
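
The expected behaviour can be illustrated with a toy in-memory model. This is only a sketch of the idea, not the patch itself: GraphFilterSketch, Entry, search and sampleIndex are hypothetical names, and a simple prefix match stands in for the Lucene wildcard query 'res*'.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of a text index that also stores the graph URI per entry.
// Not jena-text code; all names here are invented for illustration.
final class GraphFilterSketch {

    // One index entry: graph URI, subject URI and the indexed label.
    static final class Entry {
        final String graph;
        final String subject;
        final String label;

        Entry(String graph, String subject, String label) {
            this.graph = graph;
            this.subject = subject;
            this.label = label;
        }
    }

    // Returns the subjects whose label starts with the prefix. A null graph
    // means "no graph restriction", which models an index without graph data.
    static List<String> search(List<Entry> index, String prefix, String graph) {
        List<String> hits = new ArrayList<String>();
        for (Entry e : index) {
            boolean textMatch = e.label.startsWith(prefix);
            boolean graphMatch = graph == null || graph.equals(e.graph);
            if (textMatch && graphMatch) {
                hits.add(e.subject);
            }
        }
        return hits;
    }

    // The two triples from step 3, one per named graph.
    static List<Entry> sampleIndex() {
        return Arrays.asList(
            new Entry("http://example.com/graphA", "http://example.com/resourceA", "resourceA"),
            new Entry("http://example.com/graphB", "http://example.com/resourceB", "resourceB"));
    }

    public static void main(String[] args) {
        List<Entry> index = sampleIndex();
        // Restricted to graphA: only resourceA matches.
        System.out.println(search(index, "res", "http://example.com/graphA"));
        // No restriction: both resources match.
        System.out.println(search(index, "res", null));
    }
}
```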

I haven't yet tested the performance of this modification, but I expect
this to perform much better than current jena-text for queries targeted
at a single named graph, where the index currently returns hits from all
graphs. I'll try to find out soon.

I did find that the increase in index size is negligible (this is after
loading the STW Thesaurus, UNESCO Thesaurus, GEMET and Reegle thesaurus
into distinct named graphs, using skos:prefLabel as the indexed predicate):

$ du -s Lucene*
5004    Lucene
5012    Lucene-graph


Comments? Any chances of getting this merged?

-Osma


04.12.2013 17:59, Osma Suominen wrote:

04.12.2013 15:40, Osma Suominen wrote:

So my question is: if we assume that we're dealing with TDB graphs, and
the SPARQL pattern limits the context to a single graph URI (as e.g.
<http://example.com/mygraph> in the example below), how can the
text:search property function know that and find out the graph URI?


Ah, never mind, I got it now. The object available from
execCxt.getActiveGraph() inside TextQueryPF.exec() is actually a
GraphTDB instance in this case. GraphTDB inherits the getGraphName()
method from GraphView. And it seems I can use that method (as well as
isDefaultGraph() and isUnionGraph() for sanity checks) to determine the
graph URI to query for in the Lucene/Solr index.
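
Roughly, the decision looks like the sketch below. To keep it self-contained, GraphInfo is a hypothetical stand-in that mirrors only the three method names mentioned above and models the graph name as a String; ActiveGraphSketch, graphUriFor and fixed are invented for illustration. Returning "no restriction" for the default and union graphs is an assumption of this sketch, not necessarily what the patch does.

```java
// Hypothetical stand-in for the three methods named above. It is not the
// Jena GraphView class; the graph name is modelled here as a plain String.
interface GraphInfo {
    String getGraphName();
    boolean isDefaultGraph();
    boolean isUnionGraph();
}

final class ActiveGraphSketch {

    // Returns the graph URI to restrict the index query to, or null when
    // no single named graph applies (assumption: default or union graph).
    static String graphUriFor(GraphInfo g) {
        if (g.isDefaultGraph() || g.isUnionGraph()) {
            return null;
        }
        return g.getGraphName();
    }

    // Small helper that builds a fixed GraphInfo for demonstration.
    static GraphInfo fixed(final String name, final boolean isDefault, final boolean isUnion) {
        return new GraphInfo() {
            public String getGraphName() { return name; }
            public boolean isDefaultGraph() { return isDefault; }
            public boolean isUnionGraph() { return isUnion; }
        };
    }

    public static void main(String[] args) {
        System.out.println(graphUriFor(fixed("http://example.com/mygraph", false, false)));
        System.out.println(graphUriFor(fixed(null, true, false)));
    }
}
```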

I will try to implement the query side now, but it might take a while.

-Osma
