Osma,

Good to see the patch - sorry I missed it on users@ - I was quite busy at the end of last week.
There are two reasons why you can't get the graph name from the graph:

1/ Graphs might have more than one name, i.e. be in the dataset, or in another dataset, multiple times. Graphs from TDB do know their name - they are views on the dataset.

2/ Quads. When flattened to quads, the idea of a current graph is undefined.

At first glance, it looks quite easy to add the current graph name when not quadded. Property functions don't get tangled with quads.
However, the big question is which is best: whether no graph means index-wide (cf. unionDefaultGraph) or the current graph. I don't know.
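To make point 1/ concrete, here is a small sketch. It models a dataset as a plain map from graph name to graph object, which is an assumption for illustration and not Jena's DatasetGraph API, and does a reverse lookup by object identity. The same graph object registered under two names gives two answers, and every lookup is a linear scan over all graph names. That scan is also the cost of the listGraphNodes()/getGraph() approach described in Osma's message.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model: a "dataset" is just a map from graph name to graph object.
// This is NOT Jena's DatasetGraph API; it only illustrates the reverse-lookup problem.
class GraphNameLookup {

    /** Returns every name under which this exact graph object is registered. */
    static <G> List<String> namesOf(Map<String, G> dataset, G graph) {
        List<String> names = new ArrayList<>();
        // Linear scan: the cost grows with the number of graphs in the dataset.
        for (Map.Entry<String, G> entry : dataset.entrySet()) {
            // Identity comparison: we want the same graph object, not an equal one.
            if (entry.getValue() == graph) {
                names.add(entry.getKey());
            }
        }
        return names; // empty, one name, or several names
    }

    public static void main(String[] args) {
        Object shared = new Object();
        Map<String, Object> dataset = new LinkedHashMap<>();
        dataset.put("http://example.com/graphA", shared);
        dataset.put("http://example.com/graphB", new Object());
        dataset.put("http://example.com/alias", shared); // same graph, second name
        // prints [http://example.com/graphA, http://example.com/alias]
        System.out.println(namesOf(dataset, shared));
    }
}
```

A "current graph name" derived this way is therefore ambiguous whenever the list has more than one entry.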
Andy
On 04/12/13 10:09, Osma Suominen wrote:
Hi,

I'm reposting the message below from the users mailing list, as this seems to be a more appropriate place to submit new patches.

I'd like to add support to jena-text for storing the named graph (URI) of the indexed triples, to get faster text query performance when the query is intended for only one named graph. The attached patch adds this information to the index. What is missing is proper support for actually using the graph information at query time; I had some problems implementing that, as detailed in my message below.

Any comments are very welcome!

Best regards,
Osma Suominen

-------- Original Message --------
Subject: Re: jena-text limit by language and/or named graph
Date: Fri, 29 Nov 2013 14:02:32 +0200
From: Osma Suominen <[email protected]>
To: [email protected]

Hi Andy!

> Should this be per map entry / per predicate? I don't know which is best - whether an index-wide configuration or whether it might be that some predicates are indexed one way and some another.

For now, I think this can be global, i.e. not possible to set per predicate.

> (and if there is no lang, presumably "")

Probably yes, though I'll defer the lang discussion for now and concentrate on getting the graph information into the index first, because that is more critical for me: I have dozens of graphs, but only a few languages in each graph.

> Sounds sane.

Great!

> What would the query predicate in SPARQL look like?

For the graph part, I think there is no need to introduce any new syntax. Simply having the text:query within the context of a specific graph should be enough, i.e. this should work:

    GRAPH <http://example.com/mygraph> { ?s text:query "keyword" . }

For the language part, I'm not so sure, but I'll defer the discussion for now.

> If it all defaults back to the current mode of operations, we have a non-disruptive upgrade path, which would be better if possible. It's a change of disk format, which is always more of an issue for existing use.

Yes, that is my intent, to not disrupt existing use in any way.
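One way to keep the upgrade non-disruptive is to make the graph field strictly optional. The sketch below assumes a hypothetical Map-based "document" and made-up field names rather than the real jena-text or Lucene document classes. The graph field is only added when a graph field name has been configured, so an index built without that setting gets exactly the same fields as before.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: field names and the Map-based "document" are assumptions,
// not the jena-text Entity/TextDocProducer classes.
class OptionalGraphField {

    /** Builds the fields for one indexed triple; graphField == null means "not configured". */
    static Map<String, String> buildDocument(String entityField, String subject,
                                             String textField, String literal,
                                             String graphField, String graphUri) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put(entityField, subject);
        doc.put(textField, literal);
        // Only add the graph when the index is configured for it and the triple
        // actually came from a named graph; otherwise the document is unchanged.
        if (graphField != null && graphUri != null) {
            doc.put(graphField, graphUri);
        }
        return doc;
    }

    public static void main(String[] args) {
        // Not configured: same fields as an index without graph support.
        System.out.println(buildDocument("uri", "http://example.com/s1", "text", "research",
                null, "http://example.com/graphA"));
        // Configured: one extra field holding the graph URI.
        System.out.println(buildDocument("uri", "http://example.com/s1", "text", "research",
                "graph", "http://example.com/graphA"));
    }
}
```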
Attached is a first draft patch, which is my attempt at adding graph information to the index iff graphField has been set in the config file, as in the attached config file. With this patch, you can use a query such as this:

    SELECT ?s { ?s text:query '+res* +graph:"http\\://example.com/graphA"' . }

and you will only get results from within the specified graph. This is obviously a bit awkward, since you have to know the name of the graph field, and the URI quoting is also ugly. But at least it proves that the graph information was successfully stored in the index and can be used for retrieval.

However, I couldn't figure out how to get the URI of the current graph at query time, so that an explicit "graph:" query part could be avoided. An ExecutionContext is passed to TextQueryPF methods, and it has a getActiveGraph() method which looks promising. But neither the Graph interface nor the GraphBase implementation seems to be aware of the URI (or Node in general) by which it is identified.

The only (possible, untested) way I could think of would be to also call ExecutionContext.getDataset(), then call DatasetGraph.listGraphNodes(), and for each of the Nodes, call DatasetGraph.getGraph(node) and see if the result matches the Graph that getActiveGraph() returned. But this seems awfully inefficient, especially if there are lots of graphs.

Is there a better way to find out the URI of the current graph within TextQueryPF methods?
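Until the current graph can be picked up automatically, the awkward query string could at least be generated rather than typed. This is a sketch using hypothetical helpers that are not part of jena-text. It backslash-escapes the URI using an assumed set of characters that Lucene's classic query syntax treats as special (Lucene's own QueryParser escaping is the authoritative version). It then escapes the result again for a single-quoted SPARQL literal. That second layer is why the hand-written query needs `\\:` rather than `\:`. The helper escapes more than the hand-written example does (the slashes too), which the classic parser accepts inside a quoted phrase.

```java
// Hypothetical helper, not part of jena-text: builds the Lucene query string that
// the draft patch currently expects users to write by hand.
class GraphQueryString {

    // Characters treated as special by Lucene's classic query syntax
    // (assumed set; Lucene's own QueryParser escaping is the authoritative version).
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&/";

    /** Backslash-escapes every special character in a single term. */
    static String escapeLucene(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    /** Requires both the user's query and a match on the (hypothetical) graph field. */
    static String restrictToGraph(String userQuery, String graphField, String graphUri) {
        return "+(" + userQuery + ") +" + graphField + ":\"" + escapeLucene(graphUri) + "\"";
    }

    /** Wraps a string as a single-quoted SPARQL literal. */
    static String sparqlLiteral(String s) {
        // Escape backslashes first so the backslashes added for quotes are not doubled.
        return "'" + s.replace("\\", "\\\\").replace("'", "\\'") + "'";
    }

    public static void main(String[] args) {
        String lucene = restrictToGraph("res*", "graph", "http://example.com/graphA");
        System.out.println(lucene);
        // PREFIX declaration for text: omitted for brevity.
        System.out.println("SELECT ?s { ?s text:query " + sparqlLiteral(lucene) + " . }");
    }
}
```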
Finally, some misc notes:

- TextDocProducerEntities seems to be unused - not touched
- TextDocProducerTriples.[qQ]uadsToTriples is unused - not touched
- TextIndexLucene.get$ - it seems a bit stupid to use a QueryParser when you could directly create a Query programmatically - not touched
- I think get$ was broken anyway, because it doesn't take into account that the query is tokenized by StandardAnalyzer - but this should now be fixed as a side effect of using PerFieldAnalyzerWrapper
- I made similar changes in TextIndexSolr as in TextIndexLucene, but have so far tested only the Lucene part

-Osma
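The get$ and PerFieldAnalyzerWrapper note can be illustrated with a crude model. The analyzer functions here are hypothetical: simpleTokens is only a stand-in for a text analyzer (lowercase and split on non-alphanumerics), not Lucene's actual StandardAnalyzer, whose exact tokens may differ. Analysed as ordinary text, a graph URI falls apart into several tokens and can no longer be matched as one value. Routing the graph field to a keyword-style analyzer keeps it whole while other fields use the default, which is the idea behind a per-field analyzer.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Hypothetical model of per-field analysis. simpleTokens is a crude stand-in for a
// text analyzer (lowercase + split on non-alphanumerics), NOT Lucene's StandardAnalyzer.
class PerFieldAnalysis {

    static List<String> simpleTokens(String s) {
        List<String> tokens = new ArrayList<>();
        for (String t : s.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t); // split() can yield a leading empty string
            }
        }
        return tokens;
    }

    /** Keyword-style analysis: the whole value is one token, unchanged. */
    static List<String> keyword(String s) {
        return Collections.singletonList(s);
    }

    /** Uses the field's own analyzer if one is registered, else the default. */
    static List<String> analyze(Map<String, Function<String, List<String>>> perField,
                                Function<String, List<String>> defaultAnalyzer,
                                String field, String value) {
        return perField.getOrDefault(field, defaultAnalyzer).apply(value);
    }

    public static void main(String[] args) {
        Map<String, Function<String, List<String>>> perField = new HashMap<>();
        perField.put("graph", PerFieldAnalysis::keyword);
        String uri = "http://example.com/graphA";
        // URI split into several lowercase tokens by the text analyzer
        System.out.println(simpleTokens(uri));
        // URI kept as a single token because "graph" has its own analyzer
        System.out.println(analyze(perField, PerFieldAnalysis::simpleTokens, "graph", uri));
    }
}
```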
