[
https://issues.apache.org/jira/browse/JENA-1388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16240123#comment-16240123
]
Osma Suominen edited comment on JENA-1388 at 11/6/17 10:39 AM:
---------------------------------------------------------------
What [~andy.seaborne] says is correct. The jena-text Lucene backend creates a
separate document for each indexed field value, so one triple (or quad, when
graph-specific indexing is enabled) corresponds to one document in the Lucene
index. The upside is that this makes it rather simple to synchronize updates
between the triple store and the Lucene index: for new triples, add documents
to Lucene; for deleted triples, delete the corresponding documents from Lucene.
The downside is that AND queries across fields cannot be supported: each
document holds only one field, so no single document can satisfy conditions on
two different fields. This is a pretty fundamental design choice in jena-text,
so it cannot simply be fixed like a normal bug. It would require reengineering
significant parts of the jena-text subsystem.
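To make the one-document-per-triple behaviour concrete, here is a minimal toy
model in plain Python. It is only a sketch of the idea and does not use the
jena-text or Lucene APIs; the names Doc, index_per_triple, search_or and
search_and, as well as the sample data, are made up for illustration.

```python
from collections import namedtuple

# Hypothetical stand-in for an index document: it holds exactly one field.
Doc = namedtuple("Doc", ["subject", "field", "text"])


def index_per_triple(triples):
    """Create one document per (subject, field, literal) triple."""
    return [Doc(s, f, t.lower()) for s, f, t in triples]


def matches(doc, field, term):
    """True if this single document has the field and contains the term."""
    return doc.field == field and term.lower() in doc.text.split()


def search_or(docs, clauses):
    """Return the subject of every document matching at least one clause."""
    return [d.subject for d in docs if any(matches(d, f, t) for f, t in clauses)]


def search_and(docs, clauses):
    """Return the subject of every document matching all clauses at once."""
    return [d.subject for d in docs if all(matches(d, f, t) for f, t in clauses)]


# Two triples about the same subject become two separate documents.
triples = [
    ("ex:thing1", "label", "temperature sensor"),
    ("ex:thing1", "comment", "installed outdoors"),
]
docs = index_per_triple(triples)
clauses = [("label", "sensor"), ("comment", "outdoors")]

or_hits = search_or(docs, clauses)    # one hit per matching document
and_hits = search_and(docs, clauses)  # no single document has both fields
```

In this model the OR search returns the same subject twice (once per matching
document) and the AND search returns nothing, which mirrors the behaviour
described in the issue.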
Note that the recently added Elasticsearch backend for jena-text works
differently: it consolidates triples with the same subject into a single
document in the text index. However, it has to do a lot of bookkeeping to keep
that document synchronized with the triples. One consequence is that updates to
the index are very slow compared with the Lucene backend (though another major
factor is that operations are performed via a REST API to the Elasticsearch
server, whereas the Lucene backend lives in the same JVM). The Elasticsearch
backend does support AND queries, so you may want to try it instead of the
Lucene backend.
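The per-subject approach can be sketched in the same toy style. This is plain
Python and an assumption about the general idea only, not the actual
Elasticsearch backend or its API; the class SubjectIndex and its methods are
hypothetical. It shows why AND queries become possible and where the extra
bookkeeping comes from: deleting a triple means editing a shared document
rather than dropping a whole one.

```python
class SubjectIndex:
    """Toy index holding one consolidated document per subject."""

    def __init__(self):
        # subject -> field -> list of indexed literal values
        self.docs = {}

    def add(self, subject, field, text):
        """Merge a new triple into the subject's existing document."""
        fields = self.docs.setdefault(subject, {})
        fields.setdefault(field, []).append(text.lower())

    def delete(self, subject, field, text):
        """Remove one value, dropping empty fields and empty documents."""
        fields = self.docs.get(subject)
        if fields is None or text.lower() not in fields.get(field, []):
            return  # nothing indexed for this triple
        fields[field].remove(text.lower())
        if not fields[field]:
            del fields[field]
        if not fields:
            del self.docs[subject]

    def search_and(self, clauses):
        """Return subjects whose document satisfies every (field, term) clause."""
        return [
            subject
            for subject, fields in self.docs.items()
            if all(
                any(term.lower() in value.split() for value in fields.get(field, []))
                for field, term in clauses
            )
        ]


index = SubjectIndex()
index.add("ex:thing1", "label", "temperature sensor")
index.add("ex:thing1", "comment", "installed outdoors")
clauses = [("label", "sensor"), ("comment", "outdoors")]

before = index.search_and(clauses)  # both fields live in one document
index.delete("ex:thing1", "comment", "installed outdoors")
after = index.search_and(clauses)   # the comment value is gone
```

Here the AND search finds the subject while both values are present and stops
finding it once one of the triples has been deleted.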
> Lucene text search across multiple fields ("AND") yields no results
> -------------------------------------------------------------------
>
> Key: JENA-1388
> URL: https://issues.apache.org/jira/browse/JENA-1388
> Project: Apache Jena
> Issue Type: Bug
> Components: Text
> Affects Versions: Jena 3.4.0
> Environment: CentOS 7.3, OpenJDK 64-Bit, v1.8.0_141-b16
> Reporter: Vilnis Termanis (Iotic Labs)
> Assignee: Osma Suominen
> Labels: index, lucene, search
> Attachments: config-fields.ttl, multi_field.ttl, multi_index.sparql
>
>
> Searching across two Lucene text indexed fields produces potentially
> unexpected results. (The following assumes that the string supplied to each
> field does match and is tied to the same uid/subject.)
> # A query across two fields with *OR* produces two equal rows
> # The same query but with *AND* produces no rows
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)