Code Ferret created JENA-1723:
---------------------------------
Summary: jena:text create ORs of Lucene fields
Key: JENA-1723
URL: https://issues.apache.org/jira/browse/JENA-1723
Project: Apache Jena
Issue Type: New Feature
Components: Jena
Affects Versions: Jena 3.13.0
Reporter: Code Ferret
Assignee: Code Ferret
h3. Motivation:
With the current {{jena:text}}, we often find ourselves writing query patterns
such as:
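{code}
prefix text: <http://jena.apache.org/text#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix skos: <http://www.w3.org/2004/02/skos/core#>

select ?s ?sc ?lit where {
  {
    (?s ?sc ?lit) text:query ( rdfs:label "some query" "highlight:" ) .
  }
  union
  {
    (?s ?sc ?lit) text:query ( skos:altLabel "some query" "highlight:" ) .
  }
  union
  {
    (?s ?sc ?lit) text:query ( skos:prefLabel "some query" "highlight:" ) .
  }
}
{code}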
Such patterns recur for various sets of RDF properties, each corresponding to
a Lucene field. It can be more efficient to _push_ the {{unions}} down into
the Lucene query by rewriting them as:
{code}
(altLabel:"some query" OR prefLabel:"some query" OR label:"some query")
{code}
The result is a single query, with Lucene performing the {{unions}}.
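A minimal sketch of that rewrite, assuming the Lucene field names for the properties are already known. The helper {{OrQueryBuilder.orQuery}} is hypothetical (not part of Jena): it builds the classic query-parser string above by joining one quoted phrase clause per field with {{OR}}.

{code:java}
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not Jena API: ORs the same phrase across several
// Lucene fields using classic query-parser syntax.
final class OrQueryBuilder {
    private OrQueryBuilder() {}

    /** Quotes a phrase for the classic parser, escaping '\' first, then '"'. */
    static String quote(String phrase) {
        return "\"" + phrase.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /** Returns e.g. (altLabel:"some query" OR prefLabel:"some query"). */
    static String orQuery(List<String> fields, String phrase) {
        if (fields.isEmpty()) {
            throw new IllegalArgumentException("at least one field is required");
        }
        String quoted = quote(phrase);
        return fields.stream()
                .map(f -> f + ":" + quoted)                 // one clause per field
                .collect(Collectors.joining(" OR ", "(", ")"));
    }

    public static void main(String[] args) {
        System.out.println(orQuery(List.of("altLabel", "prefLabel", "label"), "some query"));
    }
}
{code}

Building a string keeps the sketch dependency-free; with Lucene on the classpath, the same disjunction could instead be built programmatically as a {{BooleanQuery}} whose clauses all use {{BooleanClause.Occur.SHOULD}}.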
h3. Approach:
We've implemented this by
1. adding a new assembler feature in {{text:TextIndexLucene}}:
{code}
[] text:props (
     text:propList [ text:propListProp ex:labels ;
                     text:props ( skos:prefLabel skos:altLabel rdfs:label ) ]
   ) ;
{code}
which allows a single _Property_ id, e.g., {{ex:labels}}, to be assigned to a
list of properties;
and
2. adding new syntax to the {{TextQueryPF}} property function:
{code}
(?s ?sc ?lit ?graph ?prop) text:query ( text:props ex:labels "some query" "highlight:" )
{code}
The new fifth output argument, {{?prop}}, returns the specific property that
matched. If the input arguments include {{text:props}} as the first argument,
then it must be followed by a list of one or more properties before the query
string. Each of these properties is either an ordinary Lucene-indexed property,
as used in {{text:query}} today, or a property list property such as
{{ex:labels}} above.
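The argument rule above can be sketched as follows. This is a hypothetical model, not the {{TextQueryPF}} code: RDF terms are plain strings, a term starting with {{"}} is treated as a literal, and everything between {{text:props}} and the first literal is taken to be a property.

{code:java}
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the proposed input-argument rule (not Jena code).
// Terms are strings; a literal is any term that starts with '"'.
final class PropsArgs {
    final List<String> props;  // properties listed before the query string
    final String query;        // the query string literal, as written
    final List<String> rest;   // trailing options, e.g. "highlight:"

    PropsArgs(List<String> props, String query, List<String> rest) {
        this.props = props;
        this.query = query;
        this.rest = rest;
    }

    static boolean isLiteral(String term) {
        return term.startsWith("\"");
    }

    /** Parses ( text:props p1 ... pn "query" options... ) with n >= 1. */
    static PropsArgs parse(List<String> args) {
        if (args.isEmpty() || !args.get(0).equals("text:props")) {
            throw new IllegalArgumentException("expected text:props as first argument");
        }
        List<String> props = new ArrayList<>();
        int i = 1;
        // Collect properties until the first literal (the query string).
        while (i < args.size() && !isLiteral(args.get(i))) {
            props.add(args.get(i));
            i++;
        }
        if (props.isEmpty()) {
            throw new IllegalArgumentException("text:props needs at least one property");
        }
        if (i == args.size()) {
            throw new IllegalArgumentException("missing query string");
        }
        return new PropsArgs(props, args.get(i),
                new ArrayList<>(args.subList(i + 1, args.size())));
    }

    public static void main(String[] a) {
        PropsArgs p = parse(List.of("text:props", "ex:labels", "\"some query\"", "\"highlight:\""));
        System.out.println(p.props + " " + p.query + " " + p.rest);
    }
}
{code}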
When a property list property is encountered, it is expanded to the
underlying list of indexed properties from the configuration.
There may be any mix of indexed and property list properties following
{{text:props}} in the input arg list:
{code}
(?s ?sc ?lit ?graph ?prop) text:query ( text:props ex:labels rdfs:comment "some query" "highlight:" )
{code}
which searches over the three properties listed in {{ex:labels}} and the
property {{rdfs:comment}}.
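The expansion step, including a mix of indexed and property list properties, might look like the following. The map standing in for the {{text:propList}} configuration and the set of indexed properties are assumptions made for illustration; rejecting unknown properties and dropping duplicates (keeping first-seen order) are design choices of this sketch, not something the proposal specifies.

{code:java}
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: propLists stands in for the assembler's
// text:propList entries, indexed for the properties the index covers.
final class PropListExpander {
    private final Map<String, List<String>> propLists;
    private final Set<String> indexed;

    PropListExpander(Map<String, List<String>> propLists, Set<String> indexed) {
        this.propLists = propLists;
        this.indexed = indexed;
    }

    /** Replaces list properties by their members, keeps indexed ones, drops duplicates. */
    List<String> expand(List<String> props) {
        Set<String> out = new LinkedHashSet<>();  // preserves first-seen order
        for (String p : props) {
            List<String> members = propLists.get(p);
            if (members != null) {
                out.addAll(members);              // property list property
            } else if (indexed.contains(p)) {
                out.add(p);                       // ordinary indexed property
            } else {
                // Assumption: unknown properties are an error.
                throw new IllegalArgumentException("not indexed and not a property list: " + p);
            }
        }
        return new ArrayList<>(out);
    }

    public static void main(String[] args) {
        PropListExpander x = new PropListExpander(
                Map.of("ex:labels", List.of("skos:prefLabel", "skos:altLabel", "rdfs:label")),
                Set.of("skos:prefLabel", "skos:altLabel", "rdfs:label", "rdfs:comment"));
        System.out.println(x.expand(List.of("ex:labels", "rdfs:comment")));
    }
}
{code}

The expanded list is what would then be mapped to Lucene fields and combined into a single {{OR}} query.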
This functionality has been implemented, with extensive tests, and a PR can be
issued after some code cleanup.
h3. Discussion:
The use of {{text:props}} in the query form isn't strictly necessary; it was
introduced to signal explicitly that a list of properties is to be searched.
If the {{text:props}} _flag_ is removed from the implementation, the feature
will simply check each leading property to determine whether it is a property
list property or an ordinary indexed property.
With this modification, the above queries would be written simply as:
{code}
(?s ?sc ?lit ?graph ?prop) text:query ( ex:labels "some query" "highlight:" )
{code}
or
{code}
(?s ?sc ?lit ?graph ?prop) text:query ( ex:labels rdfs:comment "some query" "highlight:" )
{code}