Code Ferret created JENA-1723:
---------------------------------

             Summary: jena:text create OR's of Lucene fields
                 Key: JENA-1723
                 URL: https://issues.apache.org/jira/browse/JENA-1723
             Project: Apache Jena
          Issue Type: New Feature
          Components: Jena
    Affects Versions: Jena 3.13.0
            Reporter: Code Ferret
            Assignee: Code Ferret


h3. Motivation:

With the current {{jena:text}}, we often find ourselves writing query patterns
such as:
{code}
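PREFIX text: <http://jena.apache.org/text#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

select ?s ?sc ?lit where {
  {
    (?s ?sc ?lit) text:query ( rdfs:label "some query" "highlight:" ) .
  }
  union
  {
    (?s ?sc ?lit) text:query ( skos:altLabel "some query" "highlight:" ) .
  }
  union
  {
    (?s ?sc ?lit) text:query ( skos:prefLabel "some query" "highlight:" ) .
  }
}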
{code}
We see the same pattern for various other sets of RDF properties, each
corresponding to some Lucene field.

It can be more efficient to _push_ the {{union}} into the Lucene query by
rewriting the patterns as:
{code}
(altLabel:"some query" OR prefLabel:"some query" OR label:"some query")
{code}
The result is a single query, with Lucene rather than the SPARQL engine
performing the {{union}}.
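To make the rewrite concrete, here is a minimal Python sketch that builds such an OR query string from a list of Lucene field names. It is an illustration only, not the jena-text code: the function name {{or_query}} is hypothetical, the field names are taken from the example above, and escaping backslashes and double quotes so the phrase stays one quoted Lucene term is an assumption of the sketch.

{code:python}
def or_query(fields, query):
    """Build a Lucene query string that ORs the same phrase over several fields.

    Hypothetical helper: `fields` are assumed to be Lucene field names and
    `query` a plain phrase. Backslashes and double quotes are escaped so the
    phrase remains a single quoted term.
    """
    phrase = query.replace("\\", "\\\\").replace('"', '\\"')
    clauses = [f'{field}:"{phrase}"' for field in fields]
    return "(" + " OR ".join(clauses) + ")"


print(or_query(["altLabel", "prefLabel", "label"], "some query"))
# (altLabel:"some query" OR prefLabel:"some query" OR label:"some query")
{code}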

h3. Approach:

We've implemented this by:

1. adding a new assembler feature in {{text:TextIndexLucene}}:
{code}
[] text:props (
    text:propList [ text:propListProp ex:labels ;
                    text:props ( skos:prefLabel skos:altLabel rdfs:label ) ]
) ;
{code}
This gives a single _Property_ id, e.g., {{ex:labels}}, to a list of
properties.

and

2. adding new syntax to {{TextQueryPF}}, the property function behind
{{text:query}}:
{code}
(?s ?sc ?lit ?graph ?prop) text:query ( text:props ex:labels "some query" "highlight:" )
{code}
The addition of a fifth output argument, {{?prop}}, allows the specific
property that matched to be returned. If the input arguments include
{{text:props}} as the first argument, then it must be followed by a list of at
least one property before the query string. These properties are either the
usual Lucene-indexed properties that occur in {{text:query}} or a property list
property such as {{ex:labels}} above.
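A minimal Python sketch of the proposed input-argument rule follows. It assumes properties are represented as prefixed-name strings and literals by a hypothetical {{Literal}} wrapper; {{parse_text_query_args}} is likewise a hypothetical name, not part of Jena. In this sketch, argument lists without the flag fall back to the existing forms (no property, or a single property).

{code:python}
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    """Hypothetical stand-in for an RDF literal argument."""
    value: str


PROPS_FLAG = "text:props"


def parse_text_query_args(args):
    """Split text:query input args into (properties, query string, options)."""
    if args and args[0] == PROPS_FLAG:
        # Proposed form: collect properties up to the first literal.
        props = []
        rest = list(args[1:])
        while rest and not isinstance(rest[0], Literal):
            props.append(rest.pop(0))
        if not props:
            raise ValueError("text:props must be followed by at least one property")
    elif args and not isinstance(args[0], Literal):
        # Existing form: a single indexed property before the query string.
        props, rest = [args[0]], list(args[1:])
    else:
        # Existing form: no property, i.e. search the default field.
        props, rest = [], list(args)
    if not rest or not isinstance(rest[0], Literal):
        raise ValueError("missing query string")
    options = []
    for arg in rest[1:]:
        if not isinstance(arg, Literal):
            raise ValueError(f"unexpected argument after query string: {arg}")
        options.append(arg.value)
    return props, rest[0].value, options


print(parse_text_query_args(
    ["text:props", "ex:labels", "rdfs:comment", Literal("some query"), Literal("highlight:")]
))
# (['ex:labels', 'rdfs:comment'], 'some query', ['highlight:'])
{code}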

When a list property is encountered it is expanded to the underlying list of 
indexed properties from the configuration.

There may be any mix of indexed and property list properties following 
{{text:props}} in the input arg list:
{code}
(?s ?sc ?lit ?graph ?prop) text:query ( text:props ex:labels rdfs:comment "some query" "highlight:" )
{code}
which searches over the three properties listed in {{ex:labels}} and the 
property {{rdfs:comment}}.
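The expansion step, including a mix of list and ordinary properties, might look like the following minimal Python sketch. The dictionary {{PROP_LISTS}} is a hypothetical in-memory form of the assembler configuration and {{expand_props}} is a hypothetical name; dropping duplicates is a design choice of this sketch, not something stated in the proposal.

{code:python}
# Hypothetical in-memory form of the text:propList configuration above:
# each property-list id maps to the indexed properties it stands for.
PROP_LISTS = {
    "ex:labels": ["skos:prefLabel", "skos:altLabel", "rdfs:label"],
}


def expand_props(props, prop_lists=PROP_LISTS):
    """Replace each property-list id with its indexed properties.

    Ordinary indexed properties pass through unchanged. Order is kept and
    duplicates are dropped, so each underlying property is searched once.
    """
    expanded = []
    for prop in props:
        for indexed in prop_lists.get(prop, [prop]):
            if indexed not in expanded:
                expanded.append(indexed)
    return expanded


print(expand_props(["ex:labels", "rdfs:comment"]))
# ['skos:prefLabel', 'skos:altLabel', 'rdfs:label', 'rdfs:comment']
{code}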

This functionality is implemented, including copious tests, and a PR can be 
issued after a bit of code cleanup.

h3. Discussion:

The use of {{text:props}} in the query form isn't strictly necessary; it was
introduced as a way of signalling the intent that a list of properties is to be
searched.

If the {{text:props}} _flag_ is removed from the implementation, then the
feature will simply check whether each property is a property list or an
ordinary indexed property.

With this modification, the above queries would be written simply as:
{code}
(?s ?sc ?lit ?graph ?prop) text:query ( ex:labels "some query" "highlight:" )
{code}
or
{code}
(?s ?sc ?lit ?graph ?prop) text:query ( ex:labels rdfs:comment "some query" "highlight:" )
{code}
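Dropping the flag moves the work into a lookup against the configuration. The following minimal Python sketch shows that check: each argument before the query string must be either a property list or an indexed property. {{Literal}}, {{INDEXED}}, {{PROP_LISTS}} and {{resolve_props}} are all hypothetical, and rejecting unknown properties with an error is an assumption of the sketch rather than documented behaviour.

{code:python}
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    """Hypothetical stand-in for an RDF literal argument."""
    value: str


# Hypothetical configuration: properties indexed in the entity map, and
# property lists defined with text:propList.
INDEXED = {"skos:prefLabel", "skos:altLabel", "rdfs:label", "rdfs:comment"}
PROP_LISTS = {"ex:labels": ["skos:prefLabel", "skos:altLabel", "rdfs:label"]}


def resolve_props(args):
    """Without a text:props flag, classify each leading non-literal argument.

    Returns the indexed properties to search; an empty list means no property
    was given (the default field).
    """
    resolved = []
    for arg in args:
        if isinstance(arg, Literal):
            break  # the first literal is the query string
        if arg in PROP_LISTS:
            candidates = PROP_LISTS[arg]
        elif arg in INDEXED:
            candidates = [arg]
        else:
            raise ValueError(f"{arg} is neither an indexed property nor a property list")
        for prop in candidates:
            if prop not in resolved:
                resolved.append(prop)
    return resolved


print(resolve_props(["ex:labels", "rdfs:comment", Literal("some query"), Literal("highlight:")]))
# ['skos:prefLabel', 'skos:altLabel', 'rdfs:label', 'rdfs:comment']
{code}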



