[jira] [Updated] (LUCENE-6459) [suggest] Query Interface for suggest API

Areek Zillur (JIRA) Thu, 14 May 2015 18:44:14 -0700

     [ https://issues.apache.org/jira/browse/LUCENE-6459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Areek Zillur updated LUCENE-6459:
---------------------------------
    Description: 
This patch factors out common indexing/search API used by the recently 
introduced [NRTSuggester|https://issues.apache.org/jira/browse/LUCENE-6339]. 
The motivation is to provide a query interface for FST-based fields 
(*SuggestField* and *ContextSuggestField*) 
to enable suggestion scoring and more powerful automaton queries. 

Previously, only prefix ‘queries’ with index-time weights were supported but we 
can also support:

* Prefix queries expressed as regular expressions: get suggestions that match 
multiple prefixes
    ** *Example:* _star\[wa\|tr\]_ matches _starwars_ and _startrek_
* Fuzzy prefix queries supporting scoring: get typo-tolerant suggestions scored 
by how close they are to the query prefix
    ** *Example:* querying for _seper_ will score _separate_ higher than 
_superstitious_
* Context queries: get suggestions boosted and/or filtered based on their 
indexed contexts (metadata)
    ** *Boost example:* get typo-tolerant suggestions on song names with prefix 
_like a roling_, boosting songs with genre _rock_ and _indie_
    ** *Filter example:* get suggestions on all file names starting with 
_finan_, only for _user1_ and _user2_

h3. Suggest API

{code}
SuggestIndexSearcher searcher = new SuggestIndexSearcher(reader);
CompletionQuery query = ...
TopSuggestDocs suggest = searcher.suggest(query, num);
{code}

h3. CompletionQuery

*CompletionQuery* is used to query *SuggestField* and *ContextSuggestField*. A 
*CompletionQuery* produces a *CompletionWeight*, which lets *CompletionQuery* 
implementations pass in an automaton to be intersected with an FST, and allows 
boosting and metadata extraction from the intersected partial paths. A 
*CompletionWeight* produces a *CompletionScorer*. A *CompletionScorer* executes 
a top-N search against the FST with the provided automaton, scoring and 
filtering all matched paths.

h4. PrefixCompletionQuery
Return documents with values that match the prefix of an analyzed term text. 
Documents are sorted according to their suggest field weight.
{code}
PrefixCompletionQuery(Analyzer analyzer, Term term)
{code}
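
As a rough illustration of the matching and ordering described above (not the FST-backed implementation in the patch), the following self-contained sketch scans an in-memory list for a raw string prefix and returns the top entries by weight. The names {{WeightedSuggestion}} and {{PrefixSuggestSketch}} are hypothetical, and analysis of the term text is skipped entirely.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for an indexed suggestion: a value and its index-time weight.
final class WeightedSuggestion {
    final String value;
    final int weight;

    WeightedSuggestion(String value, int weight) {
        this.value = value;
        this.weight = weight;
    }
}

final class PrefixSuggestSketch {
    // Returns up to num values that start with prefix, highest weight first.
    // A plain list scan replaces the FST traversal; no analyzer is applied.
    static List<String> suggest(List<WeightedSuggestion> indexed, String prefix, int num) {
        List<WeightedSuggestion> matches = new ArrayList<>();
        for (WeightedSuggestion s : indexed) {
            if (s.value.startsWith(prefix)) {
                matches.add(s);
            }
        }
        // Order by weight, descending, mirroring "sorted according to their weight".
        matches.sort(Comparator.comparingInt((WeightedSuggestion s) -> s.weight).reversed());
        List<String> top = new ArrayList<>();
        for (int i = 0; i < matches.size() && i < num; i++) {
            top.add(matches.get(i).value);
        }
        return top;
    }
}
```

For example, with _starwars_ (weight 5), _startrek_ (weight 9) and _stargate_ (weight 1) in the list, asking for the top 2 suggestions for _star_ yields _startrek_ followed by _starwars_.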

h4. RegexCompletionQuery
Return documents with values that match the prefix of a regular expression.
Documents are sorted according to their suggest field weight.
{code}
RegexCompletionQuery(Term term)
{code}

h4. FuzzyCompletionQuery
Return documents with values that have prefixes within a specified edit 
distance of an analyzed term text.
Documents are ‘boosted’ by the number of prefix letters of the suggestion that 
match the original term text.

{code}
FuzzyCompletionQuery(Analyzer analyzer, Term term)
{code}

h5. Scoring
{{suggestion_weight + (global_maximum_weight * boost)}}
where {{suggestion_weight}}, {{global_maximum_weight}} and {{boost}} are all 
integers. 
{{boost = # of prefix characters matched}}
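
A worked sketch of the formula above, using the _seper_ example from the introduction. It assumes that "prefix characters matched" means the length of the common prefix between the query text and the suggestion; that reading, the class name {{FuzzyScoreSketch}} and the sample weights are all assumptions made for illustration.

```java
final class FuzzyScoreSketch {
    // Assumption: the boost is the length of the common prefix of query and suggestion.
    static int matchedPrefixLength(String query, String suggestion) {
        int limit = Math.min(query.length(), suggestion.length());
        int i = 0;
        while (i < limit && query.charAt(i) == suggestion.charAt(i)) {
            i++;
        }
        return i;
    }

    // suggestion_weight + (global_maximum_weight * boost); long is used only to avoid overflow.
    static long score(int suggestionWeight, int globalMaximumWeight, int boost) {
        return suggestionWeight + (long) globalMaximumWeight * boost;
    }
}
```

With a global maximum weight of 10, _separate_ (weight 2, 3 matched characters) scores 32 and _superstitious_ (weight 9, 1 matched character) scores 19. As long as weights are non-negative and no larger than the global maximum, a suggestion with a higher boost never ranks below one with a lower boost.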

h4. ContextQuery
Return documents that match a {{CompletionQuery}} filtered and/or boosted by 
provided context(s). 
{code}
ContextQuery(CompletionQuery query)
contextQuery.addContext(CharSequence context, int boost, boolean exact)
{code}

*NOTE:* {{ContextQuery}} should be used with {{ContextSuggestField}} to query 
suggestions boosted and/or filtered by contexts.
Running {{ContextQuery}} against a {{SuggestField}} will error out.
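
To make the filter-and-boost behaviour concrete, here is a minimal in-memory sketch based on the file-name example from the introduction. It is not the patch's implementation: {{ContextFilterSketch}} and its {{Entry}} type are hypothetical, contexts are compared for exact equality only, and the score uses the context scoring formula from this description.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class ContextFilterSketch {
    // Hypothetical flattened entry: one suggestion value indexed under one context.
    static final class Entry {
        final String value;
        final String context;
        final int weight;

        Entry(String value, String context, int weight) {
            this.value = value;
            this.context = context;
            this.weight = weight;
        }
    }

    // suggestion_weight + (global_maximum_weight * context_boost)
    static long score(Entry e, Map<String, Integer> contextBoosts, int globalMaximumWeight) {
        return e.weight + (long) globalMaximumWeight * contextBoosts.get(e.context);
    }

    // Keeps entries that start with prefix and whose context has a boost registered,
    // then orders them by score, highest first.
    static List<String> suggest(List<Entry> indexed, String prefix,
                                Map<String, Integer> contextBoosts, int globalMaximumWeight) {
        List<Entry> kept = new ArrayList<>();
        for (Entry e : indexed) {
            if (e.value.startsWith(prefix) && contextBoosts.containsKey(e.context)) {
                kept.add(e);
            }
        }
        kept.sort((a, b) -> Long.compare(
            score(b, contextBoosts, globalMaximumWeight),
            score(a, contextBoosts, globalMaximumWeight)));
        List<String> values = new ArrayList<>();
        for (Entry e : kept) {
            values.add(e.value);
        }
        return values;
    }
}
```

Registering boosts only for _user1_ and _user2_ filters out files indexed under any other user, while the boost values decide the order among the files that remain.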


h5. Scoring
{{suggestion_weight + (global_maximum_weight * context_boost)}}
where {{suggestion_weight}}, {{global_maximum_weight}} and {{context_boost}} 
are all integers.

When used with {{FuzzyCompletionQuery}},
{{suggestion_weight + (global_maximum_weight * (context_boost + fuzzy_boost))}}
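
The two formulas can be checked with a small arithmetic sketch. The class name {{ContextScoreSketch}} and the sample numbers are illustrative assumptions; only the formulas themselves come from this description.

```java
final class ContextScoreSketch {
    // suggestion_weight + (global_maximum_weight * context_boost)
    static long score(int suggestionWeight, int globalMaximumWeight, int contextBoost) {
        return suggestionWeight + (long) globalMaximumWeight * contextBoost;
    }

    // suggestion_weight + (global_maximum_weight * (context_boost + fuzzy_boost))
    static long scoreWithFuzzy(int suggestionWeight, int globalMaximumWeight,
                               int contextBoost, int fuzzyBoost) {
        return suggestionWeight + (long) globalMaximumWeight * (contextBoost + fuzzyBoost);
    }
}
```

For a suggestion with weight 4, a global maximum weight of 10 and a context boost of 2, the score is 24; adding a fuzzy boost of 3 raises it to 54.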


h3. Context Suggest Field
To use {{ContextQuery}}, use {{ContextSuggestField}} instead of 
{{SuggestField}}. Any {{CompletionQuery}} can be used with 
{{ContextSuggestField}}; the default behaviour is to return suggestions from 
*all* contexts. The context for every completion hit 
can be accessed through {{SuggestScoreDoc#context}}.
{code}
ContextSuggestField(String name, Collection<CharSequence> contexts, String value, int weight)
{code}

  was:
This patch factors out common indexing/search API used by the recently 
introduced [NRTSuggester|https://issues.apache.org/jira/browse/LUCENE-6339]. 
The motivation is to provide a query interface for FST-based fields 
(*SuggestField* and *ContextSuggestField*) 
for enabling suggestion scoring and more powerful automaton queries. 

Previously, only prefix ‘queries’ with index-time weights were supported but we 
can also support:

* Prefix queries expressed as regular expressions: get suggestions that match 
multiple prefixes
    ** *Example:* _star\[wa\|tr\]_ matches _starwars_ and _startrek_
* Fuzzy prefix queries supporting scoring: get typo-tolerant suggestions scored 
by how close they are to the query prefix
    ** *Example:* querying for _seper_ will score _separate_ higher than 
_superstitious_
* Context queries: get suggestions boosted and/or filtered based on their 
indexed contexts (metadata)
    ** *Boost example:* get typo-tolerant suggestions on song names with prefix 
_like a roling_, boosting songs with genre _rock_ and _indie_
    ** *Filter example:* get suggestions on all file names starting with 
_finan_, only for _user1_ and _user2_

h3. Suggest API

{code}
SuggestIndexSearcher searcher = new SuggestIndexSearcher(reader);
CompletionQuery query = ...
TopSuggestDocs suggest = searcher.suggest(query, num);
{code}

h3. CompletionQuery

*CompletionQuery* is used to query *SuggestField* and *ContextSuggestField*. A 
*CompletionQuery* produces a *CompletionWeight*, which lets *CompletionQuery* 
implementations pass in an automaton to be intersected with an FST, and allows 
boosting and metadata extraction from the intersected partial paths. A 
*CompletionWeight* produces a *CompletionScorer*. A *CompletionScorer* executes 
a top-N search against the FST with the provided automaton, scoring and 
filtering all matched paths.

h4. PrefixCompletionQuery
Return documents with values that match the prefix of an analyzed term text. 
Documents are sorted according to their suggest field weight.
{code}
PrefixCompletionQuery(Analyzer analyzer, Term term)
{code}

h4. RegexCompletionQuery
Return documents with values that match the prefix of a regular expression.
Documents are sorted according to their suggest field weight.
{code}
RegexCompletionQuery(Term term)
{code}

h4. FuzzyCompletionQuery
Return documents with values that have prefixes within a specified edit 
distance of an analyzed term text.
Documents are ‘boosted’ by the number of prefix letters of the suggestion that 
match the original term text.

{code}
FuzzyCompletionQuery(Analyzer analyzer, Term term)
{code}

h5. Scoring
{{suggestion_weight + (global_maximum_weight * boost)}}
where {{suggestion_weight}}, {{global_maximum_weight}} and {{boost}} are all 
integers. 
{{boost = # of prefix characters matched}}

h4. ContextQuery
Return documents that match a {{CompletionQuery}} filtered and/or boosted by 
provided context(s). 
{code}
ContextQuery(CompletionQuery query)
contextQuery.addContext(CharSequence context, int boost, boolean exact)
{code}

*NOTE:* {{ContextQuery}} should be used with {{ContextSuggestField}} to query 
suggestions boosted and/or filtered by contexts.
Running {{ContextQuery}} against a {{SuggestField}} will error out.


h5. Scoring
{{suggestion_weight + (global_maximum_weight * context_boost)}}
where {{suggestion_weight}}, {{global_maximum_weight}} and {{context_boost}} 
are all integers.

When used with {{FuzzyCompletionQuery}},
{{suggestion_weight + (global_maximum_weight * (context_boost + fuzzy_boost))}}


h3. Context Suggest Field
To use {{ContextQuery}}, use {{ContextSuggestField}} instead of 
{{SuggestField}}. Any {{CompletionQuery}} can be used with 
{{ContextSuggestField}}; the default behaviour is to return suggestions from 
*all* contexts. The context for every completion hit 
can be accessed through {{SuggestScoreDoc#context}}.
{code}
ContextSuggestField(String name, Collection<CharSequence> contexts, String value, int weight)
{code}


> [suggest] Query Interface for suggest API
> -----------------------------------------
>
>                 Key: LUCENE-6459
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6459
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: core/search
>    Affects Versions: 5.1
>            Reporter: Areek Zillur
>            Assignee: Areek Zillur
>             Fix For: Trunk, 5.x, 5.1
>
>         Attachments: LUCENE-6459.patch, LUCENE-6459.patch, LUCENE-6459.patch, 
> LUCENE-6459.patch, LUCENE-6459.patch, LUCENE-6459.patch, LUCENE-6459.patch, 
> LUCENE-6459.patch
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
