[jira] [Comment Edited] (LUCENE-6459) [suggest] Query Interface for suggest API

Areek Zillur (JIRA) Mon, 25 May 2015 19:42:36 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-6459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14558629#comment-14558629
 ]


Areek Zillur edited comment on LUCENE-6459 at 5/26/15 2:41 AM:
---------------------------------------------------------------

Thanks [~mikemccand] for the feedback!

{quote}
Anyway, I think scoring will forever be a challenging topic and we
just have to keep it pluggable (CompletionScorer.score).
{quote}

I have made {{CompletionScorer.score}} pluggable by extracting a {{ScoreMode}} 
interface, which computes a score for a suggestion from its weight and boost 
(along with the min/max weight stats). 
{code}
public interface ScoreMode {
  float score(float weight, float boost, float minWeight, float maxWeight);
}
{code}

Currently there are three implementations:
 - BOOST (weight * boost)
 - IGNORE_BOOST (weight)
 - BOOST_FIRST (weight + (maxWeight * boost))
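
For illustration, the three formulas above could be written roughly as follows. This is a minimal, self-contained sketch and not the code from the patch: the wrapper class {{ScoreModeSketch}} and the use of lambdas are assumptions, and only the interface shape and the three formulas come from this comment.

```java
public class ScoreModeSketch {

  /** Same shape as the ScoreMode interface quoted above. */
  public interface ScoreMode {
    float score(float weight, float boost, float minWeight, float maxWeight);
  }

  /** weight * boost */
  public static final ScoreMode BOOST =
      (weight, boost, minWeight, maxWeight) -> weight * boost;

  /** weight only; the boost is ignored */
  public static final ScoreMode IGNORE_BOOST =
      (weight, boost, minWeight, maxWeight) -> weight;

  /** weight + (maxWeight * boost): any boost difference outranks any weight difference */
  public static final ScoreMode BOOST_FIRST =
      (weight, boost, minWeight, maxWeight) -> weight + (maxWeight * boost);

  public static void main(String[] args) {
    // Hypothetical inputs chosen only to show the arithmetic.
    float weight = 4f, boost = 2f, minWeight = 1f, maxWeight = 10f;
    System.out.println(BOOST.score(weight, boost, minWeight, maxWeight));        // 8.0
    System.out.println(IGNORE_BOOST.score(weight, boost, minWeight, maxWeight)); // 4.0
    System.out.println(BOOST_FIRST.score(weight, boost, minWeight, maxWeight));  // 24.0
  }
}
```

With BOOST_FIRST, a suggestion with a higher boost always sorts above one with a lower boost, as long as weights stay within {{maxWeight}} and boosts are whole numbers.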

Now any {{CompletionQuery}} can plug in one of the score modes (or a custom 
implementation) through {{CompletionQuery.setScoreMode}}.
Default score modes for the existing queries:
 - {{PrefixCompletionQuery}} & {{RegexCompletionQuery}} - IGNORE_BOOST
 - {{FuzzyCompletionQuery}} - BOOST_FIRST
 - {{ContextQuery}} - BOOST

{{ScoreMode}} is also used in the comparator for the TopNSearcher queue.

It would be good to know your thoughts on how the TopNSearcher queue size is 
calculated here. Now that every intersected prefix path might have a different 
boost, and this boost influences the top N, the queue size has been changed 
from {{n}} to {{n * num_prefix_paths}} to allow for {{n}} path expansions per 
intersected prefix path. In practice, search execution terminates early as 
soon as the original topN results have been collected. 
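
As a rough model of the sizing and early termination described above (not the actual TopNSearcher code): the class and method names below are hypothetical, the overflow guard is an assumption, and the candidates are assumed to arrive already ordered best-first.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class QueueSizeSketch {

  /** Queue size as described above: n path expansions per intersected prefix path. */
  public static int queueSize(int n, int numPrefixPaths) {
    // Assumption: fail loudly on int overflow; the patch may handle this differently.
    return Math.multiplyExact(n, numPrefixPaths);
  }

  /** Walks candidates in best-first order and stops once n results are collected. */
  public static List<String> collectTopN(List<String> bestFirstCandidates, int n) {
    List<String> results = new ArrayList<>();
    for (String candidate : bestFirstCandidates) {
      if (results.size() >= n) {
        break; // early termination: the original top N is already complete
      }
      results.add(candidate);
    }
    return results;
  }

  public static void main(String[] args) {
    System.out.println(queueSize(5, 3)); // 15
    System.out.println(collectTopN(Arrays.asList("a", "b", "c", "d"), 2)); // [a, b]
  }
}
```

The larger queue only bounds how many partial paths may be kept around; the number of results returned is still {{n}}.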

Updated Patch:
 - Added {{ScoreMode}} for pluggable scoring
 - Increased test coverage
 - Added dedicated tests for PrefixCompletionQuery, RegexCompletionQuery, 
   FuzzyCompletionQuery and ContextQuery
 - Fixed javadocs, making {{ant precommit}} happy



> [suggest] Query Interface for suggest API
> -----------------------------------------
>
>                 Key: LUCENE-6459
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6459
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: core/search
>    Affects Versions: 5.1
>            Reporter: Areek Zillur
>            Assignee: Areek Zillur
>             Fix For: Trunk, 5.x, 5.1
>
>         Attachments: LUCENE-6459.patch, LUCENE-6459.patch, LUCENE-6459.patch, 
> LUCENE-6459.patch, LUCENE-6459.patch, LUCENE-6459.patch, LUCENE-6459.patch, 
> LUCENE-6459.patch, LUCENE-6459.patch
>
>
> This patch factors out common indexing/search API used by the recently 
> introduced [NRTSuggester|https://issues.apache.org/jira/browse/LUCENE-6339]. 
> The motivation is to provide a query interface for FST-based fields 
> (*SuggestField* and *ContextSuggestField*) 
> to enable suggestion scoring and more powerful automaton queries. 
> Previously, only prefix ‘queries’ with index-time weights were supported, but 
> we can now also support:
> * Prefix queries expressed as regular expressions: get suggestions that 
> match multiple prefixes
>     ** *Example:* _star\[wa\|tr\]_ matches _starwars_ and _startrek_
> * Fuzzy Prefix queries supporting scoring: get typo-tolerant suggestions 
> scored by how close they are to the query prefix
>     ** *Example:* querying for _seper_ will score _separate_ higher than 
> _superstitious_
> * Context Queries: get suggestions boosted and/or filtered based on their 
> indexed contexts (metadata)
>     ** *Boost example:* get typo-tolerant suggestions on song names with 
> prefix _like a roling_, boosting songs with genre _rock_ and _indie_
>     ** *Filter example:* get suggestions on all file names starting with 
> _finan_ only for _user1_ and _user2_
> h3. Suggest API
> {code}
> SuggestIndexSearcher searcher = new SuggestIndexSearcher(reader);
> CompletionQuery query = ...
> TopSuggestDocs suggest = searcher.suggest(query, num);
> {code}
> h3. CompletionQuery
> *CompletionQuery* is used to query *SuggestField* and *ContextSuggestField*. 
> A *CompletionQuery* produces a *CompletionWeight*, which allows 
> *CompletionQuery* implementations to pass in an automaton that will be 
> intersected with an FST, and allows boosting and metadata extraction from 
> the intersected partial paths. A *CompletionWeight* produces a 
> *CompletionScorer*. A *CompletionScorer* executes a Top N search against the 
> FST with the provided automaton, scoring and filtering all matched paths. 
> h4. PrefixCompletionQuery
> Return documents with values that match the prefix of an analyzed term text. 
> Documents are sorted according to their suggest field weight. 
> {code}
> PrefixCompletionQuery(Analyzer analyzer, Term term)
> {code}
> h4. RegexCompletionQuery
> Return documents with values that match the prefix of a regular expression.
> Documents are sorted according to their suggest field weight.
> {code}
> RegexCompletionQuery(Term term)
> {code}
> h4. FuzzyCompletionQuery
> Return documents with values that have prefixes within a specified edit 
> distance of an analyzed term text.
> Documents are ‘boosted’ by the number of matching prefix letters of the 
> suggestion with respect to the original term text.
> {code}
> FuzzyCompletionQuery(Analyzer analyzer, Term term)
> {code}
> h5. Scoring
> {{suggestion_weight + (global_maximum_weight * boost)}}
> where {{suggestion_weight}}, {{global_maximum_weight}} and {{boost}} are all 
> integers. 
> {{boost = # of prefix characters matched}}
> h4. ContextQuery
> Return documents that match a {{CompletionQuery}} filtered and/or boosted by 
> provided context(s). 
> {code}
> ContextQuery(CompletionQuery query)
> contextQuery.addContext(CharSequence context, int boost, boolean exact)
> {code}
> *NOTE:* {{ContextQuery}} should be used with {{ContextSuggestField}} to query 
> suggestions boosted and/or filtered by contexts.
> Running {{ContextQuery}} against a {{SuggestField}} will error out.
> h5. Scoring
> {{suggestion_weight + (global_maximum_weight * context_boost)}}
> where {{suggestion_weight}}, {{global_maximum_weight}} and {{context_boost}} 
> are all integers.
> When used with {{FuzzyCompletionQuery}}, the score becomes
> {{suggestion_weight + (global_maximum_weight * (context_boost + fuzzy_boost))}}
> h3. Context Suggest Field
> To use {{ContextQuery}}, use {{ContextSuggestField}} instead of 
> {{SuggestField}}. Any {{CompletionQuery}} can be used with 
> {{ContextSuggestField}}; the default behaviour is to return suggestions from 
> *all* contexts. The {{Context}} for every completion hit 
> can be accessed through {{SuggestScoreDoc#context}}.
> {code}
> ContextSuggestField(String name, Collection<CharSequence> contexts, String 
> value, int weight) 
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
