[ https://issues.apache.org/jira/browse/LUCENE-6459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14552580#comment-14552580 ]
Michael McCandless commented on LUCENE-6459:
--------------------------------------------
bq. IMO, a suggestion weight is just an index-time boost for the associated entry.
+1
Shouldn't we switch to float (later, separate issue!)?
{quote}
Calculating a suggestion score of weight + (maxWeight * boost) makes sure that
entries with a higher boost (longer common prefix w.r.t. the query prefix) will
always be scored higher regardless of the index-time weight of suggestion
entries. The segment-level maxWeight is stored in CompletionPostingsFormat
(CompletionIndex), and the maxWeight is computed across all segments at query
time.
Since the maximum weight for any suggestion entry will be <= Integer.MAX_VALUE,
we can just replace the maxWeight for a suggestField with Integer.MAX_VALUE? One
problem might be the loss of precision when converting the long score to a
float?
{quote}
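A minimal sketch of the arithmetic discussed in the quote, assuming non-negative {{int}} weights and boosts (the class and method names are hypothetical, not Lucene API). It shows the boost dominating the weight, and two long scores that differ only in weight collapsing to the same {{float}} once maxWeight is {{Integer.MAX_VALUE}}.

```java
// Hypothetical helper modelling score = weight + (maxWeight * boost); not Lucene API.
final class BoostDominatedScore {

  // Widen to long before multiplying: maxWeight * boost can overflow an int.
  static long score(int weight, int maxWeight, int boost) {
    return weight + (long) maxWeight * boost;
  }

  public static void main(String[] args) {
    int maxWeight = 100;
    // A weak entry with a longer shared prefix (boost 3) ...
    long weakLonger = score(1, maxWeight, 3);      // 301
    // ... outranks the strongest entry with a shorter shared prefix (boost 2).
    long strongShorter = score(100, maxWeight, 2); // 300
    System.out.println(weakLonger > strongShorter); // true

    // With maxWeight = Integer.MAX_VALUE the long scores stay distinct ...
    long a = score(1, Integer.MAX_VALUE, 1); // 2147483648
    long b = score(2, Integer.MAX_VALUE, 1); // 2147483649
    System.out.println(a != b); // true
    // ... but a float has a 24-bit significand, so both round to 2^31.
    System.out.println((float) a == (float) b); // true: the weight tie-break is lost
  }
}
```

Between 2^31 and 2^32 adjacent floats are 256 apart, so weight differences smaller than that can disappear in the cast.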
OK, I think I understand! It's a scoring model that guarantees that
the search-time score ranking comes first and the index-time ranking is
used only to break ties in the search-time score.
But I'm not sure that's a good goal. E.g. I may not want a suggestion with a
very low index-time boost that shares a longer prefix to score higher than a
suggestion with a high index-time boost that matched a slightly shorter
prefix.
E.g. when I type "pyh" into Google, the top suggestion is "python"
(shared prefix=2) and after that is "pyhs2" (shared prefix=3).
Anyway, I think scoring will forever be a challenging topic and we
just have to keep it pluggable (CompletionScorer.score).
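One way to picture keeping scoring pluggable, using the "pyh" example. The {{SuggestionScorer}} interface, both implementations, and the weights (900 for "python", 10 for "pyhs2", max weight 1000) are hypothetical; this does not reproduce {{CompletionScorer.score}}.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical pluggable scoring hook; not Lucene's CompletionScorer API.
interface SuggestionScorer {
  double score(int weight, int maxWeight, int prefixBoost);
}

final class PluggableScoringDemo {

  // Search-time boost dominates; index-time weight only breaks ties.
  static final SuggestionScorer BOOST_FIRST =
      (weight, maxWeight, boost) -> weight + (double) maxWeight * boost;

  // One possible alternative: the normalized weight scales the boost, so a
  // strong entry with a slightly shorter shared prefix can still win.
  static final SuggestionScorer BLENDED =
      (weight, maxWeight, boost) -> ((double) weight / maxWeight) * (1 + boost);

  // Returns suggestion texts ordered by descending score.
  static List<String> rank(SuggestionScorer scorer, String[] texts, int[] weights,
                           int[] boosts, int maxWeight) {
    List<Integer> order = new ArrayList<>();
    for (int i = 0; i < texts.length; i++) order.add(i);
    order.sort(Comparator.comparingDouble(
        (Integer i) -> scorer.score(weights[i], maxWeight, boosts[i])).reversed());
    List<String> out = new ArrayList<>();
    for (int i : order) out.add(texts[i]);
    return out;
  }

  public static void main(String[] args) {
    String[] texts = {"python", "pyhs2"};
    int[] weights = {900, 10}; // hypothetical index-time weights
    int[] boosts = {2, 3};     // shared prefix length with "pyh"
    System.out.println(rank(BOOST_FIRST, texts, weights, boosts, 1000)); // [pyhs2, python]
    System.out.println(rank(BLENDED, texts, weights, boosts, 1000));     // [python, pyhs2]
  }
}
```

With the boost-first model "pyhs2" wins on prefix length (3010 vs. 2900); the blended model lets the much heavier "python" win (2.7 vs. 0.04), which matches the Google ordering.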
Hmm, {{ant precommit}} is failing for me:
{noformat}
-documentation-lint:
[echo] checking for broken html...
[jtidy] Checking for broken html (such as invalid tags)...
[delete] Deleting directory /l/areek/lucene/build/jtidy_tmp
[echo] Checking for broken links...
[exec]
[exec] Crawl/parse...
[exec]
[exec] Verify...
[exec]
[exec] file:///build/docs/suggest/org/apache/lucene/search/suggest/document/CompletionTerms.html
[exec] BROKEN LINK: file:///build/docs/core/org/apache/lucene/search/suggest.document.CompletionFieldsProducer.CompletionsTermsReader.html
[exec] BROKEN LINK: file:///build/docs/core/org/apache/lucene/search/suggest.document.NRTSuggester.html
[exec] BROKEN LINK: file:///build/docs/core/org/apache/lucene/search/suggest.document.NRTSuggester.html
[exec] BROKEN LINK: file:///build/docs/core/org/apache/lucene/search/suggest.document.CompletionFieldsProducer.CompletionsTermsReader.html
[exec] BROKEN LINK: file:///build/docs/core/org/apache/lucene/search/suggest.document.NRTSuggester.html
[exec] BROKEN LINK: file:///build/docs/core/org/apache/lucene/search/suggest.document.NRTSuggester.html
[exec]
[exec] file:///build/docs/suggest/org/apache/lucene/search/suggest/document/SuggestField.html
[exec] BROKEN LINK: file:///build/docs/core/org/apache/lucene/search/suggest.document.CompletionTokenStream.html
[exec] BROKEN LINK: file:///build/docs/core/org/apache/lucene/search/suggest.document.CompletionTokenStream.html
[exec]
[exec] file:///build/docs/suggest/org/apache/lucene/search/suggest/document/class-use/CompletionWeight.html
[exec] BROKEN LINK: file:///build/docs/core/org/apache/lucene/search/suggest.document.NRTSuggester.html
[exec]
[exec] file:///build/docs/suggest/org/apache/lucene/search/suggest/document/CompletionScorer.html
[exec] BROKEN LINK: file:///build/docs/core/org/apache/lucene/search/suggest.document.NRTSuggester.html
[exec] BROKEN LINK: file:///build/docs/core/org/apache/lucene/search/suggest.document.NRTSuggester.html
[exec]
[exec] file:///build/docs/suggest/org/apache/lucene/search/suggest/document/ContextSuggestField.html
[exec] BROKEN LINK: file:///build/docs/core/org/apache/lucene/search/suggest.document.CompletionTokenStream.html
[exec] BROKEN LINK: file:///build/docs/core/org/apache/lucene/search/suggest.document.CompletionTokenStream.html
[exec]
[exec] Broken javadocs links were found!
{noformat}
> [suggest] Query Interface for suggest API
> -----------------------------------------
>
> Key: LUCENE-6459
> URL: https://issues.apache.org/jira/browse/LUCENE-6459
> Project: Lucene - Core
> Issue Type: New Feature
> Components: core/search
> Affects Versions: 5.1
> Reporter: Areek Zillur
> Assignee: Areek Zillur
> Fix For: Trunk, 5.x, 5.1
>
> Attachments: LUCENE-6459.patch, LUCENE-6459.patch, LUCENE-6459.patch,
> LUCENE-6459.patch, LUCENE-6459.patch, LUCENE-6459.patch, LUCENE-6459.patch,
> LUCENE-6459.patch
>
>
> This patch factors out common indexing/search API used by the recently
> introduced [NRTSuggester|https://issues.apache.org/jira/browse/LUCENE-6339].
> The motivation is to provide a query interface for FST-based fields
> (*SuggestField* and *ContextSuggestField*)
> to enable suggestion scoring and more powerful automaton queries.
> Previously, only prefix ‘queries’ with index-time weights were supported, but
> now we can also support:
> * Prefix queries expressed as regular expressions: get suggestions that
> match multiple prefixes
> ** *Example:* _star\[wa\|tr\]_ matches _starwars_ and _startrek_
> * Fuzzy Prefix queries supporting scoring: get typo tolerant suggestions
> scored by how close they are to the query prefix
> ** *Example:* querying for _seper_ will score _separate_ higher than
> _superstitious_
> * Context Queries: get suggestions boosted and/or filtered based on their
> indexed contexts (meta data)
> ** *Boost example:* get typo tolerant suggestions on song names with
> prefix _like a roling_ boosting songs with
> genre _rock_ and _indie_
> ** *Filter example:* get suggestions for all file names starting with
> _finan_, only for _user1_ and _user2_
> h3. Suggest API
> {code}
> SuggestIndexSearcher searcher = new SuggestIndexSearcher(reader);
> CompletionQuery query = ...
> TopSuggestDocs suggest = searcher.suggest(query, num);
> {code}
> h3. CompletionQuery
> *CompletionQuery* is used to query *SuggestField* and *ContextSuggestField*.
> A *CompletionQuery* produces a *CompletionWeight*,
> which allows *CompletionQuery* implementations to pass in an automaton that
> will be intersected with an FST, and allows boosting and
> meta data extraction from the intersected partial paths. A *CompletionWeight*
> produces a *CompletionScorer*. A *CompletionScorer*
> executes a top N search against the FST with the provided automaton, scoring
> and filtering all matched paths.
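A toy model of the "intersect, boost, keep top N" chain described above, assuming a linear scan over (text, weight) pairs as a stand-in for intersecting an automaton with the FST. {{TopNSuggestSketch}}, {{Hit}} and {{topN}} are hypothetical names; the bounded min-heap is one common way to keep a top N, not necessarily what {{CompletionScorer}} does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;
import java.util.function.ToIntFunction;

// Toy model: a linear scan stands in for the FST/automaton intersection. Not Lucene API.
final class TopNSuggestSketch {

  static final class Hit {
    final String text;
    final long score;
    Hit(String text, long score) { this.text = text; this.score = score; }
    @Override public String toString() { return text + "=" + score; }
  }

  // boostOf returns -1 for entries the "automaton" rejects, else a boost >= 0.
  static List<Hit> topN(String[] texts, int[] weights, ToIntFunction<String> boostOf,
                        int maxWeight, int n) {
    // Min-heap on score, so the weakest of the current top N is evicted first.
    PriorityQueue<Hit> heap = new PriorityQueue<>((a, b) -> Long.compare(a.score, b.score));
    for (int i = 0; i < texts.length; i++) {
      int boost = boostOf.applyAsInt(texts[i]);
      if (boost < 0) continue; // filtered out
      heap.add(new Hit(texts[i], weights[i] + (long) maxWeight * boost));
      if (heap.size() > n) heap.poll();
    }
    List<Hit> result = new ArrayList<>(heap);
    result.sort((a, b) -> Long.compare(b.score, a.score)); // best first
    return result;
  }

  public static void main(String[] args) {
    String[] texts = {"starwars", "startrek", "stardust", "moon"};
    int[] weights = {5, 7, 9, 3};
    // Plain prefix query "star": accept with boost 0, reject everything else.
    List<Hit> hits = topN(texts, weights, t -> t.startsWith("star") ? 0 : -1, 9, 2);
    System.out.println(hits); // [stardust=9, startrek=7]
  }
}
```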
> h4. PrefixCompletionQuery
> Return documents with values that match the prefix of an analyzed term text.
> Documents are sorted according to their suggest field weight.
> {code}
> PrefixCompletionQuery(Analyzer analyzer, Term term)
> {code}
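A sketch of the {{PrefixCompletionQuery}} semantics over a sorted in-memory dictionary, assuming lowercasing as a stand-in for the {{Analyzer}}. Because keys sharing a prefix are contiguous in sorted order, the scan can stop at the first key that no longer starts with it. All names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Prefix-completion semantics over a sorted map; not Lucene API.
final class PrefixCompletionSketch {

  // Keys with the analyzed prefix, highest weight first, at most num of them.
  static List<String> suggest(NavigableMap<String, Integer> dict, String rawPrefix, int num) {
    String prefix = rawPrefix.toLowerCase(Locale.ROOT); // stand-in for the Analyzer
    List<Map.Entry<String, Integer>> hits = new ArrayList<>();
    // Keys sharing a prefix are contiguous in sorted order, starting at the prefix itself.
    for (Map.Entry<String, Integer> e : dict.tailMap(prefix, true).entrySet()) {
      if (!e.getKey().startsWith(prefix)) break;
      hits.add(e);
    }
    hits.sort((a, b) -> Integer.compare(b.getValue(), a.getValue()));
    List<String> out = new ArrayList<>();
    for (int i = 0; i < Math.min(num, hits.size()); i++) out.add(hits.get(i).getKey());
    return out;
  }

  public static void main(String[] args) {
    NavigableMap<String, Integer> dict = new TreeMap<>();
    dict.put("python", 900);
    dict.put("pyhs2", 10);
    dict.put("pytorch", 500);
    dict.put("java", 700);
    System.out.println(suggest(dict, "Py", 2)); // [python, pytorch]
  }
}
```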
> h4. RegexCompletionQuery
> Return documents with values whose prefix matches a regular expression.
> Documents are sorted according to their suggest field weight.
> {code}
> RegexCompletionQuery(Term term)
> {code}
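A sketch of "values whose prefix matches a regular expression" using {{java.util.regex.Matcher.lookingAt()}}, which matches the pattern against the beginning of the input without requiring the whole input to match. {{java.util.regex}} stands in for Lucene's automaton-based regular expressions (the syntaxes are not identical), the pattern is written as {{star(wa|tr)}} on the assumption that alternation is what the example intends, and the names and weights are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Keep suggestions whose prefix matches the pattern, then order by weight. Not Lucene API.
final class RegexPrefixSketch {

  static List<String> suggest(String regex, String[] texts, int[] weights) {
    Pattern p = Pattern.compile(regex);
    List<Integer> idx = new ArrayList<>();
    for (int i = 0; i < texts.length; i++) {
      // lookingAt(): the pattern must match a prefix of the text, not all of it
      if (p.matcher(texts[i]).lookingAt()) idx.add(i);
    }
    idx.sort((a, b) -> Integer.compare(weights[b], weights[a])); // highest weight first
    List<String> out = new ArrayList<>();
    for (int i : idx) out.add(texts[i]);
    return out;
  }

  public static void main(String[] args) {
    String[] texts = {"starwars", "startrek", "stardust"};
    int[] weights = {3, 8, 5};
    System.out.println(suggest("star(wa|tr)", texts, weights)); // [startrek, starwars]
  }
}
```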
> h4. FuzzyCompletionQuery
> Return documents with values that have prefixes within a specified edit
> distance of an analyzed term text.
> Documents are ‘boosted’ by the number of matching prefix letters of the
> suggestion with respect to the original term text.
> {code}
> FuzzyCompletionQuery(Analyzer analyzer, Term term)
> {code}
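A sketch of what "prefixes within a specified edit distance" can mean: the smallest Levenshtein distance between the query and any prefix of the suggestion. Plain dynamic programming stands in for the automaton-based matching a real implementation would use; the class and method names are hypothetical.

```java
// Minimum edit distance between the query and any prefix of the suggestion. Not Lucene API.
final class FuzzyPrefixSketch {

  static int prefixEditDistance(String query, String suggestion) {
    int m = query.length(), n = suggestion.length();
    // prev[j] = edit distance between query[0..i) and suggestion[0..j)
    int[] prev = new int[n + 1];
    for (int j = 0; j <= n; j++) prev[j] = j;
    for (int i = 1; i <= m; i++) {
      int[] cur = new int[n + 1];
      cur[0] = i;
      for (int j = 1; j <= n; j++) {
        int sub = prev[j - 1] + (query.charAt(i - 1) == suggestion.charAt(j - 1) ? 0 : 1);
        cur[j] = Math.min(sub, Math.min(prev[j] + 1, cur[j - 1] + 1));
      }
      prev = cur;
    }
    // The query may match any prefix of the suggestion, so take the best column.
    int best = Integer.MAX_VALUE;
    for (int d : prev) best = Math.min(best, d);
    return best;
  }

  static boolean matches(String query, String suggestion, int maxEdits) {
    return prefixEditDistance(query, suggestion) <= maxEdits;
  }

  public static void main(String[] args) {
    System.out.println(prefixEditDistance("seper", "separate"));      // 1 ("separ")
    System.out.println(prefixEditDistance("seper", "superstitious")); // 1 ("super")
    System.out.println(matches("seper", "moon", 1));                  // false
  }
}
```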
> h5. Scoring
> {{suggestion_weight + (global_maximum_weight * boost)}}
> where {{suggestion_weight}}, {{global_maximum_weight}} and {{boost}} are all
> integers.
> {{boost = # of prefix characters matched}}
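A worked instance of the scoring formula for the _seper_ example, assuming "# of prefix characters matched" is the length of the exact common prefix between query and suggestion, and assuming hypothetical weights (1 for _separate_, 10 for _superstitious_, global maximum 10).

```java
// suggestion_weight + (global_maximum_weight * boost); names are hypothetical, not Lucene API.
final class FuzzyScoreSketch {

  // boost = number of leading characters the suggestion shares with the query
  static int commonPrefixLength(String query, String suggestion) {
    int n = Math.min(query.length(), suggestion.length());
    int i = 0;
    while (i < n && query.charAt(i) == suggestion.charAt(i)) i++;
    return i;
  }

  // Kept in a long so global_maximum_weight * boost cannot overflow.
  static long score(int suggestionWeight, int globalMaxWeight, int boost) {
    return suggestionWeight + (long) globalMaxWeight * boost;
  }

  public static void main(String[] args) {
    String query = "seper";
    int boostSeparate = commonPrefixLength(query, "separate");           // 2 ("se")
    int boostSuperstitious = commonPrefixLength(query, "superstitious"); // 1 ("s")
    System.out.println(score(1, 10, boostSeparate));       // 21
    System.out.println(score(10, 10, boostSuperstitious)); // 20
  }
}
```

_separate_ still ranks first (21 vs. 20) despite its lower weight, because one extra matched character is worth a full {{global_maximum_weight}}.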
> h4. ContextQuery
> Return documents that match a {{CompletionQuery}} filtered and/or boosted by
> provided context(s).
> {code}
> ContextQuery(CompletionQuery query)
> contextQuery.addContext(CharSequence context, int boost, boolean exact)
> {code}
> *NOTE:* {{ContextQuery}} should be used with {{ContextSuggestField}} to query
> suggestions boosted and/or filtered by contexts.
> Running {{ContextQuery}} against a {{SuggestField}} will error out.
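A hypothetical model of how {{addContext(context, boost, exact)}} could filter and boost entries: an entry survives only if one of its contexts matches an added context, and it takes the largest boost among the matches. Treating {{exact == false}} as a prefix match on the context value, and taking the maximum when several contexts match, are assumptions; the names are not Lucene API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical context filter/boost model; not Lucene's ContextQuery.
final class ContextFilterSketch {

  static final class Ctx {
    final String value; final int boost; final boolean exact;
    Ctx(String value, int boost, boolean exact) {
      this.value = value; this.boost = boost; this.exact = exact;
    }
  }

  private final List<Ctx> added = new ArrayList<>();

  ContextFilterSketch addContext(String context, int boost, boolean exact) {
    added.add(new Ctx(context, boost, exact));
    return this;
  }

  // Returns the context boost (assumed >= 0), or -1 if the entry is filtered out.
  int contextBoost(List<String> entryContexts) {
    int best = -1;
    for (String c : entryContexts) {
      for (Ctx q : added) {
        // assumption: exact == false means "context value starts with q.value"
        boolean hit = q.exact ? c.equals(q.value) : c.startsWith(q.value);
        if (hit) best = Math.max(best, q.boost);
      }
    }
    return best;
  }

  public static void main(String[] args) {
    ContextFilterSketch q = new ContextFilterSketch()
        .addContext("rock", 2, true)
        .addContext("indie", 1, false);
    System.out.println(q.contextBoost(Arrays.asList("rock")));          // 2
    System.out.println(q.contextBoost(Arrays.asList("indie-pop")));     // 1 (prefix match)
    System.out.println(q.contextBoost(Arrays.asList("jazz")));          // -1 (filtered out)
    System.out.println(q.contextBoost(Arrays.asList("rock", "indie"))); // 2 (largest boost)
  }
}
```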
> h5. Scoring
> {{suggestion_weight + (global_maximum_weight * context_boost)}}
> where {{suggestion_weight}}, {{global_maximum_weight}} and {{context_boost}}
> are all integers.
> When used with {{FuzzyCompletionQuery}}:
> {{suggestion_weight + (global_maximum_weight * (context_boost + fuzzy_boost))}}
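A worked example of the combined formula with hypothetical numbers (weight 40, global maximum 100, context boost 2, 3 matched prefix characters); the class name is illustrative only.

```java
// suggestion_weight + (global_maximum_weight * (context_boost + fuzzy_boost)); not Lucene API.
final class ContextFuzzyScoreSketch {

  static long score(int suggestionWeight, int globalMaxWeight, int contextBoost, int fuzzyBoost) {
    // Widen before multiplying so large weights cannot overflow an int.
    return suggestionWeight + (long) globalMaxWeight * (contextBoost + fuzzyBoost);
  }

  public static void main(String[] args) {
    // Matching boosted context (boost 2) plus 3 matched prefix characters.
    System.out.println(score(40, 100, 2, 3)); // 540
    // Same suggestion without a boosted context.
    System.out.println(score(40, 100, 0, 3)); // 340
  }
}
```

Since the two boosts are simply added, one point of context boost is worth exactly one matched prefix character under this formula.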
> h3. Context Suggest Field
> To use {{ContextQuery}}, use {{ContextSuggestField}} instead of
> {{SuggestField}}. Any {{CompletionQuery}} can be used with
> {{ContextSuggestField}}; by default, suggestions from *all* contexts are
> returned. The {{Context}} for every completion hit
> can be accessed through {{SuggestScoreDoc#context}}.
> {code}
> ContextSuggestField(String name, Collection<CharSequence> contexts, String value, int weight)
> {code}
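A hypothetical model of the default behaviour described above: each context of an entry is stored as its own context/value path, and a query with no context restriction matches on the value part alone, so hits come back from all contexts and each hit carries the context it was indexed under. The separator, the names, and the "context:value" output format are illustrative assumptions; in this model an entry with two contexts can surface once per context.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical context-field model; not Lucene's ContextSuggestField encoding.
final class ContextFieldSketch {
  static final char SEP = '\u001D'; // hypothetical separator; assumes contexts never contain it

  // One path per context: context + SEP + value.
  static List<String> index(List<String> contexts, String value) {
    List<String> paths = new ArrayList<>();
    for (String c : contexts) paths.add(c + SEP + value);
    return paths;
  }

  // Prefix match on the value only (no context restriction); returns "context:value" per hit.
  static List<String> suggestAllContexts(List<String> paths, String prefix) {
    List<String> hits = new ArrayList<>();
    for (String p : paths) {
      int sep = p.indexOf(SEP);
      String context = p.substring(0, sep);
      String value = p.substring(sep + 1);
      if (value.startsWith(prefix)) hits.add(context + ":" + value);
    }
    return hits;
  }

  public static void main(String[] args) {
    List<String> paths = new ArrayList<>();
    paths.addAll(index(Arrays.asList("user1", "user2"), "finance_q1.xls"));
    paths.addAll(index(Arrays.asList("user3"), "final_draft.doc"));
    System.out.println(suggestAllContexts(paths, "fin"));
    // [user1:finance_q1.xls, user2:finance_q1.xls, user3:final_draft.doc]
  }
}
```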
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)