[ https://issues.apache.org/jira/browse/SOLR-2761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104761#comment-13104761 ]
Michael McCandless commented on SOLR-2761:
------------------------------------------
Ooooh, the javadocs and comments are awesome! -- thanks Dawid and
David.
I was just wondering what, specifically, the limitation in our FST
impl is, and whether it's something we could improve. It sounds like
the limitation is just how we quantize the incoming weights...
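For concreteness, the proportional mapping is essentially this (a
simplified sketch, not the actual FSTLookup source; the class and
method names here are made up):

{code:java}
// Simplified illustration (not the actual FSTLookup source): map a raw
// weight linearly into one of weightBuckets slots between the observed
// minimum and maximum weights.
class LinearBuckets {
  static int bucketFor(long weight, long min, long max, int weightBuckets) {
    if (max == min) {
      return 0; // degenerate case: all weights are identical
    }
    double pos = (weight - min) / (double) (max - min); // position in [0, 1]
    return Math.min((int) (weightBuckets * pos), weightBuckets - 1);
  }
}
{code}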
David, when you used >100 buckets, did you see bad performance for
low-weight lookups?
Maybe, in addition to the up-front quantization, we could also store
a more exact weight for each term (e.g. as the FST output). Then, at
lookup time, we could re-sort the candidates by that exact weight. But
this would make the FST larger...
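To illustrate the re-sort idea, here is a rough sketch under
assumptions: the exactWeight map below is hypothetical and just stands
in for whatever structure (e.g. the per-term FST output) would hold the
un-quantized weight:

{code:java}
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Sketch only: take the candidates the FST returned (ordered only by
// coarse bucket) and re-order them by an exact per-term weight, descending.
class ExactWeightRerank {
  static void rerank(List<String> candidates, final Map<String, Long> exactWeight) {
    Collections.sort(candidates, new Comparator<String>() {
      public int compare(String a, String b) {
        return exactWeight.get(b).compareTo(exactWeight.get(a)); // descending
      }
    });
  }
}
{code}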
> FSTLookup should use long-tail like discretization instead of proportional (linear)
> -----------------------------------------------------------------------------------
>
> Key: SOLR-2761
> URL: https://issues.apache.org/jira/browse/SOLR-2761
> Project: Solr
> Issue Type: Improvement
> Components: spellchecker
> Affects Versions: 3.4
> Reporter: David Smiley
> Priority: Minor
>
> The Suggester's FSTLookup implementation discretizes the term frequencies
> into a configurable number of buckets (configurable as "weightBuckets") in
> order to deal with FST limitations. The mapping of a source frequency into a
> bucket is a proportional (i.e. linear) mapping between the minimum and
> maximum values. I don't think this makes sense at all given the well-known
> long-tail-like distribution of term frequencies. As a result of this problem,
> I've found it necessary to increase weightBuckets substantially (e.g. >100)
> to get quality suggestions.
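To make the proposed long-tail-friendly discretization concrete, here
is a hedged sketch (my own illustration, not code from any patch):
bucketing on a log scale spreads the many low-frequency terms across
more buckets instead of collapsing them all into bucket 0:

{code:java}
// Sketch only: log-scale bucketing. Math.log1p keeps weight == min in
// bucket 0, and gives low weights a larger share of the bucket range
// than a linear mapping would.
class LogBuckets {
  static int bucketFor(long weight, long min, long max, int weightBuckets) {
    if (max == min) {
      return 0; // degenerate case: all weights are identical
    }
    double span = Math.log1p(max - min);          // log of the full range
    double pos = Math.log1p(weight - min) / span; // position in [0, 1]
    return Math.min((int) (weightBuckets * pos), weightBuckets - 1);
  }
}
{code}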