FSTLookup should use long-tail like discretization instead of proportional
(linear)
-----------------------------------------------------------------------------------
Key: SOLR-2761
URL: https://issues.apache.org/jira/browse/SOLR-2761
Project: Solr
Issue Type: Improvement
Components: spellchecker
Affects Versions: 3.4
Reporter: David Smiley
Priority: Minor
The Suggester's FSTLookup implementation discretizes term frequencies into
a configurable number of buckets (the "weightBuckets" parameter) to work around
FST limitations. A source frequency is mapped to a bucket proportionally
(i.e. linearly) between the minimum and maximum values. I don't think this
makes sense given the well-known long-tail distribution of term frequencies:
most terms have low frequencies, so they end up sharing the lowest few buckets.
As a result, I've found it necessary to increase weightBuckets substantially,
to more than 100, to get quality suggestions.
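
To illustrate the difference, here is a minimal sketch comparing a proportional
mapping with a logarithmic one. This is not FSTLookup's actual code: the
function names and sample frequencies are made up for illustration, and a
logarithmic scale is only one possible long-tail-friendly mapping, assumed
here for the sake of the example.

```python
import math


def linear_bucket(freq, min_freq, max_freq, buckets):
    """Proportional (linear) mapping of freq onto bucket indexes 0..buckets-1."""
    if max_freq == min_freq:
        return 0
    scaled = (freq - min_freq) / (max_freq - min_freq)
    # Clamp so that freq == max_freq lands in the last bucket.
    return min(buckets - 1, int(scaled * buckets))


def log_bucket(freq, min_freq, max_freq, buckets):
    """Logarithmic mapping (an assumed alternative); requires frequencies >= 1."""
    if max_freq == min_freq:
        return 0
    span = math.log(max_freq) - math.log(min_freq)
    scaled = (math.log(freq) - math.log(min_freq)) / span
    return min(buckets - 1, int(scaled * buckets))


# Hypothetical long-tail-like sample: many small values, a few very large ones.
freqs = [1, 3, 20, 150, 2000, 30000, 100000]
lo, hi = min(freqs), max(freqs)

linear = [linear_bucket(f, lo, hi, 10) for f in freqs]
logarithmic = [log_bucket(f, lo, hi, 10) for f in freqs]

print("linear:     ", linear)
print("logarithmic:", logarithmic)
```

With these sample values and 10 buckets, the linear mapping places five of the
seven terms in bucket 0, while the logarithmic mapping gives each term its own
bucket. That is consistent with needing far more buckets under the linear
scheme before low-frequency terms can be told apart.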