FSTLookup should use long-tail like discretization instead of proportional 
(linear)
-----------------------------------------------------------------------------------

                 Key: SOLR-2761
                 URL: https://issues.apache.org/jira/browse/SOLR-2761
             Project: Solr
          Issue Type: Improvement
          Components: spellchecker
    Affects Versions: 3.4
            Reporter: David Smiley
            Priority: Minor


The Suggester's FSTLookup implementation discretizes term frequencies into a 
configurable number of buckets (the "weightBuckets" setting) in order to work 
around FST limitations. A source frequency is mapped to a bucket proportionally 
(i.e. linearly) between the minimum and maximum frequency. I don't think this 
makes sense at all given the well-known long-tail-like distribution of term 
frequencies: with a linear mapping, the many low-frequency terms collapse into 
the lowest few buckets and become indistinguishable. As a result, I've found it 
necessary to increase weightBuckets substantially, to more than 100, to get 
quality suggestions. 
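
To illustrate the difference, here is a minimal sketch in Python (not Solr's 
actual code). The function names, the sample frequencies, and the choice of a 
base-10 logarithm as the "long-tail like" mapping are all assumptions made for 
illustration; the sketch also assumes every frequency is at least 1.

```python
import math


def linear_bucket(freq, min_freq, max_freq, buckets):
    """Proportional mapping of freq in [min_freq, max_freq] to a bucket index."""
    if max_freq == min_freq:
        return 0
    scaled = (freq - min_freq) / (max_freq - min_freq)
    # The maximum frequency scales to exactly 1.0, so clamp to the last bucket.
    return min(buckets - 1, int(scaled * buckets))


def log_bucket(freq, min_freq, max_freq, buckets):
    """Hypothetical alternative: bucket on log10(freq). Assumes freq >= 1."""
    if max_freq == min_freq:
        return 0
    lo, hi = math.log10(min_freq), math.log10(max_freq)
    scaled = (math.log10(freq) - lo) / (hi - lo)
    return min(buckets - 1, int(scaled * buckets))


# Made-up frequencies with a long tail: many rare terms, a few very common ones.
freqs = [1, 2, 5, 10, 50, 100, 1000, 10000, 100000]
lo, hi = min(freqs), max(freqs)

linear = [linear_bucket(f, lo, hi, 10) for f in freqs]
logarithmic = [log_bucket(f, lo, hi, 10) for f in freqs]

print("linear:     ", linear)
print("logarithmic:", logarithmic)
```

With 10 buckets, the linear mapping puts every term except the most frequent 
one into bucket 0, while the logarithmic mapping spreads the same terms across 
the available buckets. 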

