[ https://issues.apache.org/jira/browse/LUCENE-5214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13783660#comment-13783660 ]

Areek Zillur commented on LUCENE-5214:
--------------------------------------

Hey Michael, I have a question for you; this may not be the most relevant 
place to ask, but I will anyway.

I was curious why you did not implement the load and store methods for your 
AnalyzingInfixSuggester, and instead build the index in the constructor. Was 
it because those methods take an InputStream/OutputStream? What are your 
thoughts on generalizing the interface so that the index can be loaded and 
stored, as is done by all the other suggesters?
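For context, the store/load contract I have in mind looks roughly like the 
sketch below: serialize the suggester's in-memory state to an OutputStream 
and rebuild it from an InputStream, rather than re-analyzing the source data 
in the constructor. The class and method names here are illustrative only, 
not the actual Lucene API:

```java
import java.io.*;
import java.util.*;

// Hypothetical sketch of a Lookup-style store/load contract: persist the
// suggester's state to a stream and restore it later, instead of rebuilding
// the index in the constructor every time.
public class SimpleStoredSuggester {
    private final Map<String, Long> weights = new TreeMap<>();

    public void add(String term, long weight) {
        weights.put(term, weight);
    }

    // Write the term/weight table to the given stream.
    public boolean store(OutputStream output) throws IOException {
        DataOutputStream out = new DataOutputStream(output);
        out.writeInt(weights.size());
        for (Map.Entry<String, Long> e : weights.entrySet()) {
            out.writeUTF(e.getKey());
            out.writeLong(e.getValue());
        }
        out.flush();
        return true;
    }

    // Rebuild the term/weight table from a previously stored stream.
    public boolean load(InputStream input) throws IOException {
        DataInputStream in = new DataInputStream(input);
        weights.clear();
        int n = in.readInt();
        for (int i = 0; i < n; i++) {
            weights.put(in.readUTF(), in.readLong());
        }
        return true;
    }

    public Long weight(String term) {
        return weights.get(term);
    }

    public static void main(String[] args) throws IOException {
        SimpleStoredSuggester a = new SimpleStoredSuggester();
        a.add("lucene", 5L);
        a.add("solr", 3L);
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        a.store(buf);

        SimpleStoredSuggester b = new SimpleStoredSuggester();
        b.load(new ByteArrayInputStream(buf.toByteArray()));
        System.out.println(b.weight("lucene")); // 5
    }
}
```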

> Add new FreeTextSuggester, to handle "long tail" suggestions
> ------------------------------------------------------------
>
>                 Key: LUCENE-5214
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5214
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/spellchecker
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 5.0, 4.6
>
>         Attachments: LUCENE-5214.patch, LUCENE-5214.patch
>
>
> The current suggesters are all based on a finite space of possible
> suggestions, i.e. the ones they were built on, so they can only
> suggest a full suggestion from that space.
> This means if the current query goes outside of that space then no
> suggestions will be found.
> The goal of FreeTextSuggester is to address this, by giving
> predictions based on an ngram language model, i.e. using the last few
> tokens from the user's query to predict the likely following token.
> I got the idea from this blog post about Google's suggest:
> http://googleblog.blogspot.com/2011/04/more-predictions-in-autocomplete.html
> This is very much still a work in progress, but it seems to be
> working.  I've tested it on the AOL query logs, using an interactive
> tool from luceneutil to show the suggestions, and it seems to work well.
> It's fun to use that tool to explore the word associations...
> I don't think this suggester would be used standalone; rather, I think
> it'd be a fallback for times when the primary suggester fails to find
> anything.  You can see this behavior on google.com, if you type "the
> fast and the ", you see entire queries being suggested, but then if
> the next word you type is "burning" then suddenly you see the
> suggestions are only based on the last word, not the entire query.
> It uses ShingleFilter under the hood to generate the token ngrams,
> and stores those ngrams in an FST; once LUCENE-5180 is in, it will be
> able to properly handle a user query that ends with stop words
> (e.g. "wizard of ").
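The last-few-tokens prediction described above can be sketched with a plain 
ngram count model. This is a hypothetical illustration only, not the actual 
FreeTextSuggester (which uses ShingleFilter to produce the ngrams and an FST 
to store them): count every (context, next-token) pair from the corpus, then 
predict from the longest context that was seen, backing off to shorter 
contexts:

```java
import java.util.*;

// Toy ngram language-model suggester (illustrative only): index
// (context -> next token) counts for context lengths 1 and 2, then
// predict the most frequent next token, preferring the longest seen
// context and backing off to shorter ones.
public class NGramSuggest {
    private final Map<String, Map<String, Integer>> counts = new HashMap<>();

    // Record every (context, next token) pair in the text.
    public void index(String text) {
        String[] toks = text.toLowerCase().trim().split("\\s+");
        for (int i = 0; i < toks.length; i++) {
            for (int ctx = 1; ctx <= 2; ctx++) {
                if (i - ctx < 0) continue;
                String key = String.join(" ",
                        Arrays.copyOfRange(toks, i - ctx, i));
                counts.computeIfAbsent(key, k -> new HashMap<>())
                      .merge(toks[i], 1, Integer::sum);
            }
        }
    }

    // Suggest the most frequent next token for the query's trailing
    // context, backing off from 2 tokens of context to 1.
    public String suggest(String query) {
        String[] toks = query.toLowerCase().trim().split("\\s+");
        for (int ctx = Math.min(2, toks.length); ctx >= 1; ctx--) {
            String key = String.join(" ",
                    Arrays.copyOfRange(toks, toks.length - ctx, toks.length));
            Map<String, Integer> next = counts.get(key);
            if (next != null) {
                return Collections.max(next.entrySet(),
                        Map.Entry.comparingByValue()).getKey();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        NGramSuggest s = new NGramSuggest();
        s.index("the fast and the furious");
        s.index("wizard of oz");
        System.out.println(s.suggest("fast and the")); // furious
        System.out.println(s.suggest("wizard of"));    // oz
    }
}
```

The backoff step mirrors the google.com behavior described in the issue: when 
the full context is unseen, the prediction falls back to the last word alone.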



--
This message was sent by Atlassian JIRA
(v6.1#6144)
