Netflix also does this, e.g. type "transla" (you need an account). I think it'd be good to somehow support this (Lucene's suggesters don't today).

The first two approaches should conceptually work, but both will bloat the FST (I'd be curious to know how much!).

Maybe another approach would be ... to index only single tokens into the suggester? Then, from the user's query, run the suggester on each token separately, and do a second search (against a "normal" Lucene index) to find all documents containing those tokens?

E.g., you'd index only "boston", "red", "sox", "rumor" into the FST, and have a separate search index with "boston red sox rumor" indexed as a document. If the user types "red so", then you run suggest on "red" and on "so", and then run, hmm, a MultiPhraseQuery for (red|redmond|reddit) (so|sox|sophomore|...) against the index?

How to score/sort the resulting hits will be interesting ... if you have strong priors / boosts (e.g. you have a good source of "popularity" or something) then you could sort by that ...

Mike McCandless

http://blog.mikemccandless.com

On Wed, Jan 16, 2013 at 4:27 PM, Oliver Christ <ochr...@ebscohost.com> wrote:
> Hi,
>
> Has anyone tried to implement circumfix suggesters, where the
> suggestion is a circumfix of the lookup string?
>
> E.g. "sox rumor" suggests "boston red sox rumors" (try it on
> google.com).
>
> I think there are several ways to implement this:
>
> * Given some multiword term, add all word subsequences to the
>   suggester individually ("boston red sox rumors" also adds "red sox
>   rumors", "sox rumors", "rumors"). That can be achieved using a
>   special TermFreqIterator. This turns the lookup problem into a
>   standard prefix search. While this works, it effectively modifies
>   the surface form, and the "full term" needs to be indexed and
>   looked up elsewhere.
>
> * Constructing a token graph with appropriate substring arcs from the
>   (hopefully linear) token sequence, using a special TokenFilter. The
>   benefit is that the surface form is always the same, but the
>   automaton may become large (at least if you are using an
>   AnalyzingSuggester).
>
> * DIY, using suffix arrays or something similar.
>
> But I'm sure there are other ways and/or tradeoffs I haven't thought
> about :-) I'd be interested in your feedback.
>
> Cheers, Oli
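
To make the single-token idea from the reply a bit more concrete, here is a rough sketch in plain Python. It does not use Lucene at all: a sorted list of tokens stands in for the FST-based suggester, and a linear scan over token lists stands in for running a MultiPhraseQuery against a normal index. The function names (build_vocab, suggest, phrase_search, circumfix_suggest) and the sample documents are made up for illustration, and scoring/sorting of hits is left out entirely.

```python
from bisect import bisect_left


def build_vocab(docs):
    """Collect distinct single tokens, sorted (stand-in for the token-only FST)."""
    return sorted({tok for doc in docs for tok in doc.split()})


def suggest(vocab, prefix):
    """Return every vocabulary token that starts with `prefix`."""
    out = []
    i = bisect_left(vocab, prefix)  # first token >= prefix
    while i < len(vocab) and vocab[i].startswith(prefix):
        out.append(vocab[i])
        i += 1
    return out


def phrase_search(docs, alternatives):
    """Return docs containing consecutive tokens where the k-th token is one of
    alternatives[k] (a crude stand-in for a MultiPhraseQuery)."""
    n = len(alternatives)
    hits = []
    for doc in docs:
        toks = doc.split()
        for start in range(len(toks) - n + 1):
            if all(toks[start + k] in alternatives[k] for k in range(n)):
                hits.append(doc)
                break  # one match per document is enough
    return hits


def circumfix_suggest(docs, vocab, query):
    """Expand each query token separately, then run the phrase search."""
    alternatives = [set(suggest(vocab, tok)) for tok in query.lower().split()]
    if not alternatives or any(not alt for alt in alternatives):
        return []  # some token had no completion at all
    return phrase_search(docs, alternatives)


# Hypothetical sample data, not from the original thread.
docs = [
    "boston red sox rumors",
    "red sox tickets",
    "redmond sophomore news",
    "reddit so what",
]
vocab = build_vocab(docs)

print(circumfix_suggest(docs, vocab, "sox rumor"))  # ['boston red sox rumors']
print(circumfix_suggest(docs, vocab, "red so"))
```

Note that "red so" matches all four sample documents here, since every token is treated as a prefix; that is exactly where the open question about scoring/sorting the hits comes in.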