Hi,
Has anyone tried to implement circumfix suggesters, where the suggestion
is a circumfix of the lookup string?
E.g. "sox rumor" suggests "boston red sox rumors" (try it on
google.com).
I think there are several ways to implement this:
* Given some multiword term, add all of its word-level suffixes to the
suggester individually ("boston red sox rumors" adds also "red sox
rumors", "sox rumors", "rumors") - that can be achieved using a special
TermFreqIterator. This turns the lookup problem into a standard prefix
search. While this works, it effectively modifies the surface form, and
the "full term" needs to be indexed and looked up elsewhere.
* Constructing a token graph with appropriate substring arcs
from the (hopefully linear) token sequence, using a special TokenFilter.
The benefit is that the surface form is always the same, but the
automaton may become large (at least if you are using an
AnalyzingSuggester).
* DIY, using suffix arrays or something similar.
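
To make the first option concrete, here is a rough sketch in plain Python. It
does not use any Lucene API; the names word_suffixes, build_index and suggest
are made up for illustration, and a sorted list with binary search stands in
for the real suggester. Each word-level suffix is stored together with its
full term, so the surface form can be recovered after the prefix lookup:

```python
import bisect


def word_suffixes(term):
    """Yield every word-level suffix of a multiword term, longest first."""
    words = term.split()
    for i in range(len(words)):
        yield " ".join(words[i:])


def build_index(terms):
    """Return a sorted list of (suffix, full_term) pairs.

    Hypothetical stand-in for feeding expanded entries to a prefix suggester.
    """
    entries = {(suffix, term) for term in terms for suffix in word_suffixes(term)}
    return sorted(entries)


def suggest(index, query, limit=5):
    """Return full terms that have a word-level suffix starting with query."""
    # (query,) sorts before any (suffix, term) pair whose suffix >= query,
    # so this finds the first candidate entry.
    start = bisect.bisect_left(index, (query,))
    results = []
    for suffix, term in index[start:]:
        if not suffix.startswith(query):
            break  # matching entries are contiguous in sorted order
        if term not in results:
            results.append(term)
            if len(results) == limit:
                break
    return results


index = build_index(["boston red sox rumors", "red sox tickets"])
print(suggest(index, "sox rumor"))
```

The index grows with the number of words per term, which is the same kind of
blow-up the token graph variant would see in the automaton.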
But I'm sure there are other ways and/or tradeoffs I haven't thought
about :) I'd be interested in your feedback.
Cheers, Oli