Which documentation are you reading? The analyzer you send to FreeTextSuggester should not make shingles itself: the suggester does this internally, based on the grams parameter.
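To make "the suggester does this internally" concrete, here is a rough plain-Java sketch of what shingling up to a given gram count produces from unigram tokens. It is only an illustration: the class and the shingles helper are hypothetical names, it does not use or reproduce Lucene's code, and joining tokens with a space is an assumption made for readability.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ShingleSketch {
    // Returns every run of 1..maxGrams consecutive tokens, joined by a single space.
    // Hypothetical helper for illustration; not part of Lucene.
    static List<String> shingles(List<String> tokens, int maxGrams) {
        List<String> out = new ArrayList<>();
        for (int n = 1; n <= maxGrams; n++) {
            for (int start = 0; start + n <= tokens.size(); start++) {
                out.add(String.join(" ", tokens.subList(start, start + n)));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Unigram tokens, as a non-shingling analyzer would emit them.
        List<String> tokens = Arrays.asList("mp3", "player", "ipod");
        // With a gram count of 2 this prints:
        // [mp3, player, ipod, mp3 player, player ipod]
        System.out.println(shingles(tokens, 2));
    }
}
```

The point of the sketch is that the input is plain unigrams. If the analyzer already emitted shingles, they would be shingled a second time.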
Maybe look at the TestFreeTextSuggester unit test as an example?

Mike McCandless

http://blog.mikemccandless.com

On Sat, Jun 27, 2015 at 6:52 PM, Alessandro Benedetti wrote:
> Hi guys,
> after reading the documentation for the FreeTextSuggester I have some
> doubts:
>
> Actually, the documentation is not clear enough.
> Let's try to understand this suggester.
>
> Building
> This suggester builds an FST that it uses to provide the autocomplete
> feature by running prefix searches on it.
> The terms it uses to generate the FST are the tokens produced by the
> "suggestFreeTextAnalyzerFieldType".
>
> And this should be correct.
> So, to keep it simple, if we have a shingle token filter [1-3] (we produce
> unigrams as well) in our analysis, from these original field values:
> "mp3 ipod"
> "mp3 player"
> "mp3 player ipod"
> "player of Real"
>
> -> we produce this list of possible suggestions in our FST:
>
> <mp3>
> <player>
> <ipod>
> <real>
> <of>
>
> <mp3 ipod>
> <mp3 player>
> <player ipod>
> <player of>
> <of real>
>
> <mp3 player ipod>
> <player of real>
>
> From the documentation I read:
>>
>> "ngrams: The max number of tokens out of which singles will be make the
>> dictionary. The default value is 2. Increasing this would mean you want more
>> than the previous 2 tokens to be taken into consideration when making the
>> suggestions."
>
> This confuses me, as I was not expecting this param to affect the
> suggestion dictionary.
> So I would like a clarification here from our masters :)
> At this point let's see what happens at query time.
>
> Query Time
> As I understand it, the ngrams param makes the suggester consider only the
> last N-1 tokens the user typed, separated by spaces.
>
>> "Builds an ngram model from the text sent to {@link #build} and predicts
>> based on the last grams-1 tokens in the request sent to {@link #lookup}.
>> This tries to handle the "long tail" of suggestions for when the
>> incoming query is a never before seen query string."
>
> Example: grams=3 should consider only the last 2 tokens
>
> special mp3 p -> mp3 p
>
> Then this query is analysed using the "suggestFreeTextAnalyzerFieldType".
> We produce 3 tokens:
> <mp3>
> <p>
> <mp3 p>
>
> And we run the prefix matching on the FST.
>
> Conclusion
> My understanding is surely wrong at some point, as the behaviour I get is
> different.
> Can we discuss this, clarify it, and eventually put it in the official
> documentation?
>
> Cheers
>
> --
> --------------------------
>
> Benedetti Alessandro
> Visiting card : http://about.me/alessandro_benedetti
>
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
>
> William Blake - Songs of Experience -1794 England
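The query-time step in the quoted message (grams=3, so keep the last 2 tokens of "special mp3 p") can be written down as a small plain-Java sketch. This only mirrors the reading given in the message, with whitespace tokenization assumed. It is not taken from Lucene, the lastTokens helper is a hypothetical name, and whether the partially typed final token counts toward the grams-1 window is something the sketch does not settle.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class ContextWindowSketch {
    // Returns the last n whitespace-separated tokens of the text (fewer if the text is shorter).
    // Hypothetical helper for illustration; not part of Lucene.
    static List<String> lastTokens(String text, int n) {
        String trimmed = text.trim();
        if (trimmed.isEmpty() || n <= 0) {
            return Collections.emptyList();
        }
        String[] tokens = trimmed.split("\\s+");
        int from = Math.max(0, tokens.length - n);
        return Arrays.asList(Arrays.copyOfRange(tokens, from, tokens.length));
    }

    public static void main(String[] args) {
        int grams = 3;
        // Keep grams-1 tokens, as in the example from the message: prints [mp3, p]
        System.out.println(lastTokens("special mp3 p", grams - 1));
    }
}
```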
