On Mon, Mar 11, 2013 at 7:33 AM, Nils Knappmeier <n.knappme...@i-views.de> wrote:
> Hi,
>
>> This is tricky.
>>
>> You could build a separate suggester per category/zip code (or,
>> possibly prefix-code each suggestion with the category/zip code into
>> one suggester), but likely this will blow up (ie, if the same
>> suggestion often appears across zip codes / categories). If your
>> suggestions are already highly orthogonal across category / zip code
>> then it may not blow up...
>>
>> Alternatively maybe you could store some info per-suggestion about
>> which zip code / category it appears in, using upcoming payloads
>> addition (see LUCENE-4820), and use that to filter each suggestion as
>> it arrives.
>>
>> But: have you confirmed this is really a problem in practice? Ie,
>> typically suggestions have a strong a-priori rank based on eg how
>> often that query was asked (if suggestions come from your query logs,
>> like Google) or based on how popular that item is (if your suggestions
>> come from your content, like Netflix), in which case, if suggestions
>> are not that orthogonal, the risk of a bad suggestion may be very low?
>
> Maybe we had a misconception of the intended use case of the
> AnalyzingSuggester or the auto-suggest feature in general.
>
> Our suggestions should come solely from the index and not from a query log.
> I haven't even thought about using a query log as source. I think, in this
> case, it would be better to work on the index directly (using a
> PrefixTermEnum or so)...

It's fine for the source of the suggestions to be the index, but then
the input strings are necessarily whatever you previously indexed, ie
the analyzed/tokenized terms. For example, if you normalize accents and
stem your tokens, then the input to the suggester will be the normalized
form, not the surface form, and it will suggest only those normalized
forms.

The power of the AnalyzingSuggester, by contrast, is that it takes the
surface forms (unanalyzed) as input, yet matches suggestions based on
the analyzed form. So the user will see suggestions with accents and
with plurals.

Mike McCandless

http://blog.mikemccandless.com
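
To make the surface form vs. analyzed form distinction concrete, here is a minimal, self-contained sketch. It is NOT Lucene's AnalyzingSuggester API: the class ToySurfaceSuggester and its analyze/add/lookup methods are hypothetical names invented for illustration, and the "analyzer" is a crude stand-in (lowercase, strip accents, drop one trailing "s"). It only shows the idea of matching on the analyzed form while returning the original surface form.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/** Toy illustration only; this is not how Lucene implements its suggester. */
public class ToySurfaceSuggester {
    // analyzed form -> surface forms; sorted so a prefix lookup is a range scan
    private final TreeMap<String, List<String>> byAnalyzed = new TreeMap<>();

    /** Hypothetical stand-in for an analyzer chain (assumption, not Lucene code). */
    public static String analyze(String text) {
        // Decompose accented characters, then remove the combining marks.
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        String lower = decomposed.replaceAll("\\p{M}", "").toLowerCase(Locale.ROOT);
        // Extremely naive "stemming": drop a single trailing "s".
        if (lower.length() > 1 && lower.endsWith("s")) {
            return lower.substring(0, lower.length() - 1);
        }
        return lower;
    }

    /** Index the unanalyzed surface form under its analyzed key. */
    public void add(String surfaceForm) {
        byAnalyzed.computeIfAbsent(analyze(surfaceForm), k -> new ArrayList<>())
                  .add(surfaceForm);
    }

    /** Analyze the user's input, prefix-match on analyzed keys, return surface forms. */
    public List<String> lookup(String userInput) {
        String prefix = analyze(userInput);
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : byAnalyzed.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break; // keys are sorted, so no later key can match
            }
            out.addAll(e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        ToySurfaceSuggester s = new ToySurfaceSuggester();
        s.add("Caf\u00e9s");                      // "Cafes" with an accented e
        s.add("Cr\u00e8me br\u00fbl\u00e9e");     // accented dessert name
        // The user types plain ASCII; the accented plural surface form comes back.
        System.out.println(s.lookup("cafe"));
    }
}
```

If the suggester were instead fed terms taken from the index, add() would receive the already-analyzed string (here "cafe"), and that normalized string is all lookup() could ever return.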