On Mon, Mar 11, 2013 at 6:31 AM, Nils Knappmeier <n.knappme...@i-views.de> wrote:
> Dear all,
>
> I have a request to implement an auto-suggest feature for our lucene based
> product.
> We have upgraded to Lucene 4.1 and intend to use the AnalyzingSuggester, but
> we cannot determine the correct way of using it for our request.
>
> We have problems with two aspects:
>
> 1) The suggester should suggest original (stored) field values. The API is
> be built such that a LuceneDictionary is used to provide terms to the
> suggester. A Dictionary provides a BytesRefIterator, which is (i.e. in
> LuceneDictionary) implemented to return the tokenized and analyzed terms
> with reduced umlauts and plural forms).
> How is the intended use here?

You shouldn't use LuceneDictionary, since it just enumerates the tokens
from the index. Instead, make your own TermFreqIterator that provides
the original suggestions, and pass an Analyzer to AnalyzingSuggester to
normalize the surface forms.

> 2) We do want to suggest terms that have an empty search result. There are a

I think you meant "do not"?

> number of filters that can be set (zip-code, categories). Our problem is
> that there is no way to tell the suggester about these filters. Do we have
> to iterate all suggested terms and check for each one, if it provides
> results with the given filter settings?

This is tricky. You could build a separate suggester per category/zip
code (or, possibly, prefix-code each suggestion with the category/zip
code and put them all into one suggester), but this will likely blow up
in size (ie, if the same suggestion often appears across zip codes /
categories). If your suggestions are already highly orthogonal across
category / zip code then it may not blow up...

Alternatively, maybe you could store some info per-suggestion about
which zip code / category it appears in, using the upcoming payloads
addition (see LUCENE-4820), and use that to filter each suggestion as it
arrives.

But: have you confirmed this is really a problem in practice? Ie,
typically suggestions have a strong a-priori rank based on eg how often
that query was asked (if suggestions come from your query logs, like
Google) or based on how popular that item is (if your suggestions come
from your content, like Netflix), in which case, if suggestions are not
that orthogonal, the risk of a bad suggestion may be very low?

Mike McCandless

http://blog.mikemccandless.com
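
For illustration only, here is a rough sketch of the "filter each suggestion as it arrives" idea in plain Java. It does not use any Lucene API: the class SuggestionFilter, the method filter, and the map from suggestion to categories are all hypothetical names, and the sketch assumes you have already asked the suggester for more candidates than you need (in rank order) and that you keep the per-suggestion category / zip code info somewhere you can look it up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of Lucene: post-filters ranked suggestions
// by the category (or zip code) the user currently has selected.
public class SuggestionFilter {

    /**
     * Walks the candidates in rank order and keeps at most topN of them
     * whose recorded categories contain activeCategory. Candidates with no
     * recorded categories are dropped.
     */
    public static List<String> filter(List<String> rankedCandidates,
                                      Map<String, Set<String>> categoriesBySuggestion,
                                      String activeCategory,
                                      int topN) {
        List<String> kept = new ArrayList<>();
        for (String candidate : rankedCandidates) {
            if (kept.size() >= topN) {
                break; // enough suggestions collected
            }
            Set<String> categories = categoriesBySuggestion.get(candidate);
            if (categories == null) {
                categories = Collections.emptySet();
            }
            if (categories.contains(activeCategory)) {
                kept.add(candidate);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Candidates as they might come back from a suggester, best first.
        List<String> ranked = Arrays.asList("pizza", "piano", "pizzeria");

        // Assumed side data: which categories each suggestion appears in.
        Map<String, Set<String>> categories = new HashMap<>();
        categories.put("pizza", new HashSet<>(Arrays.asList("food")));
        categories.put("piano", new HashSet<>(Arrays.asList("music")));
        categories.put("pizzeria", new HashSet<>(Arrays.asList("food", "travel")));

        System.out.println(filter(ranked, categories, "food", 2));
    }
}
```

The trade-off is that you have to over-fetch: if most of the top candidates belong to other categories, you may need to request many more suggestions than you finally show.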