Ideas/comments below:

On Tue, Oct 30, 2012 at 9:40 AM, Oliver Christ <ochr...@ebscohost.com> wrote:
> I'm currently researching using a WFST suggester on e.g. book titles.
> While our basic use cases are well covered, there seem to be at least
> three which aren't:
>
> * The possibility to associate a "foreign key" with a string
> (rather: final node) in the WFST (in addition to the rank). For example,
> I'd like to add "Lucene in Action" with key 1933988177 (the ISBN) and
> some rank to the WFST. A completion would return the completed string
> and the key associated with each entry (i.e. final nodes get a "key"
> field (int), which is returned in the LookupResult). That foreign key
> could also be used for fast de-duping (no more string/byte array
> comparisons).

This is maybe the same idea as https://issues.apache.org/jira/browse/LUCENE-4491 ?

Could you simply stuff your ISBN onto the end of the suggestion (ie
enroll Lucene in Action|1933988177)?

> * When looking up completions, I'd like to be able to specify a
> filter which further determines whether some completion should be
> considered or not. Assume, for example, that I'm only interested in
> computer science books, but can't maintain separate WFSTs for each
> subject area. Given some completion candidate (represented by its key),
> the filter would be called (with the key as a parameter) to determine
> whether or not the completion candidate should be added to the result
> queue.

The new AnalyzingSuggester (coming in 4.1) actually has something like
this, internally, in its search. It does this to remove duplicate
surface forms (so it doesn't suggest the same thing more than once).

Maybe we could expose this so that eg you could subclass WFSTSuggester
w/ your own filtering function and it will keep searching until it
finds topN that your filter accepts?

Or alternatively you could pull a big topN and then filter yourself
later, but that's less efficient...

> * Highlighting of the completed portions (i.e. explicit markup
> of user-provided vs. auto-completed portions of a completion).
Hmm could you do this in the app level? Ie, hilite the common prefix
yourself?

> What's your take on the above? What would be the best way to achieve
> this? We want to use AnalyzingSuggester, so the above applies
> particularly to them.

I think this is great feedback ... suggesters have been getting a lot
of attention lately.

Maybe open an issue for the custom filtering of candidates?

> My current research indicates the following:
>
> * There may be workarounds for the "foreign key" use case - it
> seems that lots of data structures would be affected by storing a
> user-provided key with final nodes, which therefore may not be a viable
> path. It may be possible to encode the foreign key in the transducer's
> output instead.

Maybe we could add a "payload" (stored in the FST's output) for each
suggestion...

> * Adding a filter/predicate to the AnalyzingSuggester is simple,
> as TopNSearcher<> already uses acceptResult() to test whether some
> completion should be added - that can be overridden in a derived
> searcher class which simply calls the predicate. Ideally the suggesters
> would access some kind of factory to instantiate the searcher to be used
> (instead of hardwiring it in).

Exactly!

One gotcha is you have to be careful about the maxQueueDepth, because
if your acceptResult accepts too few results then the queue may have
pruned away paths that would have led to a valid topN path ...

We may also invert all of these FST-based suggesters, and expose
building blocks for apps to build up custom suggesters. There are many
use cases we need to accommodate and we have a ways to converge on a
clear API here ...

> * I haven't found a simple solution for the highlighting yet,
> particularly when using AnalyzingSuggester (where it's non-trivial).

Ahh I see ... it is challenging in that case. Hmm.

Maybe open an issue for this as well, so we can discuss/iterate?
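For the app-level workarounds above (suffixing the key onto the suggestion, pulling a bigger topN and filtering it yourself, and marking up the completed part), here is a rough sketch in plain Java. It uses no Lucene API at all; the class name, the `|` separator and the method names are made up for illustration. It assumes the suggester hands back suggestions already sorted by weight, and the highlighting part assumes the suggestion literally starts with what the user typed, which generally won't hold with AnalyzingSuggester:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical helpers for app-level suggester workarounds; no Lucene API is used here.
final class SuggestWorkarounds {

    // Assumed separator; it must never occur in a real title.
    static final char SEP = '|';

    // Appends the foreign key (e.g. an ISBN) to the text that gets added to the suggester.
    static String encode(String title, String key) {
        return title + SEP + key;
    }

    // Recovers the title; assumes the entry was built with encode().
    static String titleOf(String encoded) {
        return encoded.substring(0, encoded.lastIndexOf(SEP));
    }

    // Recovers the key; assumes the entry was built with encode().
    static String keyOf(String encoded) {
        return encoded.substring(encoded.lastIndexOf(SEP) + 1);
    }

    // Keeps the first n suggestions whose key passes the filter, from a list
    // already sorted by weight (e.g. a lookup asked for a much larger topN).
    static List<String> filterTopN(List<String> ranked, Predicate<String> accept, int n) {
        List<String> out = new ArrayList<>();
        for (String s : ranked) {
            if (out.size() == n) {
                break;
            }
            if (accept.test(keyOf(s))) {
                out.add(s);
            }
        }
        return out;
    }

    // Wraps the auto-completed tail in <b>...</b> if the suggestion starts
    // (case-insensitively) with the user's input; otherwise returns it unchanged.
    static String highlight(String typed, String suggestion) {
        if (!suggestion.regionMatches(true, 0, typed, 0, typed.length())) {
            return suggestion;
        }
        return suggestion.substring(0, typed.length())
                + "<b>" + suggestion.substring(typed.length()) + "</b>";
    }
}
```

With a selective filter, fetching only topN up front can leave you short: filtering just the top 2 of `a|1, b|2, c|3, d|4` for even keys yields only `b|2`. That is similar in spirit to the maxQueueDepth gotcha above, and over-fetching trades extra lookup work for getting a full topN back.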
Mike McCandless

http://blog.mikemccandless.com