Warning: I don't understand ngrams at all, so you should read this as a plea for those who do to tell me I'm off base <G>.
But I wonder if indexing as n-grams would be a way to cope with this issue that lots of people have. *assuming* you are thinking about single terms, then it seems that "smith" would be tokenized as sm, mi, it, th. Then a wildcard search for "mi it" would hit (as a phrase query or a SpanQuery with slop of 0). It seems like there are several issues to work out here, especially including multiple terns, matching mixtures of wildcards and non-wildcards, etc. But it seems do-able.... Another approach is to use WildcardTernEnum and/or RegexTermEnum to build up a filter and use the filter as part of the query. What you loose with this approach is that the filter (and wildcards) then don't contribute to scoring. But this isn't a huge price to pay... Best Erick On Wed, Jun 25, 2008 at 1:47 PM, Mark Ferguson <[EMAIL PROTECTED]> wrote: > Hello, > > I am currently keeping an index of all our client's usernames. The search > functionality is implemented using a PrefixFilter. However, we would like > to > expand the functionality to be able to search any part of a user's name, > rather than requiring that it begin with the query string. So for example, > the search term 'mit' would return the username 'smith'. > > I am hesitant to use a WildcardQuery starting with an asterisk because I've > read about why this is a bad idea. I am looking for suggestions on the best > way to implement this. > > The idea I've come up with is to index each part of the username; so for > example, if the username is 'mark', you would index mark, ark, rk, and k. > Then you could still use the PrefixFilter. I'm not overly concerned about > how this would enlarge the index because usernames tend to be fairly short. > > I am very much open to other suggestions however. Does anyone have any > opinions or ideas that they can share? > > Thanks very much. > > Mark >