True, I was thinking of doing something like that. A mixture of n-grams, an ontology, and dictionary/spell checking would be best; that way you could better find related queries and possibly do some context determination as well.
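
A rough sketch of just the n-gram part of such a mixture, in Python for illustration only. This is not the Lucene contrib spell checker; the function names (`char_ngrams`, `ngram_similarity`, `suggest`), the `^`/`$` padding, the Dice scoring, and the toy word list are all assumptions made for the example.

```python
from collections import Counter


def char_ngrams(word, n=2):
    """Return a multiset of character n-grams, padded with ^ and $ markers."""
    padded = f"^{word}$"
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def ngram_similarity(a, b, n=2):
    """Dice coefficient over the character n-grams of two words (0.0 to 1.0)."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    overlap = sum((grams_a & grams_b).values())  # multiset intersection size
    total = sum(grams_a.values()) + sum(grams_b.values())
    return 2.0 * overlap / total if total else 0.0


def suggest(term, dictionary, n=2, limit=3):
    """Rank dictionary words by n-gram similarity to a (possibly misspelled) term."""
    scored = [(ngram_similarity(term, word, n), word)
              for word in dictionary if word != term]
    # Highest score first; ties broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [word for score, word in scored[:limit] if score > 0]


if __name__ == "__main__":
    # Hypothetical toy dictionary; a real one would come from an index or word list.
    words = ["hosting", "shooting", "host", "web", "we"]
    for candidate in suggest("hsoting", words):
        print(candidate, round(ngram_similarity("hsoting", candidate), 3))
```

On this toy list "hosting" scores only slightly above "shooting" for the input "hsoting", which suggests why n-grams alone can pick an unrelated word and why an extra signal (ontology, dictionary, or query context) would help break such near-ties.
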
Using the example from the Lucene sources, a replacement for "web hsoting" (a very common misspelling) can come out as "we shooting". An ontology description/mapping could possibly offer insight into the subject, since it would be smart enough to look at an ontology description of "web" and its relationship to host/hosting. In fact, I wonder if the ontology module would be a good place to start from, using near matches or some logic like that?

-----Original Message-----
From: Sami Siren <[EMAIL PROTECTED]>
To: nutch-user@incubator.apache.org
Date: Wed, 20 Apr 2005 16:16:37 +0300
Subject: Re: "did you mean" feature

> other solution that I have seen on use is to record a log of actual
> queries users have been doing and construct suggestions based on those.
>
> --
> Sami Siren
>
>
> Byron Miller wrote:
> > Doug,
> >
> > Thanks for the quick response! I'll take a look at the code and see if i
> > can't come up with something to work.
> >
> > At a quick glance, is this using an existing index to build the ngrams
> > from or is this an index from a dictionary source?
> >
> > thanks,
> > -byron
> >
> > -----Original Message-----
> > From: Doug Cutting <[EMAIL PROTECTED]>
> > To: nutch-user@incubator.apache.org
> > Date: Tue, 19 Apr 2005 12:42:25 -0700
> > Subject: Re: "did you mean" feature
> >
> >>Byron Miller wrote:
> >>
> >>>I haven't seen anything in the list, but is there any code available? I
> >>>jumped over to the lucene site but ofcourse the lists aren't searchable
> >>>right now (get an error)
> >>
> >>David Spencer has worked on this some.
> >>
> >>http://www.searchmorph.com/weblog/index.php?id=23
> >>
> >>I think the code on his site might be more recent than what's committed
> >>to the lucene/contrib directory.
> >>
> >>Doug