We found the standard tf-idf scoring to be satisfactory for our purposes: as the 
user keeps typing, the dropdown list updates automatically, and the more 
characters in the query, the fewer documents match.

We have been testing this with a list of 100M unique names (first_last) and 
have seen very good performance and results.

Michael



-----Original Message-----
From: Heath Aldrich [mailto:[email protected]] 
Sent: Thursday, January 07, 2010 11:13 AM
To: [email protected]
Subject: RE: Suggest Search Terms

Thanks Michael, 

In this scenario, how did you decide the "best" terms to suggest?

Depending on the search term, it could return many thousands of results, but 
I don't know how one would rank the terms to decide which are the top 10 to 
present to the user.



-----Original Message-----
From: Michael Garski [mailto:[email protected]] 
Sent: Thursday, January 07, 2010 11:55 AM
To: [email protected]
Subject: RE: Suggest Search Terms

PrefixQuery does not perform well when a lot of terms match. We've 
implemented search box term suggestion using a custom analyzer and standard 
queries. Here's a quick overview of what we did to cover matching from the 
beginning of the text.

First you'll need a list of the terms/phrases you want available in the 
suggested terms.  You can do this by walking over the terms in your index, or 
from the original data source used to build the index.

Next you'll need a custom analyzer to create the tokens that will be in the 
index. Rather than general n-grams, we took a different approach that tokenizes 
each term into all of its leading prefixes, like this:

Original Term: "Michael"

Tokens: "m", "mi", "mic", "mich", "micha", "michae", "michael" 

You can tokenize to include spaces as well...

Original Term: "Michael Garski"

Tokens: "m", "mi", "mic", "mich", "micha", "michae", "michael", "michael ", 
"michael g", etc...
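A minimal Python sketch of the prefix tokenization described above, assuming the text is lowercased first (inferred from the example tokens, not stated in the thread). `prefix_tokens` is a hypothetical helper, not a Lucene or Lucene.Net API; in Lucene itself this logic would live inside a custom analyzer.

```python
def prefix_tokens(text: str) -> list[str]:
    """Emit every leading prefix of the lowercased text, spaces included.

    Hypothetical helper: "Michael Garski" yields "m", "mi", ..., "michael",
    "michael ", "michael g", ..., "michael garski".
    """
    normalized = text.lower()  # assumption: case-insensitive matching
    return [normalized[:end] for end in range(1, len(normalized) + 1)]


print(prefix_tokens("Michael"))
```

Because spaces are kept, a multi-word name produces one token per character of the whole phrase, so a user typing "michael g" still matches with a single token.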

Tokenizing the text in this way allows you to use a standard TermQuery as 
opposed to a PrefixQuery and since the search is on only a single term, it is 
quite fast.
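To make the TermQuery point concrete, here is a hypothetical in-memory sketch in which a Python dict stands in for the Lucene index; `build_suggest_index` and `exact_term_lookup` are illustrative names, not library APIs. Because every prefix is indexed as its own token, answering "what matches `mich`?" is one exact key lookup rather than an enumeration of every term that starts with `mich`.

```python
from collections import defaultdict


def prefix_tokens(text):
    # Hypothetical helper: all leading prefixes of the lowercased text.
    normalized = text.lower()
    return [normalized[:end] for end in range(1, len(normalized) + 1)]


def build_suggest_index(phrases):
    """Map every prefix token to the set of phrase ids that produced it."""
    index = defaultdict(set)
    for phrase_id, phrase in enumerate(phrases):
        for token in prefix_tokens(phrase):
            index[token].add(phrase_id)
    return index


def exact_term_lookup(index, phrases, typed):
    """A single exact-key lookup, analogous to one TermQuery."""
    ids = index.get(typed.lower(), set())  # .get does not insert missing keys
    return sorted(phrases[i] for i in ids)


phrases = ["Michael Garski", "Michelle Smith", "Heath Aldrich"]
index = build_suggest_index(phrases)
print(exact_term_lookup(index, phrases, "Mich"))
```

The trade-off is index size: each phrase contributes one token per character, so the suggest index grows with the total length of the phrases in exchange for constant-shape queries.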

The type-ahead triggers a search via AJAX as the user types in the search box.  
The queries are not submitted until three characters are typed to limit the 
number of documents that are matched, which improves query performance.  As the 
contents of the suggest index do not change frequently we cache results to 
further improve performance.
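The type-ahead behaviour can be sketched the same way. This assumes the three-character minimum from the message above and uses Python's `functools.lru_cache` as a stand-in for whatever cache was actually used; the suggestion list, `suggest`, and `MIN_CHARS` are hypothetical. Results come back in insertion order here, not ranked by tf-idf as Lucene would score them.

```python
from functools import lru_cache

MIN_CHARS = 3  # queries shorter than this are not submitted

# Hypothetical suggestion data; a real system would load this from the index.
SUGGESTIONS = ("Heath Aldrich", "Glyn Darkin", "Michael Garski", "Michelle Smith")


def _prefix_index(phrases):
    """Map each lowercased prefix to the phrases that start with it."""
    index = {}
    for phrase in phrases:
        lowered = phrase.lower()
        for end in range(1, len(lowered) + 1):
            index.setdefault(lowered[:end], []).append(phrase)
    return index


INDEX = _prefix_index(SUGGESTIONS)


@lru_cache(maxsize=10_000)
def suggest(typed: str, limit: int = 10) -> tuple[str, ...]:
    """Return up to `limit` suggestions; a tuple so cached results stay immutable."""
    key = typed.lower()
    if len(key) < MIN_CHARS:
        return ()  # too short: skip the query entirely
    return tuple(INDEX.get(key, [])[:limit])


print(suggest("Mic"))
```

Since `lru_cache` does not know when the suggest index changes, a rebuild would need to be followed by `suggest.cache_clear()`; this fits the point above that the index contents change infrequently.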

Michael



-----Original Message-----
From: Glyn Darkin [mailto:[email protected]] 
Sent: Thursday, January 07, 2010 9:36 AM
To: [email protected]
Subject: Re: Suggest Search Terms

Hi Heath,

I have implemented this using a prefix query against a particular field.
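For comparison, a prefix query has to expand to every indexed term that starts with the typed prefix. The sketch below uses a sorted Python list as a stand-in for the field's term dictionary; `prefix_matches` is a hypothetical name, not a Lucene API. It shows why the cost grows with the number of matching terms, which is the slowdown Michael describes when many terms match.

```python
from bisect import bisect_left


def prefix_matches(sorted_terms, prefix):
    """Return every term in sorted_terms that starts with prefix.

    Terms sharing a prefix are contiguous in sorted order, so binary search
    finds the first candidate and a linear scan collects the rest. The scan
    is proportional to the number of matches.
    """
    i = bisect_left(sorted_terms, prefix)
    matches = []
    while i < len(sorted_terms) and sorted_terms[i].startswith(prefix):
        matches.append(sorted_terms[i])
        i += 1
    return matches


terms = sorted(["garski", "glyn", "heath", "michael", "michelle", "mike"])
print(prefix_matches(terms, "mic"))
```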

Good luck.

Glyn


2010/1/7 Heath Aldrich <[email protected]>:
> Hello all...
>
> I'm looking for some guidance on how to get suggested search terms going from 
> the lucene.net perspective.
>
> I have seen a few concepts using SOLR, but I'm trying to figure out how to 
> make it happen using lucene.
>
> I would like to be able to suggest the rest of a search term, much as Google 
> does when searching.  I can figure the AJAX part of displaying the results no 
> problem, but I really don't know how to make lucene provide the results that 
> I should be displaying.
>
> I "think" it is done using n-grams, but that's really about as far as I have 
> found thus far.
>
> Any guidance is appreciated...
>
> Thanks.
>
> Heath Aldrich
>



-- 
Glyn Darkin

Darkin Systems Ltd
Mob: 07961815649
Fax: 08717145065
Web: www.darkinsystems.com

Company No: 6173001
VAT No: 906350835





